SLAML '10 Workshop Session Abstracts

WORKSHOP PROGRAM ABSTRACTS

Sunday, October 3, 2010

9:45 a.m.–10:15 a.m.

Creating the Knowledge about IT Events
We describe a system that automatically associates documents containing knowledge about problems and their resolutions with IT events. IT professionals use these documents when troubleshooting problems in the IT environment. The system draws on several independent sources, both internal and external to the organization, and uses machine learning techniques to select the most relevant information.

10:30 a.m.–Noon

Synoptic: Summarizing System Logs with Refinement
Distributed systems are often difficult to debug and understand. A typical way of gaining insight into system behavior is by inspecting execution logs. However, manual inspection of logs is an arduous process. To support this task we developed Synoptic. Synoptic outputs a concise graph representation of logged events that captures temporal invariants mined from the log. We applied Synoptic to synthetic and real distributed system logs and found that it augmented a distributed system designer's understanding of system behavior with reasonable overhead for an offline analysis tool. In contrast to prior approaches, Synoptic uses a combination of refinement and coarsening to explore the space of representations. Additionally, it infers temporal event invariants to capture distributed system semantics. These invariants drive the exploration process and are satisfied by the final representation.
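As a rough illustration of what mining temporal invariants from a log can look like, the sketch below checks three invariant kinds between event types: "always followed by", "never followed by", and "always precedes". The choice of these three kinds, the function name `mine_invariants`, and the sample traces are assumptions made for this example; the abstract does not specify them, and this is not Synoptic's implementation.

```python
from itertools import product


def mine_invariants(traces):
    """Mine simple temporal invariants between event types.

    traces: list of event-type sequences, one per logged execution.
    Returns three sets of (a, b) pairs that hold across every trace.
    """
    events = sorted({event for trace in traces for event in trace})
    result = {
        "always_followed_by": set(),
        "never_followed_by": set(),
        "always_precedes": set(),
    }

    for a, b in product(events, repeat=2):
        afby = nfby = ap = True
        for trace in traces:
            for i, event in enumerate(trace):
                if event == a:
                    if b in trace[i + 1:]:
                        nfby = False  # some a is followed by a later b
                    else:
                        afby = False  # some a has no later b
                if event == b and a not in trace[:i]:
                    ap = False  # some b has no earlier a
        if afby:
            result["always_followed_by"].add((a, b))
        if nfby:
            result["never_followed_by"].add((a, b))
        if ap:
            result["always_precedes"].add((a, b))
    return result


# Hypothetical traces of a file-access protocol.
traces = [
    ["open", "read", "close"],
    ["open", "write", "close"],
    ["open", "close"],
]
invariants = mine_invariants(traces)
print(sorted(invariants["always_followed_by"]))
```

A graph summarizing these traces would then be acceptable only if every path through it respects the mined pairs, for example that every "open" is eventually followed by a "close".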

A Graphical Representation for Identifier Structure in Logs
Application console logs are a ubiquitous tool for diagnosing system failures and anomalies. While several techniques exist to interpret logs, describing and assessing log quality remains relatively unexplored. In this paper, we describe an abstract graphical representation of console logs called the identifier graph and a visualization based on this representation. Our representation breaks logs into message types and identifier fields and shows the interrelation between the two. We describe two applications of this visualization. We apply it to Hadoop logs from two different deployments, showing that we capture important properties of Hadoop's logging as well as relevant differences between the two sites. We also apply our technique to logs from two other systems under development. We show that our representation helps highlight flaws in the underlying application logging.
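One simplified way to picture the relation between message types and identifier fields is as a bipartite graph. The sketch below assumes the log has already been parsed into (message type, identifier fields) records; the record layout, the function names, and the Hadoop-like sample messages are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_identifier_graph(records):
    """Map each message type to the identifier fields it mentions.

    records: iterable of (message_type, fields) pairs, where fields is a
    dict of identifier name -> value parsed from one log line.
    """
    graph = defaultdict(set)
    for message_type, fields in records:
        # Only the field names matter for the graph; values are ignored.
        graph[message_type].update(fields)
    return dict(graph)


def message_types_by_identifier(graph):
    """Invert the graph: identifier field -> message types that mention it."""
    inverse = defaultdict(set)
    for message_type, fields in graph.items():
        for field in fields:
            inverse[field].add(message_type)
    return dict(inverse)


# Hypothetical parsed records, loosely modeled on Hadoop-style messages.
records = [
    ("Receiving block", {"block_id": "blk_1", "node": "dn3"}),
    ("Received block", {"block_id": "blk_1"}),
    ("Task started", {"task_id": "t_7", "node": "dn3"}),
    ("Heartbeat sent", {}),
]
graph = build_identifier_graph(records)
for identifier, types in sorted(message_types_by_identifier(graph).items()):
    print(identifier, sorted(types))
```

In this sketch a message type that carries no identifier field, such as the hypothetical "Heartbeat sent", appears as an isolated node, and an identifier shared by several message types links them together.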

SIP CLF: A Common Log Format (CLF) for the Session Initiation Protocol (SIP)
Web servers such as Apache and web proxies like Squid support event logging using a common log format. The logs produced in these de facto standard formats are invaluable to system administrators troubleshooting a server and to tool writers building tools that mine the log files and report on trends. The Session Initiation Protocol (SIP) does not have a common log format, and as a result, each server supports a distinct log format. This plethora of formats discourages the creation of common tools. Although SIP is similar to HTTP, there are a number of fundamental differences between a session-mode protocol and a stateless request-response protocol. We propose a common log file format for SIP servers that can be used uniformly by proxies, registrars, redirect servers, and back-to-back user agents. Such a canonical file can be used to train anomaly detection systems and to feed events into a security event management system.

1:30 p.m.–2:30 p.m.

Experience Mining Google's Production Console Logs
We describe our early experience in applying our console log mining techniques [19, 20] to logs from production Google systems with thousands of nodes. This data set is five orders of magnitude larger in size and contains almost 20 times as many message types as the Hadoop data set we used in [19]. It also has many properties that are unique to large-scale production deployments (e.g., the system stays on for several months and multiple versions of the software can run concurrently). Our early experience shows that our techniques, including source-code-based log parsing, state- and sequence-based feature creation, and problem detection, work well on this production data set. We also discuss our experience in using our log parser to assist log sanitization.

Analyzing Web Logs to Detect User-Visible Failures
Web applications suffer from poor reliability. Practitioners commonly rely on fast failure detection to recover their applications quickly and reduce the effects of failures on other users. In this paper, we present a technique for detecting user-visible failures by analyzing Web logs. Our technique applies a first-order Markov model to infer anomalous browsing behavior discovered in Web logs as indicators that users have encountered failures. We implemented our technique in a tool called REBA (REcursive Bayesian Analysis of Web Logs). We evaluated our technique using REBA applied to the Web site of NASA. The results demonstrate that our technique can detect user-visible failures with reasonable cost.
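To make the idea of a first-order Markov model over browsing behavior concrete, the sketch below estimates page-to-page transition probabilities from sessions and scores a session by its average negative log-likelihood, so that unusual navigation paths score higher. The add-one smoothing, the scoring rule, the function names, and the sample sessions are assumptions made for this example; this is not REBA's algorithm.

```python
import math
from collections import Counter, defaultdict


def train_markov(sessions):
    """Count page-to-page transitions across sessions (first-order model)."""
    counts = defaultdict(Counter)
    for session in sessions:
        for prev, nxt in zip(session, session[1:]):
            counts[prev][nxt] += 1
    return counts


def transition_prob(counts, prev, nxt, vocab_size, alpha=1.0):
    """Smoothed estimate of P(nxt | prev); alpha is an additive-smoothing term."""
    row = counts.get(prev, Counter())
    total = sum(row.values())
    return (row[nxt] + alpha) / (total + alpha * vocab_size)


def surprise(counts, session, vocab_size):
    """Average negative log-likelihood per transition; higher means more unusual."""
    steps = list(zip(session, session[1:]))
    if not steps:
        return 0.0
    log_likelihood = sum(
        math.log(transition_prob(counts, prev, nxt, vocab_size))
        for prev, nxt in steps
    )
    return -log_likelihood / len(steps)


# Hypothetical sessions extracted from a Web log (one list of URLs per user).
training = [
    ["/", "/search", "/item"],
    ["/", "/search", "/item"],
    ["/", "/item"],
]
pages = {page for session in training for page in session}
model = train_markov(training)

usual = ["/", "/search", "/item"]
unusual = ["/item", "/search", "/"]
print(surprise(model, usual, len(pages)) < surprise(model, unusual, len(pages)))
```

A detector built along these lines would flag sessions whose score exceeds a threshold chosen from historical data; how that threshold is set is left open here.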

2:45 p.m.–3:30 p.m.

Bridging the Gaps: Joining Information Sources with Splunk
Supercomputers are composed of many diverse components, operate at a variety of scales, and function as a coherent whole. The resulting logs are thus diverse in format, interrelated at multiple scales, and provide evidence of faults across subsystems. When the logs are combined with system configuration information, both the downstream effects and the upstream causes of events can be determined. However, difficulties in joining the data and expressing complex queries slow the speed at which actionable insights can be obtained. Effectively connecting data experts and data miners faces similar hurdles. This paper describes our experience with applying the Splunk log analysis tool as a vehicle to combine both data and people. Splunk's search language, lookups, macros, and subsearches reduce hours of tedium to seconds of simplicity, and its tags, saved searches, and dashboards offer both operational insights and collaborative vehicles.

Optimizing Data Analysis with a Semi-structured Time Series Database
Most modern systems generate abundant and diverse log data. With dwindling storage costs, there are fewer reasons to summarize or discard data. However, the lack of tools to efficiently store and cross-correlate heterogeneous datasets makes it tedious to mine the data for analytic insights. In this paper, we present Splunk, a semi-structured time series database that can be used to index, search, and analyze massive heterogeneous datasets. We share observations, lessons, and case studies from real-world datasets, and demonstrate Splunk's power and flexibility for enabling insightful data mining searches.

Last changed: 18 Sept. 2010 jel