Mining Console Logs for Large-Scale System Problem Detection

Given the well-estimated usual event model and an unseen test sequence, we first slice the test sequence into fixed-length, overlapping segments by moving a sliding window. The choice of the sliding window size corresponds to the minimum duration constraint in the HMM framework. Given the usual event model, the likelihood of each segment is then calculated. The segment with the lowest likelihood value is identified as an outlier (Figure 2, step 1). The outlier is expected to represent one specific unusual event and could be used to train an unusual event model. However, one single outlier is obviously insufficient to give a good ...

Wei Xu, Ling Huang, Armando Fox, David Patterson, Michael Jordan
UC Berkeley, Intel Research Berkeley

Abstract

The console logs generated by an application contain messages that the application developers believed would be useful in debugging or monitoring the application. Despite the ubiquity and large size of these logs, they are rarely exploited in a systematic way for monitoring and debugging because they are not readily machine-parsable. In this paper, we propose a novel method for mining this rich source of information. First, we combine log parsing and text mining with source code analysis to extract structure from the console logs. Second, we extract features from the structured information in order to detect anomalous patterns in the logs using Principal Component Analysis (PCA). Finally, we use a decision tree to distill the results of PCA-based anomaly detection to a format readily understandable by domain experts, such as system operators, who need not be familiar with the anomaly detection algorithms.
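The second step above, PCA-based detection over extracted features, can be illustrated with a minimal sketch. It assumes each row of the input is a feature vector such as per-session counts of each message type. That feature layout, the percentile threshold, the toy data, and all function names are assumptions of this sketch and are not taken from the paper.

```python
import numpy as np


def pca_anomaly_scores(X, k):
    """Squared prediction error (SPE) of each row of X.

    X: (n_samples, n_features) matrix of feature vectors.
    k: number of leading principal components treated as the normal subspace.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each feature column
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T  # (n_features, k) basis of the normal subspace
    residual = Xc - Xc @ P @ P.T  # part not explained by the top-k components
    return np.sum(residual ** 2, axis=1)


def flag_anomalies(X, k=1, percentile=95.0):
    """Flag rows whose SPE exceeds a percentile cutoff (hypothetical threshold rule)."""
    spe = pca_anomaly_scores(X, k)
    threshold = np.percentile(spe, percentile)
    return spe > threshold, spe


def make_toy_counts():
    """Toy data: 50 sessions where the three counts stay in proportion 1:1:2,
    plus one final session that breaks the pattern (second count is zero)."""
    a = np.arange(50) % 9 + 1
    normal = np.column_stack([a, a, 2 * a])
    odd = np.array([[5, 0, 10]])
    return np.vstack([normal, odd])


if __name__ == "__main__":
    flags, spe = flag_anomalies(make_toy_counts(), k=1)
    print("row with the largest residual:", int(np.argmax(spe)))
    print("number of flagged rows:", int(flags.sum()))
```

The idea is that normal sessions lie close to a low-dimensional subspace, so a row with a large residual outside that subspace breaks the usual correlations between features. A percentile cutoff always flags a fixed fraction of rows, so a real deployment would need a more principled threshold.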
As a case study, we distill over one million lines of console logs from the Hadoop file system to a simple decision tree that a domain expert can readily understand; the process requires no operator intervention, and we detect a large portion of runtime anomalies that are commonly overlooked.

1 Introduction

Today's large-scale Internet services run in large server clusters. A recent trend is to run these services on virtualized cloud computing environments such as Amazon's Elastic Compute Cloud (EC2) [2]. The scale and complexity of these services make it very difficult to design, deploy, and maintain a monitoring system. In this paper, we propose to return to console logs, the natural tracing information included in almost every software system, for monitoring and problem detection. Since the earliest days of software, developers have used free-text console logs to report internal states, trace program execution, and ...