Real-time anomaly detection in logs using rule mining and complex event processing at scale

Abstract

Log data, produced by every computer system and program, are widely used as a source of valuable information for monitoring and understanding system behavior and health. However, because large-scale systems generate a massive amount of log data every minute, it is impossible to find the cause of a system failure by manually examining this volume of data. Thus, there is a need for an automated tool that finds system failures with little or no human effort. Many existing methods try to detect anomalies in system logs by applying various algorithms, such as machine learning algorithms. However, experts argue that a system error cannot be found by looking at a single event; multiple log events are necessary to understand the root cause of a problem. In this thesis work, we aim to detect patterns in the sequential logs of distributed systems that effectively capture abnormal behavior. Specifically, as a first step, we apply rule mining techniques to extract rules that represent anomalous behavior, which may in the future lead to a system failure. In addition to that step, we implemented a real-time anomaly detection framework to detect problems before they actually occur.
Processing log data as streams is the only way to achieve real-time detection. To that end, we process streaming log data using a complex event processing technique. Specifically, we combine rule mining algorithms with a complex event processing engine to raise alerts on abnormal log data based on automatically generated patterns. The evaluation is conducted on logs from Hadoop, a system widely used in industry. The results of this thesis project are promising, reaching a recall of 98% in detecting anomalies. Finally, a scalable anomaly detection framework was built by integrating different systems in the cloud. The motivation behind this is the direct application of our framework to a real-life use case.
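
To make the idea of raising alerts from mined sequential patterns concrete, the following is a minimal sketch in plain Python. It is not the thesis implementation and does not use any particular complex event processing engine. The names `LogEvent`, `Rule`, and `detect`, the event-type labels, and the time-window semantics are all assumptions introduced for illustration: a rule is taken to be an ordered list of event types that must all occur within a fixed number of seconds, with unrelated events allowed in between.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class LogEvent:
    timestamp: float   # seconds since some reference point
    event_type: str    # hypothetical label assigned by a log parser


@dataclass(frozen=True)
class Rule:
    name: str
    sequence: Tuple[str, ...]  # non-empty; event types that must occur in this order
    window: float              # max seconds between first and last matched event


def detect(events: Iterable[LogEvent], rule: Rule) -> Iterator[Tuple[str, float]]:
    """Yield (rule name, timestamp) each time the rule's sequence completes."""
    # Each partial match is (start time, index of the next expected event type).
    partials: List[Tuple[float, int]] = []
    for ev in events:
        # Drop partial matches that can no longer finish inside the window.
        partials = [(s, i) for s, i in partials if ev.timestamp - s <= rule.window]
        fired = False
        advanced: List[Tuple[float, int]] = []
        for start, idx in partials:
            if rule.sequence[idx] == ev.event_type:
                idx += 1  # this event is the one the partial match was waiting for
            if idx == len(rule.sequence):
                fired = True  # whole sequence seen; the partial match is consumed
            else:
                advanced.append((start, idx))
        # The current event may also begin a new partial match.
        if ev.event_type == rule.sequence[0]:
            if len(rule.sequence) == 1:
                fired = True
            else:
                advanced.append((ev.timestamp, 1))
        partials = advanced
        if fired:
            # At most one alert per event, even if several partials complete.
            yield (rule.name, ev.timestamp)


# Hypothetical rule and stream; the event-type names are made up for this sketch.
rule = Rule("example_rule", ("receive_error", "retry", "node_lost"), window=5.0)
stream = [
    LogEvent(0.0, "receive_error"),
    LogEvent(1.0, "heartbeat"),
    LogEvent(2.0, "retry"),
    LogEvent(3.0, "node_lost"),
]
alerts = list(detect(stream, rule))
```

Here `alerts` contains a single entry, raised at the fourth event, because the three rule events occur in order within five seconds. Because `detect` consumes one event at a time and keeps only the unfinished matches, it can run over an unbounded stream; a dedicated engine would add features such as out-of-order handling and distribution that this sketch leaves out.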