Log Differencing using State Machines for Anomaly Detection

Abstract

Software generates huge amounts of log data every day. These data contain valuable information about the behavior and health of the system, but they are rarely exploited because of their volume and unstructured nature. Manually going through log files is a time-consuming and labor-intensive task for developers. Nonetheless, logging information can expose a problematic execution of the software even when the final outcome seems normal. Automatic analysis of log files is therefore crucial, not only for detecting problems but above all for understanding how the software behaves, which helps prevent failures and improve the software itself. To that end, this project aims to identify unexpected executions of the software and to determine the root cause behind them. In more detail, the expected behavior of the software is approximated using model inference techniques, and newly observed data are analyzed to verify whether they conform to the expected behavior. The conformance checking method used is called replay: incoming traces are replayed on the graph, and at the point where a trace cannot be validated, an alignment algorithm takes over. Sequence alignment is performed in three different ways. Two of the methods look for the best alignment within a specific radius around the problematic node. In addition, a global alignment technique is implemented, based on the well-known Needleman-Wunsch algorithm for DNA sequences. Our goal required modifying this algorithm so that it aligns not just two sequences, but a sequence with a tree-structured model. Finally, the implemented tool visualizes the differences in a way that makes it intuitive for developers to understand what went wrong. Some additional information is also provided to make the investigation of the "anomaly" easier.
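
To make the replay-then-align idea concrete, the following is a minimal sketch in Python and not the tool's implementation. It assumes the inferred model can be represented as a dictionary mapping each state to its outgoing events, and it uses the classic two-sequence Needleman-Wunsch algorithm rather than the tree-structured variant described above. All names (`Model`, `replay`, `needleman_wunsch`), the scoring values, and the example events are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Assumed model shape: state -> {event: next_state}. Illustrative only.
Model = Dict[str, Dict[str, str]]
Pair = Tuple[Optional[str], Optional[str]]


def replay(model: Model, start: str, trace: List[str]) -> Optional[int]:
    """Return the index of the first event that cannot be replayed, or None."""
    state = start
    for i, event in enumerate(trace):
        nxt = model.get(state, {}).get(event)
        if nxt is None:
            return i  # the trace deviates from the model here
        state = nxt
    return None


def needleman_wunsch(a: List[str], b: List[str], match: int = 1,
                     mismatch: int = -1, gap: int = -1) -> Tuple[int, List[Pair]]:
    """Classic global alignment of two event sequences (scores are assumptions)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic-programming table.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner; None marks a gap.
    aligned: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            aligned.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned.append((a[i - 1], None))
            i -= 1
        else:
            aligned.append((None, b[j - 1]))
            j -= 1
    aligned.reverse()
    return score[n][m], aligned


# Hypothetical expected behavior: open -> read -> close.
model: Model = {"s0": {"open": "s1"}, "s1": {"read": "s2"}, "s2": {"close": "s3"}}
expected = ["open", "read", "close"]
observed = ["open", "close"]  # the "read" event is missing

failed_at = replay(model, "s0", observed)
total, alignment = needleman_wunsch(expected, observed)
```

In this example the replay stops at the second observed event, and the alignment pairs `read` with a gap, which points to the missing event as the difference between the expected and the observed execution.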