Monitoring Release Logs at Adyen

Feature Extraction and Anomaly Detection

Abstract

Monitoring the release logs of modern online software is challenging because of the enormous volume of logs and the complexity of the release process. The goal of this thesis is to develop a pipeline that monitors release logs and finds anomalous ones, automating this step with anomaly detection and reducing the manual effort required. We improve on the pipeline from recent work by Microsoft, enabling it to monitor logs with different severity levels and extremely long sequences.

We first apply IPLoM and its reconciling step to raw logs to obtain log events, and then use log event sets, a simplified version of log sequences, for anomaly detection. Outlier scores for the log event sets are calculated with anomaly detection algorithms, and the sets whose outlier score exceeds the threshold are clustered to reduce the amount of output. For the final output, we propose two ranking functions to sort the potentially anomalous clusters and show only the top 10 results. A complementary step alongside anomaly detection is designed to capture recurrent anomalies that belong to known clusters seen before. After finding the optimal parameters for hierarchical clustering, nearest neighbor distance, and LOF, we test the performance of the pipeline on Adyen log data and make our suggestions. Finally, we test the robustness of the pipeline with two types of artificial data sets.
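
The scoring, thresholding, clustering, and ranking steps described above can be illustrated with a minimal sketch. It is not the thesis implementation: the Jaccard distance between event sets, the naive single-linkage grouping (a stand-in for hierarchical clustering), the size-based ranking, and all function names and default parameter values are assumptions made for illustration. The abstract does not specify the two ranking functions or the distance measure. Log parsing with IPLoM is assumed to have happened already, so each input is a set of log event IDs.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two sets of log event IDs (assumed metric)."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def nn_outlier_scores(event_sets):
    """Score each event set by the distance to its nearest other event set."""
    scores = []
    for i, current in enumerate(event_sets):
        dists = [
            jaccard_distance(current, other)
            for j, other in enumerate(event_sets)
            if j != i
        ]
        scores.append(min(dists) if dists else 0.0)
    return scores


def single_linkage_clusters(items, max_dist):
    """Merge clusters while any cross-cluster pair lies within max_dist."""
    clusters = [[item] for item in items]
    merged = True
    while merged:
        merged = False
        for a, b in combinations(range(len(clusters)), 2):
            close = any(
                jaccard_distance(x, y) <= max_dist
                for x in clusters[a]
                for y in clusters[b]
            )
            if close:
                clusters[a].extend(clusters[b])
                del clusters[b]  # a < b, so index a stays valid
                merged = True
                break  # restart the scan after every merge
    return clusters


def detect(event_sets, threshold=0.5, max_dist=0.5, top_k=10):
    """Flag sets above the threshold, cluster them, and return the top_k clusters."""
    scores = nn_outlier_scores(event_sets)
    flagged = [s for s, score in zip(event_sets, scores) if score > threshold]
    clusters = single_linkage_clusters(flagged, max_dist)
    # Hypothetical ranking: smaller clusters (rarer patterns) come first.
    clusters.sort(key=len)
    return clusters[:top_k]


if __name__ == "__main__":
    normal = frozenset({"E1", "E2", "E3"})
    sets = [
        normal, normal, normal,
        frozenset({"E1", "E2", "E3", "E4"}),
        frozenset({"E7", "E8"}),
        frozenset({"E7", "E8", "E9"}),
        frozenset({"E20"}),
    ]
    for cluster in detect(sets, threshold=0.3, max_dist=0.5):
        print([sorted(s) for s in cluster])
```

In this toy input, the repeated set and its near-variant stay below the threshold, while the two related unusual sets are grouped into one cluster and the isolated set forms its own. Swapping `nn_outlier_scores` for an LOF-based scorer would leave the rest of the sketch unchanged.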