Logs to the Rescue

Creating meaningful representations from log files for Anomaly Detection

Abstract

This thesis offers a comprehensive exploration of log-based anomaly detection within the domain of cybersecurity incident response. The research proposes a new approach: it identifies relevant log features for language model training, experiments with different language models and training methodologies, and investigates the potential contribution of extra contextual features. The proposed approach is compared against an existing baseline implemented in a finite-state classifier called FlexFringe, assessing how well each detects malicious anomalies across diverse datasets and hosts.

Key findings from this research underscore the importance of including human language for generating coherent clusters, and show that pretrained language models perform better than models that were fine-tuned or built from scratch. Furthermore, the choice of clustering parameters proves to be crucial for cluster quality. Additionally, we gained insights into how extra contextual features are useful for log analysis.

In light of these findings, the study provides several recommendations for future research, including the expansion of the methodology to accommodate various log sources, the enhancement of preprocessing techniques, the integration of newer and more advanced language models, and the pursuit of efficient hyperparameter optimization. This work contributes to the continual advancement of log-based anomaly detection and its critical role in enhancing cybersecurity practices.