Logs to the Rescue

Creating meaningful representations from log files for Anomaly Detection

Master thesis (2023)

Authors

G.H.R. Timmerman (Electrical Engineering, Mathematics and Computer Science)

Contributors

S.E. Verwer, Cyber Security (mentor)

A. Anand, Web Information Systems (graduation committee member)

T. Mulder (graduation committee member)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Clustering, Anomaly Detection, Cybersecurity, Language Models, Incident Response

To reference this document use:

http://resolver.tudelft.nl/uuid:969ea35f-7cae-49fe-86de-f89af8835177

Published Date

27-09-2023

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This thesis offers a comprehensive exploration of log-based anomaly detection within the domain of cybersecurity incident response. The research proposes a new approach and covers the selection of relevant log features for language model training, experiments with different language models and training methodologies, and an investigation of the potential contribution of extra contextual features. The newly proposed approach is compared against an existing baseline implemented in a finite-state classifier called FlexFringe, and both are assessed on how well they detect malicious anomalies across diverse datasets and hosts.
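
The abstract stresses the role of the human-language part of log messages. As an illustration only, and not the preprocessing used in the thesis, the Python sketch below masks variable fields (IP addresses, hexadecimal values, numbers) so that what remains is a message template a language model could embed. The masking rules and the names `MASKS` and `to_template` are hypothetical.

```python
import re

# Hypothetical masking rules (illustrative, not from the thesis): each regex
# replaces a variable field with a placeholder, leaving the human-readable text.
# Order matters: IP addresses are masked before plain numbers.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(message: str) -> str:
    """Replace variable tokens in a log message with placeholders."""
    for pattern, placeholder in MASKS:
        message = pattern.sub(placeholder, message)
    return message


lines = [
    "Failed password for root from 10.0.0.5 port 52211",
    "Failed password for admin from 192.168.1.20 port 40022",
    "Accepted publickey for alice from 10.0.0.7 port 51515",
]
for line in lines:
    print(to_template(line))
# → Failed password for root from <IP> port <NUM>
# → Failed password for admin from <IP> port <NUM>
# → Accepted publickey for alice from <IP> port <NUM>
```

User names such as `root` and `admin` survive this masking, so the first two templates still differ; deciding which fields to keep and which to mask is a feature-selection choice in its own right.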

Key findings underscore the importance of including human language for generating coherent clusters, and show that pretrained language models perform better than models that were fine-tuned or built from scratch. Furthermore, the choice of clustering parameters proves crucial for cluster quality. Additionally, we gained insights into how extra contextual features can be useful for log analysis.
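
To illustrate why clustering parameters matter, here is a minimal Python sketch of threshold-based (leader) clustering over log messages. It is not the method used in the thesis: a bag-of-words vector stands in for a pretrained language-model embedding, the messages are assumed to already have their variable fields replaced by placeholders, and `embed`, `cosine` and `leader_cluster` are hypothetical names. The resulting cluster IDs form a symbol sequence that could, in principle, serve as input to a state-machine learner such as FlexFringe.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a language-model embedding: lowercase token counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def leader_cluster(messages, threshold):
    """Assign each message to the first cluster whose leader is at least
    `threshold` similar; otherwise start a new cluster. Returns cluster IDs."""
    leaders = []
    labels = []
    for msg in messages:
        vec = embed(msg)
        for cid, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                labels.append(cid)
                break
        else:  # no existing leader was similar enough
            leaders.append(vec)
            labels.append(len(leaders) - 1)
    return labels


msgs = [
    "Failed password for root from <IP> port <NUM>",
    "Failed password for admin from <IP> port <NUM>",
    "Accepted publickey for alice from <IP> port <NUM>",
    "session opened for user alice",
]
print(leader_cluster(msgs, 0.5))  # → [0, 0, 0, 1]
print(leader_cluster(msgs, 0.8))  # → [0, 0, 1, 2]
print(leader_cluster(msgs, 0.9))  # → [0, 1, 2, 3]
```

Raising the similarity threshold from 0.5 to 0.9 splits the same four messages into two, three and then four clusters, which directly changes the symbols any downstream model sees.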

In light of these findings, the study provides several recommendations for future research, including the expansion of the methodology to accommodate various log sources, the enhancement of preprocessing techniques, the integration of newer and more advanced language models, and the pursuit of efficient hyperparameter optimization. This work contributes to the continual advancement of log-based anomaly detection and its critical role in enhancing cybersecurity practices.
