Rethinking log parsing in the context of modern software ecosystems

Master Thesis (2022)

Author(s)

S. Petrescu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Jan S. Rellermeyer – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Lydia Y. Chen – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Luís Miranda da Cruz – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Log Analysis, Named Entity Recognition, Log Parsing

To reference this document use

https://resolver.tudelft.nl/uuid:3f9a3b7b-ac70-40f7-bef5-94465f34aaa2

Publication Year

2022

Language

English

Graduation Date

20-07-2022

Awarding Institution

Delft University of Technology

Programme

Computer Science, Software Technology

Collections

thesis

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Modern systems generate a tremendous amount of data, making manual investigation infeasible and automated analysis a necessity. However, running automated log analysis pipelines is far from straightforward, because software ecosystems change constantly as they adapt to user requirements. In practice, these pipelines consist of a series of steps that collectively aim to turn raw logs into actionable insights. The first step is log parsing, which abstracts raw logs into structured information. Log parsing is paramount, as it influences the performance of all downstream tasks that rely on its output. Although previous works have investigated the performance of log parsing, data heterogeneity has increased over the past decades, so the validity of current estimates is questionable: there is a lack of understanding of how log parsing methods perform in modern contexts. Consequently, we investigate the field and, in the process, discover that misleading metrics have been adopted, which produce incomplete performance estimates. Furthermore, motivated by an industry use case within the infrastructure of a large international financial institution, we find that the current log parsing paradigm is not aligned with what is required in practice. To address these limitations, this work makes the following contributions. We (1) evaluate the field of log parsing, (2) propose a new log parsing paradigm and create a benchmark dataset to facilitate future research, and (3) propose and evaluate a machine learning model that solves log parsing within the new paradigm.
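To make the notion of turning a raw log into structured information concrete, the following is a minimal sketch of conventional template-style parsing, in which variable tokens are masked to separate a constant template from its parameters. It is an illustration only and is not the model or paradigm proposed in the thesis; the masking rules, the `parse_line` function, and the sample log line are all hypothetical.

```python
import re

# Hypothetical masking rules: each pattern marks one kind of variable token.
# The IP rule is applied first so that its digits are not consumed by <NUM>.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def parse_line(line: str) -> dict:
    """Split a raw log line into a constant template and its variable parameters.

    Parameters are collected rule by rule, so they are ordered by masking
    rule rather than by position in the line.
    """
    params = []
    template = line
    for pattern, placeholder in MASKS:
        params.extend(pattern.findall(template))
        template = pattern.sub(placeholder, template)
    return {"template": template, "params": params}


# Made-up log line, used only for illustration.
raw = "Connection from 10.0.0.5 closed after 42 ms"
parsed = parse_line(raw)
print(parsed["template"])
print(parsed["params"])
```

Hand-written rules of this kind are tied to the log formats they were written for, which is one reason heterogeneous data makes parsing hard to evaluate.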
