A Search-based Approach for Accurate Identification of Log Message Formats

Conference Paper (2018)
Author(s)

Salma Messaoudi (Université du Luxembourg)

Annibale Panichella (TU Delft - Software Engineering, Université du Luxembourg)

Domenico Bianculli (Université du Luxembourg)

Lionel Briand (Université du Luxembourg)

Raimondas Sasnauskas (SES S.A.)

Research Group
Software Engineering
Copyright
© 2018 Salma Messaoudi, A. Panichella, Domenico Bianculli, Lionel Briand, Raimondas Sasnauskas
DOI related publication
https://doi.org/10.1145/3196321.3196340
Publication Year
2018
Language
English
Pages (from-to)
167-177
ISBN (electronic)
978-1-4503-5714-2
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Many software engineering activities process the events contained in log files. However, before any processing can take place, the entries in a log file must be parsed to retrieve the actual events recorded in the log. Each event is denoted by a log message, which is composed of a fixed part, called the (event) template, that is the same for all occurrences of the same event type, and a variable part, which may vary with each event occurrence. The formats of log messages, in complex and evolving systems, have numerous variations, are typically not entirely known, and change frequently; therefore, they need to be identified automatically. The log message format identification problem deals with the identification of the different templates used in the messages of a log. Any solution to this problem has to generate templates that meet two main goals: they must not be too general, so as to distinguish different events, but also not too specific, so as not to treat different occurrences of the same event as following different templates. These two goals conflict. In this paper, we present an approach that recasts the log message format identification problem as a multi-objective problem. The approach uses an evolutionary algorithm to solve this problem, by tailoring the NSGA-II algorithm to search the space of solutions for a Pareto optimal set of message templates. We have implemented the approach in a tool, which we have evaluated on six real-world datasets, containing log files with a number of entries ranging from 2K to 300K. The experimental results show that the approach extracts by far the highest number of correct log message templates, significantly outperforming two state-of-the-art approaches on all datasets.
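
The tension between overly general and overly specific templates can be illustrated with a small sketch. It is not the paper's implementation: the `*` wildcard marker, the two scoring functions (`frequency` and `specificity`), and the sample log lines are assumptions chosen for illustration, and the Pareto filter below is a plain dominance check rather than NSGA-II itself.

```python
from typing import Dict, List, Sequence, Tuple

WILDCARD = "*"  # assumed marker for the variable part of a template


def matches(template: Sequence[str], message: Sequence[str]) -> bool:
    """True if the message has the same length and agrees on every fixed token."""
    return len(template) == len(message) and all(
        t == WILDCARD or t == m for t, m in zip(template, message)
    )


def frequency(template: Sequence[str], messages: List[List[str]]) -> float:
    """Share of log messages matched by the template (penalizes over-specific templates)."""
    return sum(matches(template, m) for m in messages) / len(messages)


def specificity(template: Sequence[str]) -> float:
    """Share of fixed tokens in the template (penalizes over-general templates)."""
    return sum(t != WILDCARD for t in template) / len(template)


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """Pareto dominance when both objectives are maximized."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Tuple[float, float]]) -> List[str]:
    """Names of candidates that no other candidate dominates."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(other, s) for other in scores.values())
    ]


# Made-up log lines, tokenized on whitespace.
log = [
    line.split()
    for line in [
        "Connection from 10.0.0.1 closed",
        "Connection from 10.0.0.2 closed",
        "Disk usage at 91 percent",
    ]
]

candidates = {
    "too_general": ["*", "*", "*", "*"],
    "balanced": ["Connection", "from", "*", "closed"],
    "too_specific": ["Connection", "from", "10.0.0.1", "closed"],
}

scores = {
    name: (frequency(t, log), specificity(t)) for name, t in candidates.items()
}
front = pareto_front(scores)
```

In this toy setting the fully wildcarded template is dominated by the balanced one, while the balanced and the fully literal templates are mutually non-dominated: one matches more messages, the other keeps more fixed tokens. A multi-objective search returns such a set of trade-offs instead of a single answer.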

Files

Main.pdf
(pdf | 0.845 Mb)
License info not available