Inferring Log-Based Behavioural System Models using Markov Chains

Bachelor Thesis (2021)
Author(s)

T.A.K. Werthenbach (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Mitchell Olsthoorn – Mentor (TU Delft - Software Engineering)

Annibale Panichella – Mentor (TU Delft - Software Engineering)

P Przemysław – Graduation committee member (TU Delft - Embedded Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2021 Thomas Werthenbach
More Info
expand_more
Publication Year
2021
Language
English
Copyright
© 2021 Thomas Werthenbach
Graduation Date
02-07-2021
Awarding Institution
Delft University of Technology
Project
['CSE3000 Research Project']
Programme
['Computer Science and Engineering']
Faculty
Electrical Engineering, Mathematics and Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

An essential step of software development is obtaining an understanding of the behaviour of a system. Accurate state models of system behaviour might help software developers build such an understanding. There exist several techniques for automatically inferring models on system behaviour using log analysis, but these do not scale well for systems that produce large amounts of logs.
In this study, we present an approach for inferring concise state models of system behaviour using log analysis, called MASM (Markov Algorithm for inferring concise State Models), which implements a Markov chain. We argue this approach has the potential for higher scalability than existing techniques, as it attempts to approximate a state model by exploiting the properties of Markov chains. This is achieved by considering a sequence of log statements as a stochastic process, which enables MASM to infer a naive state model from a set of log statements, which is then minimized using the properties of Markov chains.
We evaluated MASM in an empirical study on the XRP Ledger Consensus Protocol. In this empirical study, MASM was evaluated on accuracy at different compression rates, and scalability. The results indicate that MASM shows several knee points in terms of accuracy, suggesting several optimal compression rates. We also found that the run-time of MASM scales linearly with the size of the dataset. Finally, we discovered that MASM is unable to outperform a random compression algorithm, which compresses the model by randomly merging states, in terms of accuracy. However, further research is required on measuring the readability of the produced models to derive definitive conclusions on the performance of MASM compared to a random compression algorithm.

Files

CSE3000_RP_Final_paper.pdf
(pdf | 0.66 Mb)
License info not available