Inferring Log-Based Behavioural System Models using Markov Chains

More Info
expand_more

Abstract

An essential step of software development is obtaining an understanding of the behaviour of a system. Accurate state models of system behaviour might help software developers build such an understanding. There exist several techniques for automatically inferring models on system behaviour using log analysis, but these do not scale well for systems that produce large amounts of logs.
In this study, we present an approach for inferring concise state models of system behaviour using log analysis, called MASM (Markov Algorithm for inferring concise State Models), which implements a Markov chain. We argue this approach has the potential for higher scalability than existing techniques, as it attempts to approximate a state model by exploiting the properties of Markov chains. This is achieved by considering a sequence of log statements as a stochastic process, which enables MASM to infer a naive state model from a set of log statements, which is then minimized using the properties of Markov chains.
We evaluated MASM in an empirical study on the XRP Ledger Consensus Protocol. In this empirical study, MASM was evaluated on accuracy at different compression rates, and scalability. The results indicate that MASM shows several knee points in terms of accuracy, suggesting several optimal compression rates. We also found that the run-time of MASM scales linearly with the size of the dataset. Finally, we discovered that MASM is unable to outperform a random compression algorithm, which compresses the model by randomly merging states, in terms of accuracy. However, further research is required on measuring the readability of the produced models to derive definitive conclusions on the performance of MASM compared to a random compression algorithm.