Inferring DFAs from Log Traces Using Community Detection

Abstract

Large software systems today require increasingly complex models of their execution to aid the analysis of their behavior. Such execution models are impractical to compile by hand, and current approaches to generating them automatically are either not generalizable or not scalable enough. This paper addresses the problem with a new approach based on the interpretation of log traces. We analyze the effectiveness of community detection algorithms for generating system execution models from structured datasets of log samples. The approach first models sets of log traces as tree-shaped automata, and then uses graph clustering algorithms to reduce these tree representations to more concise models. The research focuses on analyzing the quality of the generated models in terms of conciseness, accuracy, recall and scalability. Testing was performed on data samples from the XRP network, a blockchain-based payment system. During implementation of a proof of concept, multiple challenges arose that limited the ability of our study to fully evaluate the approach's effectiveness. The partial results obtained show poor performance in terms of both runtime and the accuracy of the generated models. Because of the limitations of the evaluation performed, the results should be considered exploratory and require further testing.
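
The two-stage idea described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: the function names `build_prefix_tree` and `merge_states`, the example traces, and the hand-written `community_of` partition are all assumptions made for illustration. In particular, the community detection step itself is omitted here, and the partition that such an algorithm would produce is supplied by hand.

```python
from collections import defaultdict


def build_prefix_tree(traces):
    """Build a tree-shaped automaton from a list of traces.

    State 0 is the root. `transitions` maps (state, symbol) -> state,
    and `accepting` holds the states where some trace ends.
    """
    transitions = {}
    accepting = set()
    next_state = 1
    for trace in traces:
        state = 0
        for symbol in trace:
            key = (state, symbol)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        accepting.add(state)
    return transitions, accepting


def merge_states(transitions, accepting, community_of):
    """Collapse every community of tree states into a single state.

    The result can be nondeterministic in general, so each
    (state, symbol) pair maps to a set of target states.
    """
    merged = defaultdict(set)
    for (src, symbol), dst in transitions.items():
        merged[(community_of[src], symbol)].add(community_of[dst])
    merged_accepting = {community_of[s] for s in accepting}
    return dict(merged), merged_accepting


# Hypothetical log traces (event names are illustrative only).
traces = [
    ["connect", "send", "ack", "close"],
    ["connect", "send", "ack", "send", "ack", "close"],
]
transitions, accepting = build_prefix_tree(traces)

# Hand-written partition standing in for the output of a community
# detection algorithm: tree state -> community label.
community_of = {0: 0, 1: 1, 2: 2, 5: 2, 3: 3, 6: 3, 4: 4, 7: 4}
merged, merged_accepting = merge_states(transitions, accepting, community_of)
```

With this partition, the eight-state tree collapses to five states, and the merged model gains a `send`/`ack` loop, so it accepts traces with any number of repetitions rather than only the two observed ones. This illustrates how merging can trade accuracy for conciseness and generalization.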