A large-scale evaluation of tracing back log data to its origin with static analysis

Abstract

Logs are widely used as a source of information to understand the activity of computer systems and to monitor their health and stability. Because large-scale systems generate hundreds of millions of logs per hour, amounting to tens of terabytes, automated techniques exist to take advantage of the rich information present in logs. However, these techniques require a link from each log message to the event that generated it: the log statement in the source code.
Several solutions have been proposed to solve this non-trivial challenge; of these, the approach based on static analysis reaches the highest accuracy. Log statements in the source code are statically analysed to extract templates, and log messages are then matched against these templates, creating a link between log messages and statements. However, no evaluation has been performed in large-scale environments across various industries, where log messages are highly varied.
We perform a field study of the approach based on static analysis of source code to relate log messages to their log statements in a large-scale environment. The approach is evaluated on a rich and varied dataset of logs produced by over thirty thousand log statements, reaching an accuracy of 97.6%. We provide an implementation that is non-intrusive, adaptable to custom logging practices, easily extendable, and ready to be used in large-scale systems, which allows automated log analysis techniques to be adopted.
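
To make the template-matching idea concrete, the following is a minimal Python sketch, not the implementation evaluated in this work. It assumes that templates have already been extracted from log statements as strings with `{}` placeholders; the source locations, template texts, and function names (`template_to_regex`, `build_index`, `match_message`) are hypothetical and chosen only for illustration.

```python
import re

def template_to_regex(template):
    """Turn a template with '{}' placeholders into a compiled regex.

    Static text is escaped literally; each placeholder becomes a
    non-greedy capture group for the variable part of the message.
    """
    static_parts = template.split("{}")
    pattern = "(.+?)".join(re.escape(part) for part in static_parts)
    return re.compile(pattern)

def build_index(statements):
    """Map each (source location, template) pair to a compiled regex."""
    return [(location, template_to_regex(template))
            for location, template in statements]

def match_message(message, index):
    """Return the location of the first template matching the whole message."""
    for location, regex in index:
        if regex.fullmatch(message):
            return location
    return None

# Hypothetical templates, as if extracted from source code by static analysis.
statements = [
    ("Auth.java:42", "User {} logged in from {}"),
    ("Db.java:17", "Connection to {} timed out after {} ms"),
]
index = build_index(statements)
origin = match_message("User alice logged in from 10.0.0.1", index)
```

A linear scan over all templates is used here for simplicity; with tens of thousands of log statements, a real system would likely need an index over the static parts of the templates and a strategy for messages that match more than one template.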