Recommending Log Placement Based on Code Vocabulary

Abstract

Logging is a common and vitally important practice that enables developers to collect runtime information from a system. This information is then used to monitor the system's performance in production and to diagnose the causes of failures. Despite its importance, logging remains a manual and difficult process: developers rely on their experience and domain expertise to decide where to place log statements. In this paper, we attempt to suggest log placement automatically by treating code as plain text derived from a vocabulary. Intuitively, we believe that the Code Vocabulary can indicate whether a code snippet should be logged. To validate this hypothesis, we trained machine learning models based solely on the Code Vocabulary to suggest log placement at the method level. We also studied which words of the Code Vocabulary matter most when deciding where to place log statements. We evaluated our approach on three open source systems and found that (i) the Code Vocabulary is a great source of training data for suggesting log placement at the method level, and (ii) classifiers trained solely on vocabulary data are hard to interpret, as no words in the Code Vocabulary are significantly more valuable than others.
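To make the idea concrete, the following is a minimal sketch of method-level log-placement suggestion from vocabulary alone. The abstract does not name the tokenization scheme or the learning algorithm, so everything here is an assumption chosen for illustration: identifiers are split on camelCase boundaries, and a small multinomial naive Bayes classifier with Laplace smoothing stands in for whatever models the paper actually trained. The names `tokenize` and `VocabularyClassifier` and the four toy methods are hypothetical and are not taken from the evaluated systems.

```python
import math
import re
from collections import Counter


def tokenize(code):
    """Turn source text into lowercase words, splitting camelCase identifiers."""
    words = []
    for ident in re.findall(r"[A-Za-z]+", code):
        # "IOException" -> ["IO", "Exception"]; "readFile" -> ["read", "File"]
        parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", ident)
        words.extend(p.lower() for p in parts)
    return words


class VocabularyClassifier:
    """Multinomial naive Bayes over code words (an assumed stand-in model)."""

    def fit(self, methods, labels):
        self.word_counts = {True: Counter(), False: Counter()}
        self.class_counts = Counter(labels)
        for code, label in zip(methods, labels):
            self.word_counts[label].update(tokenize(code))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def _log_score(self, words, label):
        total = sum(self.word_counts[label].values())
        prior = self.class_counts[label] / sum(self.class_counts.values())
        score = math.log(prior)
        for w in words:
            if w in self.vocab:  # words never seen in training are ignored
                count = self.word_counts[label][w]
                # Laplace (add-one) smoothing avoids log(0)
                score += math.log((count + 1) / (total + len(self.vocab)))
        return score

    def predict(self, code):
        """Return True if the method body looks like one that should be logged."""
        words = tokenize(code)
        return self._log_score(words, True) > self._log_score(words, False)


# Toy training data (invented): method bodies with any log statements already
# removed, labelled by whether the original method contained a log statement.
methods = [
    "void openConnection(String host) { try { socket.connect(host); } catch (IOException e) { retry(); } }",
    "void saveFile(Path path) { try { writer.write(path); } catch (IOException e) { cleanup(); } }",
    "int getSize() { return size; }",
    "String getName() { return name; }",
]
labels = [True, True, False, False]

clf = VocabularyClassifier().fit(methods, labels)

print(clf.predict("void loadConfig(Path path) { try { reader.read(path); } catch (IOException e) { reset(); } }"))
print(clf.predict("int getCount() { return count; }"))
```

On this toy data the classifier leans on words such as `try`, `catch`, and `exception` to flag the first unseen method and on `get` and `return` to pass over the second. A real experiment would need far more training methods and a held-out evaluation.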