Recommending Log Placement Based on Code Vocabulary
K. Lyrakis (TU Delft - Electrical Engineering, Mathematics and Computer Science)
J. Cândido – Mentor (TU Delft - Software Engineering)
Mauricio Aniche – Mentor (TU Delft - Software Engineering)
A. Katsifodimos – Graduation committee member (TU Delft - Web Information Systems)
Abstract
Logging is a common and vitally important practice that enables developers to collect runtime information from a system. This information is then used to monitor the system's performance as it runs in production and to detect the cause of system failures. Despite its importance, logging is still a manual and difficult process: developers rely on their experience and domain expertise to decide where to put log statements. In this paper, we attempt to suggest log placement automatically by treating code as plain text derived from a vocabulary. Intuitively, we believe that the Code Vocabulary can indicate whether a code snippet should be logged or not. To validate this hypothesis, we trained machine learning models based solely on the Code Vocabulary to suggest log placement at method level. We also studied which words of the Code Vocabulary matter most when deciding where to put log statements. We evaluated our experiments on three open source systems and found that (i) the Code Vocabulary is a valuable source of training data for suggesting log placement at method level, and (ii) classifiers trained solely on vocabulary data are hard to interpret, as no words in the Code Vocabulary are significantly more valuable than others.
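To make the idea of "code as plain text derived from a vocabulary" concrete, the following is a minimal sketch, not the models or pipeline used in this work. It assumes a bag-of-words view of each method body and swaps in a small multinomial Naive Bayes classifier written from scratch. The names `tokenize`, `VocabularyClassifier`, and the four toy Java-like methods with their labels are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(code):
    """Turn a code snippet into lowercase vocabulary words.

    Identifiers are split on non-letters and on camelCase boundaries,
    so "readFile" becomes ["read", "file"]. Digits and symbols are dropped.
    """
    words = []
    for ident in re.findall(r"[A-Za-z]+", code):
        parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", ident)
        words.extend(p.lower() for p in parts)
    return words


class VocabularyClassifier:
    """Multinomial Naive Bayes over code words (illustrative only).

    Assumes the training data contains at least one logged and one
    unlogged method; otherwise the class prior would be log(0).
    """

    def fit(self, methods, labels):
        self.word_counts = {True: Counter(), False: Counter()}
        self.class_counts = Counter(labels)
        for code, label in zip(methods, labels):
            self.word_counts[label].update(tokenize(code))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def predict(self, code):
        """Return True if the method looks like one that should be logged."""
        n_methods = sum(self.class_counts.values())
        scores = {}
        for label in (True, False):
            total = sum(self.word_counts[label].values())
            score = math.log(self.class_counts[label] / n_methods)
            for word in tokenize(code):
                if word in self.vocab:  # ignore words never seen in training
                    # Laplace (add-one) smoothing
                    count = self.word_counts[label][word] + 1
                    score += math.log(count / (total + len(self.vocab)))
            scores[label] = score
        return scores[True] > scores[False]


# Hypothetical toy training set: method bodies with log statements removed,
# labeled True if the original method contained a log statement.
methods = [
    "void connect() { try { socket.open(); } catch (IOException e) { } }",
    "void loadConfig(String path) { try { reader.read(path); } catch (IOException e) { } }",
    "int getSize() { return size; }",
    "String getName() { return name; }",
]
labels = [True, True, False, False]

model = VocabularyClassifier().fit(methods, labels)
```

With this toy data, `model.predict(...)` flags a new method containing `try`/`catch` and I/O-related words as a logging candidate, while a simple getter is not flagged. A real study would use far more methods, a held-out test set, and a stronger classifier; the sketch only shows how method-level labels and word counts fit together.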