Recommending Log Placement Based on Code Vocabulary

Bachelor Thesis (2021)
Author(s)

K. Lyrakis (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J. Cândido – Mentor (TU Delft - Software Engineering)

Mauricio Aniche – Mentor (TU Delft - Software Engineering)

A Katsifodimos – Graduation committee member (TU Delft - Web Information Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2021 Kostas Lyrakis
More Info
expand_more
Publication Year
2021
Language
English
Copyright
© 2021 Kostas Lyrakis
Graduation Date
02-07-2021
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Faculty
Electrical Engineering, Mathematics and Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Logging is a common practice of vital importance that enables developers to collect runtime information from a system. This information is then used to monitor a system's performance as it runs in production and to detect the cause of system failures. Besides its importance, logging is still a manual and difficult process. Developers rely on their experience and domain expertise in order to decide where to put log statements. In this paper, we tried to automatically suggest log placement by treating code as plain text that is derived from a vocabulary. Intuitively, we believe that the Code Vocabulary can indicate whether a code snippet should be logged or not. In order to validate this hypothesis, we trained machine learning models based solely on the Code Vocabulary in order to suggest log placement at method level. We also studied which words of the Code Vocabulary are more important when it comes to deciding where to put log statements. We evaluated our experiments on three open source systems and we found that i) The Code Vocabulary is a great source of training data when it comes to suggesting log placement at method level, ii) Classifiers trained solely on Vocabulary data are hard to interpret as there are no words in the Code Vocabulary significantly more valuable than others.

Files

Log_recommendation_paper.pdf
(pdf | 0.283 Mb)
License info not available