Deliberate Code Coverage
R.M. de Britto Heemskerk (TU Delft - Electrical Engineering, Mathematics and Computer Science)
S. Proksch – Mentor (TU Delft - Software Engineering)
C.E. Brandt – Mentor (TU Delft - Software Engineering)
M.A. Migut – Graduation committee member (TU Delft - Web Information Systems)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Testing is a major part of software development, and coverage requirements are often used as a
tool for quality assurance. However, it is not clear which code should be covered to reach such a
requirement. To address this, we suggest using historical data to make these decisions more
deliberate. In other words, we want to use machine learning to predict coverage.
Building upon previous research, we investigate how different ways of splitting and representing the
data affect the performance of decision tree models. We use data from the Mozilla Firefox codebase,
focusing in particular on its C/C++ code. Naively splitting the data into training and test sets and
representing coverage per line leads to the best performance. Our analysis shows that grouping
coverage data by basic block slightly lessens the predictive performance of the model. Meanwhile,
splitting the data into training and test sets by file appears to take away all predictive performance.
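The two splitting strategies can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes each record is a hypothetical `(file, line, covered)` tuple, and the function names `naive_split` and `file_split` as well as the 20% test fraction are illustrative choices. The naive split shuffles individual lines, so lines from the same file can end up in both sets. The file-based split assigns every file wholly to one side.

```python
import random
from typing import List, Tuple

# Hypothetical record layout: (file path, line number, covered flag).
Record = Tuple[str, int, bool]


def naive_split(records: List[Record], test_fraction: float = 0.2,
                seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Shuffle individual line records and cut once.

    Lines from the same file may end up in both the training and test set.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def file_split(records: List[Record], test_fraction: float = 0.2,
               seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Assign whole files to either the training or the test set."""
    files = sorted({path for path, _, _ in records})
    rng = random.Random(seed)
    rng.shuffle(files)
    n_test = max(1, int(len(files) * test_fraction))
    test_files = set(files[:n_test])
    train = [r for r in records if r[0] not in test_files]
    test = [r for r in records if r[0] in test_files]
    return train, test


# Toy data: 50 lines spread over 5 hypothetical files.
records = [(f"file{i % 5}.cpp", i, i % 3 == 0) for i in range(50)]
train, test = file_split(records)
shared = {r[0] for r in train} & {r[0] for r in test}
print(len(shared))  # 0: no file appears in both sets
```

Seeding `random.Random` keeps both splits reproducible, so the two strategies can be compared on the same data.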
This study provides a new dataset for developer coverage prediction. It also introduces basic-block
coverage as a new way of representing coverage data for this task. Finally, it gives insights into the
effects of different coverage representations on decision trees.
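Basic-block coverage can be thought of as collapsing per-line coverage records into one record per block. The Python below is a minimal illustration under stated assumptions, not the thesis's method. It uses `line_to_block`, a hypothetical mapping that assigns each source line to a single basic-block id. In practice such a mapping would have to come from compiler or coverage-tool output. The sketch also assumes that a block counts as covered if any of its lines is covered.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical keys: (file path, line number) and (file path, block id).
LineKey = Tuple[str, int]
BlockKey = Tuple[str, int]


def group_by_basic_block(line_coverage: Dict[LineKey, bool],
                         line_to_block: Dict[LineKey, int]) -> Dict[BlockKey, bool]:
    """Collapse per-line coverage flags into one flag per basic block.

    Lines in one basic block execute together, so their flags should normally
    agree; taking the logical OR also tolerates inconsistent line-level data.
    """
    blocks: Dict[BlockKey, bool] = defaultdict(bool)
    for (path, line), covered in line_coverage.items():
        key = (path, line_to_block[(path, line)])
        blocks[key] = blocks[key] or covered
    return dict(blocks)


line_coverage = {("a.cpp", 1): True, ("a.cpp", 2): True,
                 ("a.cpp", 5): False, ("a.cpp", 6): False}
line_to_block = {("a.cpp", 1): 0, ("a.cpp", 2): 0,
                 ("a.cpp", 5): 1, ("a.cpp", 6): 1}
print(group_by_basic_block(line_coverage, line_to_block))
# {('a.cpp', 0): True, ('a.cpp', 1): False}
```

In this toy example, four line records become two block records, so a model trained on the grouped data sees fewer, coarser examples.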