Deliberate Code Coverage

Master Thesis (2026)

Author(s)

R.M. de Britto Heemskerk (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

S. Proksch – Mentor (TU Delft - Software Engineering)

C.E. Brandt – Mentor (TU Delft - Software Engineering)

M.A. Migut – Graduation committee member (TU Delft - Web Information Systems)

Faculty

Electrical Engineering, Mathematics and Computer Science

Machine Learning Software Testing Code Coverage Prediction Developer Coverage Prediction

To reference this document use:

https://resolver.tudelft.nl/uuid:ec89c876-1946-444c-83f8-e25297e6287e

Publication Year

2026

Language

English

Graduation Date

22-01-2026

Awarding Institution

Delft University of Technology

Programme

Computer Science | Software Technology

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Testing is a major part of software development. Coverage requirements are often used in testing
as a tool for quality assurance, but it is not clear which code should be covered to meet such a
requirement. To address this, we suggest using historical data to make these decisions more
deliberate. In other words, we want to use machine learning to predict coverage.
Building upon previous research, we investigate how different approaches affect the performance of
decision tree models, using data from the Mozilla Firefox codebase and focusing in particular on
its C/C++ code. Naively splitting the data into a training set and a test set and representing
coverage per line leads to the best performance. Analysis showed that grouping coverage data by
basic block slightly reduced the predictive performance of the model, while splitting the data
between the training set and the test set by file appears to remove all predictive performance.
This study provides a new dataset for use in developer coverage prediction. It also introduces
basic-block coverage as a new way of representing coverage data for developer coverage prediction.
Finally, it gives insights into the effects of different coverage representations on decision trees.
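
The two splitting strategies contrasted in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the record layout `(file_path, line_number, covered)`, the function names, and the file paths are all hypothetical and chosen only for illustration. A naive split shuffles individual lines, so lines from the same file can land on both sides; a file-based split assigns whole files to one side.

```python
import random
from collections import defaultdict


def naive_split(rows, test_fraction=0.25, seed=0):
    """Shuffle all rows and cut once, ignoring which file each row came from."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def file_split(rows, test_fraction=0.25, seed=0):
    """Assign whole files to one side, so no file appears in both sets."""
    by_file = defaultdict(list)
    for row in rows:
        by_file[row[0]].append(row)
    files = sorted(by_file)
    random.Random(seed).shuffle(files)
    cut = int(len(files) * (1 - test_fraction))
    train = [r for f in files[:cut] for r in by_file[f]]
    test = [r for f in files[cut:] for r in by_file[f]]
    return train, test


def shared_files(train, test):
    """Return the files that contribute rows to both sets."""
    return {r[0] for r in train} & {r[0] for r in test}


# Hypothetical toy data: 8 files with 10 lines each and an arbitrary label.
rows = [
    (f"src/file_{i}.cpp", line, (i + line) % 2 == 0)
    for i in range(8)
    for line in range(1, 11)
]

naive_train, naive_test = naive_split(rows)
file_train, file_test = file_split(rows)
print("files shared under naive split:", len(shared_files(naive_train, naive_test)))
print("files shared under file split:", len(shared_files(file_train, file_test)))
```

Under the file-based split the test set contains only files the model has never seen, which is a stricter evaluation than the naive split and is consistent with the drop in predictive performance reported above.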

Files

Deliberate_Code_Coverage.pdf

(pdf | 0.636 Mb)

License info not available