Should This Code Get Tested?

A Study into Code and Non-Code Characteristics Leading to Test-Suite Modifications

Master Thesis (2026)

Author(s)

M.S. Boon (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

B. Özkan – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

C.E. Brandt – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

M.A. Migut – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Machine learning, Prediction, Data mining, Co-evolution, Test-suite

To reference this document use

https://resolver.tudelft.nl/uuid:695084eb-bd38-4678-9a58-9d350384ac15

More Info

Publication Year

2026

Language

English

Graduation Date

26-06-2026

Awarding Institution

Delft University of Technology

Programme

Computer Science

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Software testing is essential for maintaining software quality, yet determining which code changes require corresponding test modifications remains a challenging and time-consuming task. This thesis investigates whether test-suite co-evolution can be predicted from a combination of code-level, coverage-level, and repository-level characteristics together with non-code project and pull request characteristics. It also studies which characteristics are most informative in both within-project and cross-project settings.

To answer this question, we constructed a dataset containing 72,534 modified lines extracted from 1,303 pull requests across 18 open-source Java projects. Using coverage information from the modified test suites, we derived line-level co-evolution labels and extracted characteristics from multiple scopes, including line, method, class, repository, pull request and project-level metrics. Several machine learning models were evaluated, with Gradient Boosting achieving the strongest overall performance.
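The two evaluation settings can be illustrated with a minimal sketch. The abstract does not describe the exact splitting protocol, so everything here is an assumption: the cross-project setting is modelled as leave-one-project-out, the within-project setting as an ordered 80/20 split per project, and the field names `project` and `pr_number` are hypothetical.

```python
from collections import defaultdict


def cross_project_splits(rows):
    """Hold out each project in turn (assumed leave-one-project-out protocol)."""
    projects = sorted({row["project"] for row in rows})
    for held_out in projects:
        train = [row for row in rows if row["project"] != held_out]
        test = [row for row in rows if row["project"] == held_out]
        yield held_out, train, test


def within_project_splits(rows, train_fraction=0.8):
    """Split each project's own rows into train and test parts.

    Ordering by the hypothetical `pr_number` field is an assumption; it keeps
    later pull requests out of the training part.
    """
    by_project = defaultdict(list)
    for row in rows:
        by_project[row["project"]].append(row)
    for project in sorted(by_project):
        ordered = sorted(by_project[project], key=lambda row: row["pr_number"])
        cut = int(len(ordered) * train_fraction)
        yield project, ordered[:cut], ordered[cut:]


# Toy data: two hypothetical projects with five modified lines each.
rows = [{"project": name, "pr_number": i} for name in ("a", "b") for i in range(5)]
cross = list(cross_project_splits(rows))
within = list(within_project_splits(rows))
```

In the cross-project sketch a model never sees the held-out project during training, whereas in the within-project sketch training and test rows come from the same project.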

The results show that test-suite modifications can be predicted with moderate accuracy. The best-performing model achieved an average Matthews correlation coefficient (MCC) of 0.357 and an AUC of 0.805 in the within-project setting, and an MCC of 0.454 and an AUC of 0.839 in the cross-project setting. Historical coverage was the strongest individual predictor, but repository-level and non-code characteristics provided substantial additional predictive value. Furthermore, broader developmental and repository-level characteristics proved more informative than localized code metrics, and cross-project prediction consistently outperformed within-project prediction.
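For readers unfamiliar with the two reported metrics, the following is a minimal, dependency-free sketch of how MCC and AUC are commonly computed for binary labels. It is not the thesis's evaluation code. The function names and toy data are illustrative, and AUC is assumed here to mean the area under the ROC curve.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def auc(y_true, scores):
    """ROC AUC as the probability that a positive outranks a negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = line was co-evolved with a test change, 0 = it was not.
labels = [1, 1, 0, 0]
predicted = [1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
toy_mcc = mcc(labels, predicted)
toy_auc = auc(labels, scores)
```

MCC ranges from -1 to 1 and uses all four confusion-matrix cells, which makes it informative when classes are imbalanced. AUC ranges from 0 to 1 and does not depend on a classification threshold.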

These findings demonstrate that software repositories contain meaningful signals regarding future testing behavior and that test-suite co-evolution can be predicted using information available during development.

Files

Should_This_Code_Get_Tested.pd... (pdf)

(pdf | 3.72 Mb)

License info not available