M.S. Boon

2 records found

A Study into Code and Non-Code Characteristics Leading to Test-Suite Modifications

Master thesis (2026) - M.S. Boon, B. Özkan, C.E. Brandt, M.A. Migut
Software testing is essential for maintaining software quality, yet determining which code changes require corresponding test modifications remains a challenging and time-consuming task. This thesis investigates whether test-suite co-evolution can be predicted from a combination of code-level, coverage-level and repository-level characteristics together with non-code project and pull request characteristics, and studies which characteristics are most informative in both within-project and cross-project settings.

To answer this question, we constructed a dataset containing 72,534 modified lines extracted from 1,303 pull requests across 18 open-source Java projects. Using coverage information from the modified test suites, we derived line-level co-evolution labels and extracted characteristics from multiple scopes, including line, method, class, repository, pull request and project-level metrics. Several machine learning models were evaluated, with Gradient Boosting achieving the strongest overall performance.

The results show that test-suite modifications can be predicted with moderate accuracy. The best-performing model achieved an average MCC of 0.357 and an AUC of 0.805 in the within-project setting, and an MCC of 0.454 and an AUC of 0.839 in the cross-project setting. Historical coverage was the strongest individual predictor, but repository-level and non-code characteristics provided substantial additional predictive value. Furthermore, broader developmental and repository-level characteristics proved more informative than localized code metrics, and cross-project prediction consistently outperformed within-project prediction.

These findings demonstrate that software repositories contain meaningful signals regarding future testing behavior and that test-suite co-evolution can be predicted using information available during development. ...
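
For readers unfamiliar with the MCC (Matthews correlation coefficient) reported above, the following is a minimal sketch of how it is computed from binary predictions. The function names and the example counts are illustrative assumptions and are not taken from the thesis.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count (tp, fp, tn, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a marginal is empty and MCC is undefined.
    return numerator / denominator if denominator else 0.0


# Hypothetical labels: 1 = the modified line co-evolved with a test change.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1]
score = mcc(*confusion_counts(y_true, y_pred))
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction. Unlike plain accuracy, it accounts for all four cells of the confusion matrix, which makes it a common choice when the two classes are imbalanced.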

Studying the effects of GC-correction and MAPQ filtering on fragmentomics analysis when using short/long ratios

Bachelor thesis (2024) - M.S. Boon, Daan Hazelaar, Bram Pronk, Stavros Makrodimitris, Marcel Reinders
Cancer is one of the leading causes of death. To reduce the number of deaths caused by cancer, several screening methods are used to detect cancer at an earlier stage and so improve survival rates when treating patients. Current screening methods are often invasive, costly and not very accurate. Therefore, new methods are being sought that aim to be cheaper, less invasive and more accurate. One of these methods is fragmentomics. Multiple approaches have been proposed for using fragmentomics analysis in cancer screening, including the short/long ratio and investigating the nucleotides at the ends of the fragments. Previous works that use fragmentomics analysis to predict cancer apply different pre-processing steps, with limited explanation of why those steps were chosen, and research into the effects of these pre-processing steps is lacking. Two main pre-processing steps in the field are correcting GC-bias and filtering on MAPQ. Here we investigated the impact of three GC-correction methods by applying each correction method and then analyzing the resulting fragmentation profiles using short/long fragment ratios. Furthermore, three different MAPQ filtering thresholds were studied. This showed that Deeptools correction of the GC bias lowered performance, with the accuracy dropping from 77.8% to 69.4%. Applying LOESS correction using all fragments at the same time resulted in an accuracy of 83.3%, while applying LOESS correction using the short and long fragments separately resulted in an accuracy of 91.7%. The impact of filtering the data based on mapping quality was determined by comparing the results of analyzing all fragments with those of analyzing only fragments that pass a mapping quality threshold of 5, 20 or 30. This showed that not filtering by mapping quality has a big impact on the profiles of cancer samples, with a KS-test statistic of 0.08 for MAPQ 5 and MAPQ 20 and larger differences in correlations between healthy and cancer samples. The performance of classification was much higher when not filtering, with an accuracy of 97.3%, which dropped whenever the filtering threshold was raised, bottoming out at 62.7% for a threshold of MAPQ 30. Due to limitations of the study, the combined pre-processing of not filtering on MAPQ and using the separate LOESS correction was not studied. ...
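
As a rough illustration of the two quantities discussed above, the sketch below filters fragments by MAPQ and computes a short/long ratio for one genomic bin. The length ranges (100 to 150 bp for short, 151 to 220 bp for long), the "at least the threshold" filtering rule, the function names and the example data are all assumptions made for illustration; the abstract does not state the definitions used in the thesis.

```python
from typing import List, Optional, Tuple

# Hypothetical representation of a fragment: (length in bp, mapping quality).
Fragment = Tuple[int, int]


def filter_by_mapq(fragments: List[Fragment], min_mapq: int) -> List[Fragment]:
    """Keep fragments whose MAPQ is at least min_mapq (assumed rule)."""
    return [frag for frag in fragments if frag[1] >= min_mapq]


def short_long_ratio(
    fragments: List[Fragment],
    short_range: Tuple[int, int] = (100, 150),  # assumed cutoffs
    long_range: Tuple[int, int] = (151, 220),   # assumed cutoffs
) -> Optional[float]:
    """Ratio of short to long fragment counts; None if there are no long fragments."""
    n_short = sum(1 for length, _ in fragments
                  if short_range[0] <= length <= short_range[1])
    n_long = sum(1 for length, _ in fragments
                 if long_range[0] <= length <= long_range[1])
    if n_long == 0:
        return None
    return n_short / n_long


# Made-up fragments for a single bin.
bin_fragments = [(120, 60), (140, 3), (160, 25), (200, 40), (90, 60)]

ratio_unfiltered = short_long_ratio(bin_fragments)
ratio_mapq20 = short_long_ratio(filter_by_mapq(bin_fragments, 20))
ratio_mapq30 = short_long_ratio(filter_by_mapq(bin_fragments, 30))
```

Even in this toy bin the ratio changes with the threshold, which is the kind of sensitivity to MAPQ filtering that the thesis examines on real samples.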