Multi-Metric Clinical Validation of an Auto-Contour Refinement Tool in Head-and-Neck Radiotherapy

Master Thesis (2026)
Author(s)

J.L. Scharn (TU Delft - Mechanical Engineering)

Contributor(s)

N. Tümer – Mentor (TU Delft - Mechanical Engineering)

Frank J.W.M. Dankers – Mentor (Leiden University Medical Center)

Prerak Mody – Mentor (Leiden University Medical Center)

Marius Staring – Graduation committee member (Leiden University Medical Center)

Q. Tao – Graduation committee member (TU Delft - Applied Sciences)

Faculty
Mechanical Engineering
Publication Year
2026
Language
English
Graduation Date
27-03-2026
Awarding Institution
Delft University of Technology
Programme
Biomedical Engineering, Neuromusculoskeletal Biomechanics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Background:
Interactive segmentation models combine auto-segmentation methods with user interaction to overcome the inconvenience of manually adjusting contours generated by imperfect auto-contouring models. However, these models have not yet been implemented for tumor target volume segmentation in clinical radiotherapy settings. Therefore, this study validates a previously developed auto-contour refinement tool at the LUMC for Head-and-Neck (H&N) radiotherapy, demonstrating its robustness and trustworthiness.

Methods:
In a user study, six non-expert participants iteratively refined the prediction of a contour-refinement model to align it as closely as possible with the corresponding ground truth, for six tumor volumes from six patients. The contour-refinement model updated its prior predictions based on user-provided foreground (tumor) and background (non-tumor) scribbles, enabling three-dimensional (3D) refinement until a satisfactory result was achieved.
User inputs were collected, and the robustness of the model was evaluated using performance metrics such as Dice and Surface Dice. Trustworthiness was evaluated using two metrics newly proposed in this study: local and non-local (Surface) Dice.
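
As an illustration of the volumetric Dice score and of the per-iteration change in Dice (Delta Dice) mentioned above, a minimal Python sketch follows. It represents a segmentation as a set of voxel coordinates, which is a simplification chosen for readability. The functions `local_dice` and `non_local_dice` are hypothetical: they show one plausible reading of the local and non-local metrics, in which Dice is restricted to a region around the user's scribble or to everything outside it. The abstract does not give the definitions, so the thesis may define them differently. All voxel data below are made up for the example.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]  # (slice, row, column) index of a voxel


def dice(pred: Set[Voxel], truth: Set[Voxel]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|); taken as 1.0 when both masks are empty."""
    total = len(pred) + len(truth)
    if total == 0:
        return 1.0
    return 2.0 * len(pred & truth) / total


def local_dice(pred: Set[Voxel], truth: Set[Voxel], region: Set[Voxel]) -> float:
    """Hypothetical: Dice computed only inside `region` (e.g. near a scribble)."""
    return dice(pred & region, truth & region)


def non_local_dice(pred: Set[Voxel], truth: Set[Voxel], region: Set[Voxel]) -> float:
    """Hypothetical: Dice computed only outside `region`."""
    return dice(pred - region, truth - region)


# Toy data (invented for illustration): a four-voxel ground truth on one slice.
truth = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
before = {(0, 0, 0), (0, 0, 1)}              # prediction before a refinement step
after = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}    # prediction after the step
scribble_region = {(0, 1, 0), (0, 1, 1)}     # assumed neighbourhood of the scribble

# Delta Dice: change in Dice caused by one refinement iteration.
delta_dice = dice(after, truth) - dice(before, truth)

print(f"Dice before: {dice(before, truth):.4f}")
print(f"Dice after:  {dice(after, truth):.4f}")
print(f"Delta Dice:  {delta_dice:.4f}")
print(f"Local Dice:  {local_dice(after, truth, scribble_region):.4f}")
print(f"Non-local Dice: {non_local_dice(after, truth, scribble_region):.4f}")
```

Under this reading, a refinement step that behaves as the user intends would raise the local score while leaving the non-local score unchanged. Surface Dice, which compares contour surfaces within a distance tolerance, is not covered by this sketch.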

Results:
Robust behavior is observed, as the model reacts in a highly consistent manner across all users. Only minor differences in model performance (Delta Dice scores of 0.1407 vs. 0.1296) were observed across users when different user inputs were applied.

The AI pencil yields a strong initial improvement compared to manual annotations (27.4% vs. 6.4%, Wilcoxon p = 0.047), whereas subsequent iterations show variability. This variability was frequently observed in cases of incorrect user input, distortions caused by dental implants, anatomically complex regions, and during the segmentation of slices at the tumor boundaries.
In all other cases, the model showed high trustworthiness, as it followed the user's intent during the contouring process.

Conclusion:
The incorporation of user feedback into the contour-refinement model results in a rapid improvement in segmentation quality across the entire volume. However, manual refinement by clinicians remains necessary for anatomically complex slices.
Overall, this research shows that the model is robust to variations in user input and (apart from the first few iterations) there are no spurious changes in non-local areas. These are important findings when working towards clinical adoption of these interactive contour refinement models.
