Evaluating the quality of multiple automatically produced segmentation variants of the prostate on Magnetic Resonance Imaging scans for brachytherapy

Journal Article (2025)
Author(s)

Arkadiy Dushatskiy (Centrum Wiskunde & Informatica (CWI))

Peter A.N. Bosman (TU Delft - Algorithmics, Centrum Wiskunde & Informatica (CWI))

Karel A. Hinnen (Universiteit van Amsterdam)

Jan Wiersma (Universiteit van Amsterdam)

Henrike Westerveld (Erasmus Medical Center Cancer Institute)

Bradley R. Pieters (Universiteit van Amsterdam, Cancer Center Amsterdam)

Tanja Alderliesten (Leiden University Medical Center)

Research Group
Algorithmics
DOI related publication
https://doi.org/10.1016/j.phro.2025.100852
Publication Year
2025
Language
English
Volume number
36
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Background and Purpose: Recently, we introduced a novel Deep Learning (DL) based (semi-)automatic method for medical image segmentation. Unlike classical DL segmentation methods, it produces multiple segmentation variants (reflecting the variation of manual segmentations) instead of just one. With this approach, there is potentially a higher chance that a clinician prefers one of the automatically produced segmentation variants. This work focuses on evaluating this method on prostate segmentation in MRI scans used for brachytherapy and investigating its potential clinical usefulness.

Materials and Methods: Three experienced radiation oncologists graded (per-slice and per-scan) segmentations produced by our method, reference segmentations (manually created and used for brachytherapy treatment planning), and segmentations produced by a classical DL method. The study was retrospective, and the clinicians were blinded to how each segmentation was generated (our method, classical DL method, or manually). The grades reflect the amount of manual correction required. Additionally, the clinicians were asked to rank the segmentations to evaluate which one is preferred for each scan. The study was performed on 13 prostate cancer patients.

Results: Segmentations produced by our method were graded as requiring no manual correction in 292/576 (51 %) slices, compared to 240/576 (42 %) slices for the segmentations produced by a classical DL method. Furthermore, segmentations by our method were graded as unacceptable in fewer slices: 38 (6.6 %) vs. 48 (8.3 %).

Conclusion: Our study has demonstrated that deep-learning-based segmentation methods can produce high-quality segmentations. Our method was rated more favorably than a classical DL method, indicating the potential for integration into clinical practice.
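The slice-level percentages reported in the Results can be reproduced from the raw counts. The snippet below is only an arithmetic check, not part of the study's analysis; the dictionary keys (`multi_variant`, `classical`, `no_correction`, `unacceptable`) and the helper `percentage` are hypothetical labels introduced here for illustration, and the denominator of 576 graded slices per method is taken from the abstract.

```python
# Arithmetic check of the slice-level proportions quoted in the abstract.
# All names below are illustrative labels, not identifiers from the study.

TOTAL_SLICES = 576  # graded slices per method, as reported in the abstract

# Slice counts per grade category and per method, as reported in the abstract.
counts = {
    "no_correction": {"multi_variant": 292, "classical": 240},
    "unacceptable": {"multi_variant": 38, "classical": 48},
}


def percentage(count, total=TOTAL_SLICES):
    """Return count as a percentage of total."""
    return 100.0 * count / total


# Percentage per grade and method, rounded to one decimal place.
summary = {
    grade: {method: round(percentage(n), 1) for method, n in by_method.items()}
    for grade, by_method in counts.items()
}

for grade, by_method in summary.items():
    for method, pct in by_method.items():
        print(f"{grade:>14} | {method:>13} | {pct:.1f} %")
```

At one decimal place, the "no manual correction" shares come out as 50.7 % and 41.7 %, which the abstract reports rounded to 51 % and 42 %. The "unacceptable" shares match the reported 6.6 % and 8.3 %.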