Comparing Bayesian models for organ contouring in head and neck radiotherapy

Conference Paper (2022)
Author(s)

Prerak Mody (Leiden University Medical Center)

Nicolas F. Chaves-de-Plaza (TU Delft - Computer Graphics and Visualisation)

K.A. Hildebrandt (TU Delft - Computer Graphics and Visualisation)

R. van Egmond (TU Delft - Human Technology Relations)

H. de Ridder (TU Delft - Human Technology Relations)

Marius Staring (Leiden University Medical Center)

Research Group
Computer Graphics and Visualisation
Copyright
© 2022 Prerak Mody, Nicolas F. Chaves-de-Plaza, K.A. Hildebrandt, R. van Egmond, H. de Ridder, Marius Staring
DOI
https://doi.org/10.1117/12.2611083
Publication Year
2022
Language
English
Volume number
12032
Pages (from-to)
120320F-1 - 120320F-10
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Deep learning models for organ contouring in radiotherapy are poised for clinical use, but few tools currently exist for automated quality assessment (QA) of the predicted contours. Bayesian models and their associated uncertainty can potentially automate the detection of inaccurate predictions. We investigate two Bayesian models for auto-contouring, DropOut and FlipOut, using a quantitative measure, the expected calibration error (ECE), and a qualitative measure, region-based accuracy-vs-uncertainty (R-AvU) graphs. It is well understood that a model should have low ECE to be considered trustworthy. In a QA context, however, a model should also have high uncertainty in inaccurate regions and low uncertainty in accurate regions. Such behaviour could direct the visual attention of expert users to potentially inaccurate regions, speeding up the QA process. Using R-AvU graphs, we qualitatively compare the behaviour of different models in accurate and inaccurate regions. Experiments are conducted on the MICCAI2015 Head and Neck Segmentation Challenge dataset and on the DeepMindTCIA CT dataset using three models: DropOut-DICE, DropOut-CE (cross-entropy) and FlipOut-CE. Quantitative results show that DropOut-DICE has the highest ECE, while DropOut-CE and FlipOut-CE have the lowest. To better understand the difference between DropOut-CE and FlipOut-CE, we use the R-AvU graph, which shows that FlipOut-CE has better uncertainty coverage in inaccurate regions than DropOut-CE. This combination of quantitative and qualitative measures offers a new approach that helps to select which model can be deployed as a QA tool in clinical settings.
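
As background for the quantitative measure above: ECE is the standard binned calibration metric, i.e. the occupancy-weighted gap between per-bin confidence and per-bin accuracy. A minimal NumPy sketch for the binary per-voxel case is given below; the function name, decision threshold and bin count are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def expected_calibration_error(probs, labels, n_bins=10):
        # probs:  flat array of predicted foreground probabilities in [0, 1]
        # labels: flat array of binary ground-truth labels {0, 1}
        predictions = (probs >= 0.5).astype(int)            # threshold is an illustrative choice
        confidences = np.where(predictions == 1, probs, 1.0 - probs)
        accuracies = (predictions == labels).astype(float)

        ece = 0.0
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        for lo, hi in zip(edges[:-1], edges[1:]):
            in_bin = (confidences > lo) & (confidences <= hi)
            if in_bin.any():
                gap = abs(accuracies[in_bin].mean() - confidences[in_bin].mean())
                ece += in_bin.mean() * gap                  # weight each bin by its share of voxels
        return ece

A model with low ECE is one whose predicted confidences track its observed accuracy, which is the sense in which the abstract treats the low-ECE models as more trustworthy.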

Files

120320F.pdf
(PDF, 1.59 MB)