Radiotherapy (RT) is a widespread and effective technique to treat cancer by killing cancerous cells with beams of radiation. Building upon advances in image guidance and dose delivery technology like proton therapy, adaptive RT promises more effective tumor control and a reduction in the incidence and severity of side effects. Unfortunately, the clinical implementation of adaptive workflows is challenging due to their resource-intensive nature. Therefore, their successful adoption hinges on overcoming several bottlenecks in the treatment planning process.
In this dissertation, we focus on methods for the image segmentation, or contouring, step, which localizes the anatomical structures required for dose optimization and evaluation. Until recently, clinicians had to manually delineate dozens of organs-at-risk and target volumes across hundreds of slices of the patient’s three-dimensional images, a process that is extremely time-consuming. The advent of deep learning-based artificial intelligence (AI) has changed the landscape: a modern auto-segmentation AI can produce segmentations for most of a patient’s anatomy in minutes.
Despite increasing automation, the segmentation process remains time- and resource-intensive. Because segmentations are critical to the patient’s outcome and AI models inevitably make errors, clinicians must perform a quality assessment (QA) of the AI’s outputs. Depending on the case’s complexity, the duration of this QA process can negate the time gains that auto-segmentation tools bring.
Deep ensemble AIs represent an advancement in medical image segmentation. Instead of providing a deterministic output, they produce a set of plausible candidates that aim to model inter-clinician annotation variability. Consensus segmentations obtained from ensembles tend to be more accurate and robust than their single-prediction deterministic counterparts. Nevertheless, using only the consensus discards a lot of potentially useful information.
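Majority voting per voxel is one common way to obtain such a consensus. The minimal sketch below, assuming the ensemble is available as a stack of binary NumPy masks (all names are illustrative), also returns the per-voxel agreement map, precisely the kind of information that is lost when only the thresholded consensus is kept.

```python
import numpy as np

def consensus_segmentation(ensemble, threshold=0.5):
    """Majority-vote consensus from a stack of binary masks.

    ensemble: array of shape (n_members, H, W) with values in {0, 1}.
    Returns the consensus mask and the per-voxel agreement map that is
    discarded when only the consensus is kept.
    """
    agreement = ensemble.mean(axis=0)        # fraction of members marking each voxel
    consensus = (agreement >= threshold).astype(np.uint8)
    return consensus, agreement

# Toy ensemble of five noisy masks standing in for deep ensemble outputs.
rng = np.random.default_rng(0)
members = (rng.random((5, 64, 64)) > 0.5).astype(np.uint8)
consensus, agreement = consensus_segmentation(members)
```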
In this dissertation, we contribute to different phases of the segmentation QA process. We characterize this process and introduce methods that leverage the raw outputs of deep ensemble AIs to support and speed up QA tasks. The methods presented show new ways of analyzing and using ensembles in RT. Nevertheless, since they are also relevant outside RT, we keep their presentation general and evaluate them in other application scenarios, such as the analysis of simulation ensembles or meteorological data.
Before fixing segmentation failures, clinicians must find them. This process can be time-consuming and fatiguing when failures are sparse and spread throughout the patient’s three-dimensional images. We present and evaluate a delineation error detection system (DEDS), which guides clinicians to slices of three-dimensional images that contain potentially clinically relevant segmentation failures. We co-designed the DEDS with clinicians and refined it based on an observational study, which allowed us to characterize clinicians’ navigation patterns and their use of information sources like AI uncertainty and patients’ dose distributions. We evaluated the DEDS’ potential to speed up the QA process through a simulation study with a retrospective cohort of patients. Results indicate that speed-ups are most significant when equipping the DEDS with information sources indicative of clinical priority, which prevents unnecessary edits.
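To illustrate the guiding principle rather than the implemented system, the hypothetical sketch below ranks axial slices by an ensemble-disagreement score weighted by the local dose, so that slices that are both uncertain and clinically relevant surface first; the function name, inputs, and weighting scheme are all assumptions, not the DEDS itself.

```python
import numpy as np

def rank_slices(disagreement, dose, dose_weight=1.0):
    """Rank axial slices for review (illustrative only).

    disagreement: (Z, H, W) per-voxel ensemble disagreement, e.g., entropy.
    dose:         (Z, H, W) planned dose distribution on the same grid.
    Returns slice indices sorted from most to least review-worthy.
    """
    # Emphasize uncertain voxels that also receive a high dose.
    dose_norm = dose / (dose.max() + 1e-8)
    score_per_slice = (disagreement * (1.0 + dose_weight * dose_norm)).sum(axis=(1, 2))
    return np.argsort(score_per_slice)[::-1]

# Toy volumes standing in for real uncertainty and dose maps.
rng = np.random.default_rng(1)
uncertainty = rng.random((40, 64, 64))
dose = rng.random((40, 64, 64))
review_order = rank_slices(uncertainty, dose)
```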
Visual inspection of the segmentation ensemble allows clinicians to understand the main trends and detect anomalies that might indicate segmentation failures. Using a spaghetti plot to visualize all ensemble members is straightforward but prone to clutter. Contour boxplots avoid clutter and extra complexity by distilling essential ensemble information, which permits more efficient ensemble inspection. Nevertheless, they are time-consuming to compute, which reduces their practical value. We present Inclusion Depth for contour ensembles. Inclusion Depth yields a centrality score per ensemble member, which allows characterizing the distribution of segmentation ensembles in terms of properties like the median, trimmed mean, confidence bands, and outliers. Compared to previous contour depth notions, Inclusion Depth is significantly faster, making it more applicable in practice for time-critical contexts like QA in adaptive RT. We show how Inclusion Depth permits creating contour boxplots for ensembles with hundreds of segmentations in seconds.
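The core counting idea can be illustrated with a simplified, strict-inclusion variant for binary masks; the method itself relaxes the inclusion test so that contours that do not nest exactly still receive informative scores, so the sketch below is only meant to convey how pairwise inclusion checks turn into a per-member depth.

```python
import numpy as np

def strict_inclusion_depth(masks):
    """Strict inclusion depth for an ensemble of binary masks.

    masks: array of shape (n, H, W) with values in {0, 1}.
    For each member, counts how many other members it is contained in
    and how many it contains; the depth is the smaller fraction.
    """
    n = masks.shape[0]
    flat = masks.reshape(n, -1).astype(bool)
    depths = np.zeros(n)
    for i in range(n):
        inside = contains = 0
        for j in range(n):
            if i == j:
                continue
            if np.all(flat[i] <= flat[j]):   # member i lies inside member j
                inside += 1
            if np.all(flat[j] <= flat[i]):   # member j lies inside member i
                contains += 1
        depths[i] = min(inside, contains) / (n - 1)
    return depths  # high depth = central member; low depth = potential outlier

# Toy ensemble of concentric disks: inner disks are contained in outer ones.
yy, xx = np.mgrid[:64, :64]
dist = np.sqrt((yy - 32) ** 2 + (xx - 32) ** 2)
ensemble = np.stack([(dist < r).astype(np.uint8) for r in (10, 14, 18, 22, 26)])
print(strict_inclusion_depth(ensemble))  # the middle radius gets the highest depth
```

On this toy ensemble of nested disks, the middle contour receives the highest depth, mirroring how the median member of a contour boxplot is selected.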
It is not uncommon for distinct representative shapes to co-occur within a contour ensemble. With ensembles created by clinicians, for instance, different institutions, training sessions, or experience levels can lead to distinct shapes (i.e., modes of variation) for the same structure. When trained on these data, deep ensemble AIs would yield similarly multi-modal ensembles. In quality assessment, being able to extract these representatives would pave the way for new ensemble-based interactive segmentation workflows. Applying traditional contour depth notions to these multi-modal ensembles collapses the existing variation modes and can lead to uninformative centrality scores. To address this issue, we present the first framework for multi-modal contour depth, which also includes notable runtime improvements for depth computation. When used with Inclusion Depth, multi-modal contour depth permits clustering the different modes of variation and determining cluster-dependent scores that appropriately characterize the data. Variation modes can then be independently analyzed using uni-modal depth machinery like contour boxplots.
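The following sketch conveys the idea under strong simplifying assumptions and is not the proposed framework: members are first grouped into modes by hierarchical clustering on pairwise Jaccard distances, and a strict inclusion depth is then computed within each cluster instead of over the pooled ensemble.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_then_depth(masks, n_clusters=2):
    """Group ensemble members into modes, then score centrality per mode.

    masks: (n, H, W) binary masks. Returns cluster labels and, for each
    member, a depth computed only against the members of its own cluster.
    """
    n = masks.shape[0]
    flat = masks.reshape(n, -1).astype(bool)

    # Pairwise Jaccard distance between members.
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            inter = np.logical_and(flat[i], flat[j]).sum()
            union = np.logical_or(flat[i], flat[j]).sum()
            dist[i, j] = dist[j, i] = 1.0 - inter / max(union, 1)

    labels = fcluster(linkage(squareform(dist), method="average"),
                      t=n_clusters, criterion="maxclust")

    def strict_depth(sub):  # strict inclusion depth within one mode
        m = sub.shape[0]
        d = np.zeros(m)
        for a in range(m):
            inside = sum(np.all(sub[a] <= sub[b]) for b in range(m) if b != a)
            contains = sum(np.all(sub[b] <= sub[a]) for b in range(m) if b != a)
            d[a] = min(inside, contains) / max(m - 1, 1)
        return d

    depths = np.zeros(n)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        depths[idx] = strict_depth(flat[idx])
    return labels, depths
```

Computing depths against the pooled ensemble instead would penalize every member of the smaller mode, which is exactly the collapse that a multi-modal treatment avoids.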
The global perspective of contour depth methods, which consider the entire volume, may be insufficient when parts of the contours are noisy or when the resolution of the ensemble is too high to process within a reasonable time. Correlation clustering methods provide a solution by partitioning the spatial domain of the ensemble into highly correlated regions that can be used to localize analyses. However, existing correlation clustering algorithms do not scale well as the resolution of the ensemble increases. We introduce the Local-to-Global Correlation Clustering (LoGCC) method, which partitions the ensemble’s spatial domain into coarser primitives representing areas of consistent ensemble member behavior. Unlike previous correlation clustering methods, LoGCC achieves significantly faster runtimes by leveraging the ensemble’s spatial structure and decoupling computations into local and global steps. As with Inclusion Depth, these speed gains enable LoGCC to analyze large datasets in time-critical fields such as adaptive RT.
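The published algorithm is more involved, but its local-to-global structure can be caricatured as follows, with the block size, the correlation distance, and the final clustering step all being assumptions for illustration: the local step condenses the ensemble’s behavior within small spatial tiles into one signature per tile, and only these far fewer signatures enter the global clustering step.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def local_to_global_clustering(ensemble, block=8, n_regions=4):
    """Illustrative local-to-global clustering of an ensemble's spatial domain.

    ensemble: (n_members, H, W) scalar or binary fields on a common grid.
    Local step:  average the per-member signal inside each block x block tile,
                 producing one n_members-dimensional signature per tile.
    Global step: cluster the (much fewer) tile signatures by correlation distance.
    Returns an (H // block, W // block) map of region labels.
    """
    n, h, w = ensemble.shape
    bh, bw = h // block, w // block
    tiles = ensemble[:, :bh * block, :bw * block] \
        .reshape(n, bh, block, bw, block).mean(axis=(2, 4))   # (n, bh, bw)
    signatures = tiles.reshape(n, -1).T                        # one row per tile

    # Correlation distance between tile signatures.
    z = signatures - signatures.mean(axis=1, keepdims=True)
    z /= (z.std(axis=1, keepdims=True) + 1e-8)
    corr = z @ z.T / z.shape[1]
    dist = np.clip(1.0 - corr, 0.0, 2.0)
    np.fill_diagonal(dist, 0.0)

    labels = fcluster(linkage(squareform(dist, checks=False), method="average"),
                      t=n_regions, criterion="maxclust")
    return labels.reshape(bh, bw)
```

Because the global step only sees one signature per tile rather than one per grid cell, its cost is decoupled from the ensemble’s resolution, which is the intuition the sketch tries to capture.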
Throughout this dissertation, we focused on designing modular, flexible analysis methods applicable across different tasks and domains. We demonstrate how the DEDS, multi-modal Inclusion Depth, and LoGCC support QA in RT and extend to fields like meteorology. We also discuss their potential as foundational elements for more complex workflows. For example, extracted modes of variation, which indicate representative shapes in the ensemble, could be repurposed for interactive segmentation. Alternatively, consistent regions detected by correlation clustering could serve as building blocks for localized contour analysis and editing.
We hope the proposed contour ensemble visual analysis methods inspire the development of more efficient analysis workflows that harness ensembles’ power in RT and beyond.