Radiotherapy (RT) is a widespread and effective technique to treat cancer by killing cancerous cells with beams of radiation. Building upon advances in image guidance and dose delivery technology like proton therapy, adaptive RT promises more effective tumor control and a reduction in the incidence and severity of side effects. Unfortunately, the clinical implementation of adaptive workflows is challenging due to their resource-intensive nature. Therefore, their successful adoption hinges on overcoming several bottlenecks in the treatment planning process.
In this dissertation, we focus on methods for the image segmentation, or contouring, step, which localizes the anatomical structures required for dose optimization and evaluation. Until recently, clinicians had to manually delineate dozens of organs-at-risk and target volumes across hundreds of slices of the patient’s three-dimensional images, a process that is extremely time-consuming. The advent of deep learning-based artificial intelligence (AI) has changed the landscape: a modern auto-segmentation AI can produce segmentations for most of a patient’s anatomy in minutes.
Despite increasing automation, the segmentation process remains time- and resource-intensive. Because segmentations are critical to the patient’s outcome and AI models inevitably make errors, clinicians must perform a quality assessment (QA) of the AI’s outputs. Depending on the case’s complexity, the duration of this QA process can negate the time gains that auto-segmentation tools bring.
Deep ensemble AIs represent an advancement in medical image segmentation. Instead of providing a deterministic output, they produce a set of plausible candidates that aim to model inter-clinician annotation variability. Consensus segmentations obtained from ensembles tend to be more accurate and robust than their single-prediction deterministic counterparts. Nevertheless, using only the consensus discards a lot of potentially useful information.
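Majority voting per voxel is one common way to obtain such a consensus. The minimal sketch below, assuming the ensemble is available as a stack of binary NumPy masks (all names are illustrative), also returns the per-voxel agreement map, precisely the kind of information that is lost when only the thresholded consensus is kept.

```python
import numpy as np

def consensus_segmentation(ensemble, threshold=0.5):
    """Majority-vote consensus from a stack of binary masks.

    ensemble: array of shape (n_members, H, W) with values in {0, 1}.
    Returns the consensus mask and the per-voxel agreement map that is
    discarded when only the consensus is kept.
    """
    agreement = ensemble.mean(axis=0)        # fraction of members marking each voxel
    consensus = (agreement >= threshold).astype(np.uint8)
    return consensus, agreement

# Toy ensemble of five noisy masks standing in for deep ensemble outputs.
rng = np.random.default_rng(0)
members = (rng.random((5, 64, 64)) > 0.5).astype(np.uint8)
consensus, agreement = consensus_segmentation(members)
```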
In this dissertation, we contribute to different phases of the segmentation QA process. We characterize this process and introduce methods that leverage the raw outputs of deep ensemble AIs to support and speed up QA tasks. The methods presented show new ways of analyzing and using ensembles in RT. Nevertheless, since they are also relevant outside RT, we keep their presentation general and evaluate them in other application scenarios, such as the analysis of simulation ensembles or meteorological data.
Before fixing segmentation failures, clinicians must find them. This process can be time-consuming and fatiguing when failures are sparse and spread throughout the patient’s three-dimensional images. We present and evaluate a delineation error detection system (DEDS), which guides clinicians to slices of three-dimensional images that contain potentially clinically relevant segmentation failures. We co-designed the DEDS with clinicians and refined it based on an observational study, which allowed us to characterize clinicians’ navigation patterns and their use of information sources like AI uncertainty and patients’ dose distributions. We evaluated the DEDS’ potential to speed up the QA process through a simulation study with a retrospective cohort of patients. Results indicate that speed-ups are most significant when equipping the DEDS with information sources indicative of clinical priority, which prevents unnecessary edits.
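To illustrate the guiding principle rather than the implemented system, the hypothetical sketch below ranks axial slices by an ensemble-disagreement score weighted by the local dose, so that slices that are both uncertain and clinically relevant surface first; the function name, inputs, and weighting scheme are all assumptions, not the DEDS itself.

```python
import numpy as np

def rank_slices(disagreement, dose, dose_weight=1.0):
    """Rank axial slices for review (illustrative only).

    disagreement: (Z, H, W) per-voxel ensemble disagreement, e.g., entropy.
    dose:         (Z, H, W) planned dose distribution on the same grid.
    Returns slice indices sorted from most to least review-worthy.
    """
    # Emphasize uncertain voxels that also receive a high dose.
    dose_norm = dose / (dose.max() + 1e-8)
    score_per_slice = (disagreement * (1.0 + dose_weight * dose_norm)).sum(axis=(1, 2))
    return np.argsort(score_per_slice)[::-1]

# Toy volumes standing in for real uncertainty and dose maps.
rng = np.random.default_rng(1)
uncertainty = rng.random((40, 64, 64))
dose = rng.random((40, 64, 64))
review_order = rank_slices(uncertainty, dose)
```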
Visual inspection of the segmentation ensemble allows clinicians to understand the main trends and detect anomalies that might indicate segmentation failures. Using a spaghetti plot to visualize all ensemble members is straightforward but prone to clutter. Contour boxplots avoid clutter and extra complexity by distilling essential ensemble information, which permits more efficient ensemble inspection. Nevertheless, they are time-consuming to compute, which reduces their practical value. We present Inclusion Depth for contour ensembles. Inclusion Depth yields a centrality score per ensemble member, which allows characterizing the distribution of segmentation ensembles in terms of properties like the median, trimmed mean, confidence bands, and outliers. Compared to previous contour depth notions, Inclusion Depth is significantly faster, making it more applicable in practice for time-critical contexts like QA in adaptive RT. We show how Inclusion Depth permits creating contour boxplots for ensembles with hundreds of segmentations in seconds.
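The core counting idea can be illustrated with a simplified, strict-inclusion variant for binary masks; the method itself relaxes the inclusion test so that contours that do not nest exactly still receive informative scores, so the sketch below is only meant to convey how pairwise inclusion checks turn into a per-member depth.

```python
import numpy as np

def strict_inclusion_depth(masks):
    """Strict inclusion depth for an ensemble of binary masks.

    masks: array of shape (n, H, W) with values in {0, 1}.
    For each member, counts how many other members it is contained in
    and how many it contains; the depth is the smaller fraction.
    """
    n = masks.shape[0]
    flat = masks.reshape(n, -1).astype(bool)
    depths = np.zeros(n)
    for i in range(n):
        inside = contains = 0
        for j in range(n):
            if i == j:
                continue
            if np.all(flat[i] <= flat[j]):   # member i lies inside member j
                inside += 1
            if np.all(flat[j] <= flat[i]):   # member j lies inside member i
                contains += 1
        depths[i] = min(inside, contains) / (n - 1)
    return depths  # high depth = central member; low depth = potential outlier

# Toy ensemble of concentric disks: inner disks are contained in outer ones.
yy, xx = np.mgrid[:64, :64]
dist = np.sqrt((yy - 32) ** 2 + (xx - 32) ** 2)
ensemble = np.stack([(dist < r).astype(np.uint8) for r in (10, 14, 18, 22, 26)])
print(strict_inclusion_depth(ensemble))  # the middle radius gets the highest depth
```

On this toy ensemble of nested disks, the middle contour receives the highest depth, mirroring how the median member of a contour boxplot is selected.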
It is not uncommon for distinct representative shapes to co-occur within a contour ensemble. With ensembles created by clinicians, for instance, different institutions, training sessions, or experience levels can lead to distinct shapes (i.e., modes of variation) for the same structure. When trained on these data, deep ensemble AIs would yield similarly multi-modal ensembles. In quality assessment, being able to extract these representatives would pave the way for new ensemble-based interactive segmentation workflows. Applying traditional contour depth notions to these multi-modal ensembles collapses the existing variation modes and can lead to uninformative centrality scores. To address this issue, we present the first framework for multi-modal contour depth, which also includes notable runtime improvements for depth computation. When used with Inclusion Depth, multi-modal contour depth permits clustering the different modes of variation and determining cluster-dependent scores that appropriately characterize the data. Variation modes can then be independently analyzed using uni-modal depth machinery like contour boxplots.
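The following sketch conveys the idea under strong simplifying assumptions and is not the proposed framework: members are first grouped into modes by hierarchical clustering on pairwise Jaccard distances, and a strict inclusion depth is then computed within each cluster instead of over the pooled ensemble.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_then_depth(masks, n_clusters=2):
    """Group ensemble members into modes, then score centrality per mode.

    masks: (n, H, W) binary masks. Returns cluster labels and, for each
    member, a depth computed only against the members of its own cluster.
    """
    n = masks.shape[0]
    flat = masks.reshape(n, -1).astype(bool)

    # Pairwise Jaccard distance between members.
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            inter = np.logical_and(flat[i], flat[j]).sum()
            union = np.logical_or(flat[i], flat[j]).sum()
            dist[i, j] = dist[j, i] = 1.0 - inter / max(union, 1)

    labels = fcluster(linkage(squareform(dist), method="average"),
                      t=n_clusters, criterion="maxclust")

    def strict_depth(sub):  # strict inclusion depth within one mode
        m = sub.shape[0]
        d = np.zeros(m)
        for a in range(m):
            inside = sum(np.all(sub[a] <= sub[b]) for b in range(m) if b != a)
            contains = sum(np.all(sub[b] <= sub[a]) for b in range(m) if b != a)
            d[a] = min(inside, contains) / max(m - 1, 1)
        return d

    depths = np.zeros(n)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        depths[idx] = strict_depth(flat[idx])
    return labels, depths
```

Computing depths against the pooled ensemble instead would penalize every member of the smaller mode, which is exactly the collapse that a multi-modal treatment avoids.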
The global perspective of contour depth methods, which consider the entire volume, may be insufficient when parts of the contours are noisy or when the resolution of the ensemble is too high to process within a reasonable time. Correlation clustering methods provide a solution by partitioning the spatial domain of the ensemble into highly correlated regions that can be used to localize analyses. However, existing correlation clustering algorithms do not scale well as the resolution of the ensemble increases. We introduce the Local-to-Global Correlation Clustering (LoGCC) method, which partitions the ensemble’s spatial domain into coarser primitives representing areas of consistent ensemble member behavior. Unlike previous correlation clustering methods, LoGCC achieves significantly faster runtimes by leveraging the ensemble’s spatial structure and decoupling computations into local and global steps. As with Inclusion Depth, these speed gains enable LoGCC to analyze large datasets in time-critical fields such as adaptive RT.
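The published algorithm is more involved, but its local-to-global structure can be caricatured as follows, with the block size, the correlation distance, and the final clustering step all being assumptions for illustration: the local step condenses the ensemble’s behavior within small spatial tiles into one signature per tile, and only these far fewer signatures enter the global clustering step.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def local_to_global_clustering(ensemble, block=8, n_regions=4):
    """Illustrative local-to-global clustering of an ensemble's spatial domain.

    ensemble: (n_members, H, W) scalar or binary fields on a common grid.
    Local step:  average the per-member signal inside each block x block tile,
                 producing one n_members-dimensional signature per tile.
    Global step: cluster the (much fewer) tile signatures by correlation distance.
    Returns an (H // block, W // block) map of region labels.
    """
    n, h, w = ensemble.shape
    bh, bw = h // block, w // block
    tiles = ensemble[:, :bh * block, :bw * block] \
        .reshape(n, bh, block, bw, block).mean(axis=(2, 4))   # (n, bh, bw)
    signatures = tiles.reshape(n, -1).T                        # one row per tile

    # Correlation distance between tile signatures.
    z = signatures - signatures.mean(axis=1, keepdims=True)
    z /= (z.std(axis=1, keepdims=True) + 1e-8)
    corr = z @ z.T / z.shape[1]
    dist = np.clip(1.0 - corr, 0.0, 2.0)
    np.fill_diagonal(dist, 0.0)

    labels = fcluster(linkage(squareform(dist, checks=False), method="average"),
                      t=n_regions, criterion="maxclust")
    return labels.reshape(bh, bw)
```

Because the global step only sees one signature per tile rather than one per grid cell, its cost is decoupled from the ensemble’s resolution, which is the intuition the sketch tries to capture.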
Throughout this dissertation, we focused on designing modular, flexible analysis methods applicable across different tasks and domains. We demonstrate how the DEDS, multi-modal Inclusion Depth, and LoGCC support QA in RT and extend to fields like meteorology. We also discuss their potential as foundational elements for more complex workflows. For example, extracted modes of variation, which indicate representative shapes in the ensemble, could be repurposed for interactive segmentation. Alternatively, consistent regions detected by correlation clustering could serve as building blocks for localized contour analysis and editing.
We hope the proposed contour ensemble visual analysis methods inspire the development of more efficient analysis workflows that harness ensembles’ power in RT and beyond.