O. Pastor Serrano | TU Delft Repository

Artificial Intelligence in Radiotherapy

Probabilistic Deep Learning for Dose Prediction and Anatomy Modeling

Doctoral thesis (2023) - O. Pastor Serrano, M.S. Hoogeman, D.R. Schaart, Z. Perko

This thesis addresses two major challenges in modern radiotherapy workflows: the slow computation speed of dose prediction algorithms and the insufficient modeling of anatomical variations during and between treatment fractions. Current photon and proton therapy plans rely on pre-treatment computed tomography (CT) scans obtained days before the start of treatment. Inter-fraction anatomical changes, intra-fraction organ motion, and setup errors compromise treatment accuracy and may unnecessarily irradiate healthy tissue. Existing mitigation strategies—such as target margins in photon therapy and robust optimization in proton therapy—only partially address these uncertainties and are limited by the lack of realistic anatomical models and fast dose prediction methods.

The first part of this work presents millisecond-scale dose prediction algorithms for proton pencil beams and photon beams using deep learning. Chapter 2 introduces the Dose Transformer Algorithm (DoTA), a model that predicts proton beamlet doses by combining convolutional neural networks with a transformer backbone that captures both spatial features and beam energy information. DoTA achieves gamma pass rates above 99% while reducing computation time by four orders of magnitude compared to Monte Carlo simulations. Chapter 3 extends this approach to photons with the improved Dose Transformer Algorithm (iDoTA), which maps projected beam geometries to 3D dose distributions. iDoTA estimates full VMAT dose distributions in seconds with state-of-the-art accuracy, significantly accelerating conventional photon treatment planning.
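
The gamma pass rates quoted above combine a dose-difference criterion with a distance-to-agreement criterion. As a rough illustration only, the following Python sketch computes a global gamma pass rate for two 1D dose profiles by brute-force search. The function name, the 10% low-dose cutoff, and the lack of sub-voxel interpolation are assumptions made for this example; it is not the evaluation code used in the thesis.

```python
import math


def gamma_pass_rate(ref, ev, spacing_mm, dta_mm=2.0, dd_frac=0.02, cutoff_frac=0.1):
    """Global 1D gamma pass rate in percent (brute force, no interpolation).

    ref, ev      -- reference and evaluated dose profiles on the same grid
    spacing_mm   -- grid spacing in mm
    dta_mm       -- distance-to-agreement criterion
    dd_frac      -- dose-difference criterion as a fraction of the maximum reference dose
    cutoff_frac  -- reference points below this fraction of the maximum are skipped (assumed value)
    """
    d_max = max(ref)
    dose_tol = dd_frac * d_max
    passed = total = 0
    for i, d_ref in enumerate(ref):
        if d_ref < cutoff_frac * d_max:
            continue  # ignore the low-dose region
        total += 1
        # Gamma is the minimum combined distance/dose deviation over all evaluated points.
        gamma = min(
            math.sqrt(((j - i) * spacing_mm / dta_mm) ** 2 + ((d_ev - d_ref) / dose_tol) ** 2)
            for j, d_ev in enumerate(ev)
        )
        if gamma <= 1.0:
            passed += 1
    return 100.0 * passed / total


reference = [0, 10, 50, 100, 100, 50, 10, 0]
shifted = [0, 0, 10, 50, 100, 100, 50, 10]  # same profile moved by one voxel
print(gamma_pass_rate(reference, shifted, spacing_mm=1.0))  # 1 mm shift, within 2 mm
print(gamma_pass_rate(reference, shifted, spacing_mm=5.0))  # 5 mm shift, mostly failing
```

A one-voxel shift passes everywhere when the voxel is 1 mm wide, because an equal dose value is found within the 2 mm search distance. The same shift on a 5 mm grid fails at most points.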

The second part focuses on anatomical variations. Chapter 4 presents the Daily Anatomy Model (DAM), a probabilistic deep learning framework that generates patient-specific inter-fraction deformations of planning CT images based on population data. DAM captures correlated movements with few latent variables, accurately reproducing prostate volume and center-of-mass variations observed in repeat CT scans, and enabling robust treatment planning against daily anatomical changes. Chapter 5 models intra-fraction respiratory motion using variational and adversarial autoencoders, including a semi-supervised extension for joint signal classification and generation. A novel time-series compression method reduces multi-dimensional breathing cycles to low-dimensional vectors while preserving high-resolution reconstruction. These models generate realistic, class-specific breathing signals, supporting simulation of target motion during radiation delivery.

Chapter 6 applies these anatomical models to simulate interplay effects in Intensity Modulated Proton Therapy (IMPT), arising from interactions between tumor motion and scanning beam movement. Using both simple sinusoidal and deep learning-generated breathing signals, the analysis quantifies how small variations in respiratory period affect local dose distributions. The results highlight that conventional planning approaches, including 4DCT and Internal Target Volume (ITV) plans, often fail to achieve clinically required robustness, underscoring the need for individualized modeling.

In conclusion, this thesis provides methods to predict dose deposition with millisecond speed and simulate realistic anatomical variations for both inter- and intra-fraction motion. These contributions enable more accurate robustness evaluation, support future online adaptive workflows, and offer a foundation for integrating deep learning-based dose and anatomy models into clinical radiotherapy. Future research should focus on coupling these algorithms with existing treatment planning systems and validating their performance in diverse clinical scenarios.

Sub-second photon dose prediction via transformer neural networks

Journal article (2023) - Oscar Pastor-Serrano, Peng Dong, Charles Huang, Lei Xing, Zoltán Perkó

Background: Fast dose calculation is critical for online and real-time adaptive therapy workflows. While modern physics-based dose algorithms must compromise accuracy to achieve low computation times, deep learning models can potentially perform dose prediction tasks with both high fidelity and speed. Purpose: We present a deep learning algorithm that, exploiting synergies between transformer and convolutional layers, accurately predicts broad photon beam dose distributions in a few milliseconds. Methods: The proposed improved Dose Transformer Algorithm (iDoTA) maps arbitrary patient geometries and beam information (in the form of a 3D projected shape resulting from a simple ray tracing calculation) to their corresponding 3D dose distribution. Treating the 3D CT input and dose output volumes as a sequence of 2D slices along the direction of the photon beam, iDoTA solves the dose prediction task as a sequence modeling problem. The proposed model combines a Transformer backbone routing long-range information between all elements in the sequence, with a series of 3D convolutions extracting local features of the data. We train iDoTA on a dataset of 1700 beam dose distributions, using 11 clinical volumetric modulated arc therapy (VMAT) plans (from prostate, lung, and head and neck cancer patients with 194–354 beams per plan) to assess its accuracy and speed. Results: iDoTA predicts individual photon beams in ≈50 ms with a high gamma pass rate of (Formula presented.) (2 mm, 2%). Furthermore, estimating full VMAT dose distributions in 6–12 s, iDoTA achieves state-of-the-art performance with a (Formula presented.) (2 mm, 2%) pass rate and an average relative dose error of 0.75 ± 0.36%. Conclusions: Offering the millisecond speed prediction per beam angle needed in online and real-time adaptive treatments, iDoTA represents a new state of the art in data-driven photon dose calculation. The proposed model can massively speed up current photon workflows, reducing calculation times from a few minutes to just a few seconds.
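
The idea of treating a 3D volume as a sequence of 2D slices, with attention routing information between all slices, can be sketched in a few lines of plain Python. This is a toy illustration under strong assumptions: each slice is simply flattened into a token, where the abstract describes convolutional feature extraction, and the attention uses identity projections (Q = K = V) with no learned weights. The function names are hypothetical, and none of this reproduces the iDoTA architecture.

```python
import math


def slices_to_tokens(volume):
    """Flatten each 2D slice (ordered along the beam axis) into one token vector."""
    return [[value for row in slice_2d for value in row] for slice_2d in volume]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections (Q = K = V)."""
    dim = len(tokens[0])
    output = []
    for query in tokens:
        # Similarity of this slice to every slice in the sequence.
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim) for key in tokens]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        norm = sum(weights)
        weights = [w / norm for w in weights]
        # Each output token is a weighted mix of all input tokens.
        output.append([sum(w * tok[c] for w, tok in zip(weights, tokens)) for c in range(dim)])
    return output


# Hypothetical 3-slice volume of 2x2 voxels, ordered along the beam direction.
volume = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.0, 1.0], [1.0, 0.0]],
]
tokens = slices_to_tokens(volume)
mixed = self_attention(tokens)
print(len(mixed), len(mixed[0]))  # sequence length and token dimension are preserved
```

Because every output token depends on every input token, a density change in one slice can influence the representation of slices far along the beam. This is the long-range routing the abstract attributes to the Transformer backbone.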

A probabilistic deep learning model of inter-fraction anatomical variations in radiotherapy

Journal article (2023) - Oscar Pastor-Serrano, Steven Habraken, Mischa Hoogeman, Danny Lathouwers, Dennis Schaart, Yusuke Nomura, Lei Xing, Zoltán Perkó

Objective. In radiotherapy, the internal movement of organs between treatment sessions causes errors in the final radiation dose delivery. To assess the need for adaptation, motion models can be used to simulate dominant motion patterns and assess anatomical robustness before delivery. Traditionally, such models are based on principal component analysis (PCA) and are either patient-specific (requiring several scans per patient) or population-based, applying the same set of deformations to all patients. We present a hybrid approach which, based on population data, allows predicting patient-specific inter-fraction variations for an individual patient. Approach. We propose a deep learning probabilistic framework that generates deformation vector fields warping a patient's planning computed tomography (CT) into possible patient-specific anatomies. This daily anatomy model (DAM) uses few random variables capturing groups of correlated movements. Given a new planning CT, DAM estimates the joint distribution over the variables, with each sample from the distribution corresponding to a different deformation. We train our model using a dataset of 312 CT pairs with prostate, bladder, and rectum delineations from 38 prostate cancer patients. For 2 additional patients (22 CTs), we compute the contour overlap between real and generated images, and compare the sampled and ‘ground truth’ distributions of volume and center of mass changes. Results. With a DICE score of 0.86 ± 0.05 and a distance between prostate contours of 1.09 ± 0.93 mm, DAM matches and improves upon previously published PCA-based models, using as few as 8 latent variables. The overlap between distributions further indicates that DAM’s sampled movements match the range and frequency of clinically observed daily changes on repeat CTs. Significance. Conditioned only on planning CT values and organ contours of a new patient without any pre-processing, DAM can accurately predict deformations seen during following treatment sessions, enabling anatomically robust treatment planning and robustness evaluation against inter-fraction anatomical changes.
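
The contour overlap and center-of-mass comparisons mentioned above can be illustrated with a minimal Python sketch on 2D binary masks. The helper names, the 2D simplification, and the toy masks are assumptions for this example; the study itself works with 3D organ delineations.

```python
def voxels(mask):
    """Return the set of (row, column) indices where the binary mask is set."""
    return {(i, j) for i, row in enumerate(mask) for j, v in enumerate(row) if v}


def dice_score(mask_a, mask_b):
    """Dice overlap: 2 |A ∩ B| / (|A| + |B|)."""
    a, b = voxels(mask_a), voxels(mask_b)
    return 2.0 * len(a & b) / (len(a) + len(b))


def center_of_mass(mask, spacing_mm=(1.0, 1.0)):
    """Center of mass of a binary mask in mm, assuming uniform voxel weights."""
    points = voxels(mask)
    return tuple(spacing_mm[d] * sum(p[d] for p in points) / len(points) for d in range(2))


# Hypothetical "real" and "generated" organ masks, shifted by one voxel.
real = [[0, 1, 1, 0],
        [0, 1, 1, 0]]
generated = [[0, 0, 1, 1],
             [0, 0, 1, 1]]

print(dice_score(real, generated))   # 0.5
print(center_of_mass(real))          # (0.5, 1.5)
print(center_of_mass(generated))     # (0.5, 2.5)
```

Computing these quantities for many sampled deformations gives the distributions of volume and center-of-mass changes that the abstract compares against repeat CTs.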

Learning image representations for content-based image retrieval of radiotherapy treatment plans

Journal article (2023) - Charles Huang, Varun Vasudevan, Oscar Pastor-Serrano, Md Tauhidul Islam, Yusuke Nomura, Piotr Dubrowski, Jen Yeu Wang, Joseph B. Schulz, Yong Yang, More authors...

Objective. In this work, we propose a content-based image retrieval (CBIR) method for retrieving dose distributions of previously planned patients based on anatomical similarity. Retrieved dose distributions from this method can be incorporated into automated treatment planning workflows in order to streamline the iterative planning process. As CBIR has not yet been applied to treatment planning, our work seeks to understand which current machine learning models are most viable in this context. Approach. Our proposed CBIR method trains a representation model that produces latent space embeddings of a patient’s anatomical information. The latent space embeddings of new patients are then compared against those of previous patients in a database for image retrieval of dose distributions. All source code for this project is available on GitHub. Main results. The retrieval performance of various CBIR methods is evaluated on a dataset consisting of both publicly available image sets and clinical image sets from our institution. This study compares various encoding methods, ranging from simple autoencoders to more recent Siamese networks like SimSiam, and the best performance was observed for the multitask Siamese network. Significance. Our current results demonstrate that excellent image retrieval performance can be obtained through slight changes to previously developed Siamese networks. We hope to integrate CBIR into automated planning workflows in future work. ...
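
The retrieval step, comparing a new patient's embedding against stored embeddings, might look like the following Python sketch. The abstract does not state which similarity measure is used, so cosine similarity is an assumption here. The patient identifiers and three-dimensional embeddings are invented for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def retrieve(query, database, k=1):
    """Return the k database keys whose embeddings are most similar to the query."""
    ranked = sorted(database, key=lambda pid: cosine_similarity(query, database[pid]), reverse=True)
    return ranked[:k]


# Hypothetical database mapping previous patients to latent embeddings.
database = {
    "patient_a": [1.0, 0.0, 0.0],
    "patient_b": [0.7, 0.7, 0.0],
    "patient_c": [0.0, 0.0, 1.0],
}
new_patient = [0.9, 0.1, 0.0]
print(retrieve(new_patient, database, k=2))  # ['patient_a', 'patient_b']
```

In a planning workflow, the dose distributions stored for the retrieved patients would then serve as starting points for the new plan.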

Millisecond speed deep learning based proton dose calculation with Monte Carlo accuracy

Journal article (2022) - Oscar Pastor-Serrano, Zoltán Perkó

Objective. Next generation online and real-time adaptive radiotherapy workflows require precise particle transport simulations in sub-second times, which is unfeasible with current analytical pencil beam algorithms (PBA) or Monte Carlo (MC) methods. We present a deep learning based millisecond speed dose calculation algorithm (DoTA) accurately predicting the dose deposited by mono-energetic proton pencil beams for arbitrary energies and patient geometries. Approach. Given the forward-scattering nature of protons, we frame 3D particle transport as modeling a sequence of 2D geometries in the beam's eye view. DoTA combines convolutional neural networks extracting spatial features (e.g. tissue and density contrasts) with a transformer self-attention backbone that routes information between the sequence of geometry slices and a vector representing the beam's energy, and is trained to predict low noise MC simulations of proton beamlets using 80 000 different head and neck, lung, and prostate geometries. Main results. Predicting beamlet doses in 5 ± 4.9 ms with a very high gamma pass rate of 99.37 ± 1.17% (1%, 3 mm) compared to the ground truth MC calculations, DoTA significantly improves upon analytical pencil beam algorithms both in precision and speed. Offering MC accuracy 100 times faster than PBAs for pencil beams, our model calculates full treatment plan doses in 10–15 s depending on the number of beamlets (800–2200 in our plans), achieving a 99.70 ± 0.14% (2%, 2 mm) gamma pass rate across 9 test patients. Significance. Outperforming all previous analytical pencil beam and deep learning based approaches, DoTA represents a new state of the art in data-driven dose calculation and can directly compete with the speed of even commercial GPU MC approaches. Providing the sub-second speed required for adaptive treatments, straightforward implementations could offer similar benefits to other steps of the radiotherapy workflow or other modalities such as helium or carbon treatments.

A semi-supervised autoencoder framework for joint generation and classification of breathing

Journal article (2021) - Oscar Pastor-Serrano, Danny Lathouwers, Zoltán Perkó

Background and objective: One of the main problems with biomedical signals is the limited amount of patient-specific data and the significant amount of time needed to record a sufficient number of samples for diagnostic and treatment purposes. In this study, we present a framework to simultaneously generate and classify biomedical time series based on a modified Adversarial Autoencoder (AAE) algorithm and one-dimensional convolutions. Our work is based on breathing time series, with specific motivation to capture breathing motion during radiotherapy lung cancer treatments. Methods: First, we explore the potential in using the Variational Autoencoder (VAE) and AAE algorithms to model breathing signals from individual patients. We then extend the AAE algorithm to allow joint semi-supervised classification and generation of different types of signals within a single framework. To simplify the modeling task, we introduce a pre-processing and post-processing compression algorithm that transforms the multi-dimensional time series into vectors containing time and position values, which are transformed back into time series through an additional neural network. Results: The resulting models are able to generate realistic and varied samples of breathing. By incorporating 4% and 12% of the labeled samples during training, our model outperforms other purely discriminative networks in classifying breathing baseline shift irregularities from a dataset completely different from the training set, achieving an average macro F1-score of 94.91% and 96.54%, respectively. Conclusion: To our knowledge, the presented framework is the first approach that unifies generation and classification within a single model for this type of biomedical data, enabling both computer aided diagnosis and augmentation of labeled samples within a single framework. ...
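
The macro F1-score reported above is the unweighted mean of per-class F1 values, so a rare class such as a baseline shift counts as much as the common class. A minimal Python sketch follows. The class labels and predictions are invented for illustration and do not come from the study.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels: regular breathing versus baseline shift.
y_true = ["regular", "regular", "regular", "shift"]
y_pred = ["regular", "regular", "shift", "shift"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7333
```

Here the "regular" class scores 0.8 and the "shift" class scores about 0.667. Their plain average gives the macro value, regardless of how many samples each class has.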

How should we model and evaluate breathing interplay effects in IMPT?

Journal article (2021) - Oscar Pastor-Serrano, Steven Habraken, Danny Lathouwers, Mischa Hoogeman, Dennis Schaart, Zoltán Perkó

Breathing interplay effects in Intensity Modulated Proton Therapy (IMPT) arise from the interaction between target motion and the scanning beam. Assessing the detrimental effect of interplay and the clinical robustness of several mitigation techniques requires statistical evaluation procedures that take into account the variability of breathing during dose delivery. In this study, we present such a statistical method to model intra-fraction respiratory motion based on breathing signals and assess clinically relevant aspects related to the practical evaluation of interplay in IMPT such as how to model irregular breathing, how small breathing changes affect the final dose distribution, and what is the statistical power (number of different scenarios) required for trustworthy quantification of interplay effects. First, two data-driven methodologies to generate artificial patient-specific breathing signals are compared: a simple sinusoidal model, and a precise probabilistic deep learning model generating very realistic samples of patient breathing. Second, we investigate the highly fluctuating relationship between interplay doses and breathing parameters, showing that small changes in breathing period result in large local variations in the dose. Our results indicate that using a limited number of samples to calculate interplay statistics introduces a bigger error than using simple sinusoidal models based on patient parameters or disregarding breathing hysteresis during the evaluation. We illustrate the power of the presented statistical method by analyzing interplay robustness of 4DCT and Internal Target Volume (ITV) treatment plans for 8 lung cancer patients, showing that, unlike 4DCT plans, even 33-fraction ITV plans systematically fail to fulfill robustness requirements. ...
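
To see why small changes in breathing period can matter, consider the simple sinusoidal model: the target position at the moment a given spot is delivered depends on how many cycles have elapsed, so a small period difference accumulates into a large phase difference later in the delivery. The Python sketch below uses an assumed 10 mm amplitude, assumed periods of 4.0 s and 4.2 s, and a hypothetical spot delivered 21 s into the field. None of these values come from the study.

```python
import math


def target_position(t_s, amplitude_mm, period_s, phase_rad=0.0):
    """Target displacement (mm) at time t_s for a purely sinusoidal breathing model."""
    return amplitude_mm * math.sin(2.0 * math.pi * t_s / period_s + phase_rad)


# Hypothetical spot delivery times (s) within one field.
spot_times = [0.0, 7.0, 14.0, 21.0]

for period in (4.0, 4.2):
    positions = [round(target_position(t, 10.0, period), 2) for t in spot_times]
    print(period, positions)

# At t = 21 s the 4.0 s period has completed 5.25 cycles (peak displacement),
# while the 4.2 s period has completed exactly 5 cycles (zero displacement).
```

A 5% change in period therefore moves the target by the full amplitude at the time of that spot. This is consistent with the abstract's observation that interplay doses fluctuate strongly with breathing parameters.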