Speeding up proton therapy optimization algorithms using GPU-acceleration

Proton irradiation therapy is a powerful form of cancer treatment, promising better dose conformity than conventional radiotherapy. Due to the complex scattering properties of protons, however, the optimization process needed to accurately target the cancer cells while minimizing damage to the surrounding healthy tissue is very time-consuming. By the time treatment begins, the CT scan on which the plan is based has therefore lost part of its accuracy in describing the tissue to be irradiated. To account for this, the surrounding tissue is irradiated more to ensure effective treatment, reducing dose conformity and thus increasing the probability of detrimental side effects.
To address this problem, a new treatment concept called Online Adaptive Proton Therapy (OAPT) calls for a CT scan to be taken about 30 seconds before treatment, allowing the planned treatment to be adapted to any anatomical changes that have occurred since the scan on which the original plan was based. The computations required for this adaptation, and the accompanying quality assurance, currently take significantly longer than 30 seconds, making the concept not yet viable in its current form. However, GPU acceleration can significantly speed up parts of the algorithms involved, reducing overall computation time and potentially making the concept viable for real-world application. In this thesis, the research question "How can GPU-offloading decrease computation time for proton therapy dose calculations?" is answered by accelerating two model algorithms, representative of two time-consuming steps in the proton therapy optimization process, and analyzing the performance, on
an NVIDIA V100S GPU, of the accelerated code. Furthermore, a model is postulated and validated to characterize and predict the performance of a GPU-accelerated algorithm. Using OpenACC, both algorithms achieved speedups between 30x and 440x excluding data transfer time, and between 0.88x and 40x including it; both figures depend on the problem size, with larger problems yielding larger speedups. Further research is needed on the validity of the derived model on different hardware and for different algorithms, as well as on the effect of integrating these accelerated algorithms on the total computation time of the real-world optimization process.