Sub-second speed 4D-CT image registration using unsupervised deep learning

Abstract

Interplay dose calculation techniques have been proposed in the literature to evaluate the interplay effect caused by patient breathing during proton treatment of lung tumors. These techniques require a deformation vector field (DVF) to map the dose distributions of the different phases of the breathing cycle onto a reference phase. The DVF is obtained by registering the 4DCT lung scans of the different phases to one another. Current image registration methods are too slow to make interplay dose calculation clinically feasible.

Advances in deep learning have produced models that predict the DVF orders of magnitude faster than traditional methods. In this research, two model architectures previously applied to the registration of brain MRI images are evaluated on predicting the DVF between scans at different phases of a 4DCT lung scan. The quality of the registration is evaluated using the mean absolute error between the images and contour metrics of organs: the Dice score, the Hausdorff distance and the mean surface distance. In addition, the amount of grid folding is evaluated as the number of voxels with a negative Jacobian determinant.
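As an illustration of two of the metrics named above, the following is a minimal NumPy sketch, not the evaluation code used in this work. It assumes the DVF is stored as an array of shape (3, D, H, W) with displacements in voxel units, and that the transformation is x + u(x), so its Jacobian is the identity plus the gradient of the displacement. The function names `dice_score` and `count_folded_voxels` are hypothetical.

```python
import numpy as np


def dice_score(mask_a, mask_b):
    """Dice overlap of two binary masks (1.0 means identical contours)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom


def count_folded_voxels(dvf):
    """Count voxels where the Jacobian determinant of x + u(x) is negative.

    dvf: array of shape (3, D, H, W), displacement u in voxel units
    (an assumed layout, not taken from the source).
    """
    # grads[i, j] holds d u_i / d x_j, estimated by finite differences.
    grads = np.stack(
        [np.stack(np.gradient(dvf[i]), axis=0) for i in range(3)], axis=0
    )
    # Jacobian of the full transformation: identity + displacement gradient.
    jac = grads + np.eye(3)[:, :, None, None, None]
    jac = np.moveaxis(jac, (0, 1), (-2, -1))  # -> (D, H, W, 3, 3)
    det = np.linalg.det(jac)
    return int((det < 0).sum())
```

A zero displacement field gives a determinant of 1 everywhere and therefore no folded voxels; a field that mirrors the grid along one axis folds every voxel.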

The first model architecture, VoxelMorph, is an unsupervised model with a U-Net architecture. Two hyperparameters were varied: the maximum size of the DVF, limited by a HardTanh, and the weight of the loss term for the divergence of the DVF during training. The model performed poorly in predicting the DVF: the predicted displacements were too small. Varying the hyperparameters appeared to have no significant impact on the prediction quality of the model. Limiting the maximum of the DVF prevents the registration of large deformations, which is undesirable.
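The role of the two hyperparameters can be sketched as follows. This is a hedged NumPy illustration rather than the training code of this work: `hardtanh` mimics the clamping behaviour of a HardTanh activation, and `total_loss` combines an image similarity term with a weighted regularizer. The exact form of the regularizer is not given in the abstract, so a generic smoothness penalty (mean squared finite difference) is used here as an assumed stand-in, and all function and parameter names are hypothetical.

```python
import numpy as np


def hardtanh(dvf, max_disp):
    """Clamp every displacement component to [-max_disp, +max_disp]."""
    return np.clip(dvf, -max_disp, max_disp)


def smoothness_penalty(dvf):
    """Assumed stand-in regularizer: mean squared finite difference of the
    DVF along each spatial axis. dvf has shape (3, D, H, W)."""
    total = 0.0
    for axis in (1, 2, 3):
        total += np.mean(np.diff(dvf, axis=axis) ** 2)
    return total / 3.0


def total_loss(warped, fixed, dvf, reg_weight):
    """Similarity term (mean squared error) plus weighted regularizer."""
    similarity = np.mean((warped - fixed) ** 2)
    return similarity + reg_weight * smoothness_penalty(dvf)
```

The clamp makes the limitation noted above concrete: with `max_disp = 3`, a true displacement of 10 voxels can never be represented, whatever the network predicts.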

The second model architecture takes a multi-resolution approach. The images are downsampled to 1/2 and 1/4 of the original resolution. Multiple sub-networks predict a DVF at each resolution in coarse-to-fine order. Each sub-network consists of a feature encoder, residual blocks and a feature decoder. The final DVF is obtained by upsampling and combining the DVFs. A hyperparameter search was performed in which the number of residual blocks and their filters were varied, at first only for the coarsest network and later for all networks. Lastly, an additional resolution was added to the model. The model was capable of predicting good-quality DVFs. Only varying the number of residual blocks and their filters for all resolutions resulted in a significant difference in the quality of the prediction.
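The coarse-to-fine combination can be sketched as below. This is a simplified illustration under stated assumptions, not the model's implementation: the DVFs are assumed to be stored as (3, D, H, W) arrays in voxel units of their own resolution, upsampling is nearest-neighbour, and the fields are combined by simple addition (the actual model may interpolate differently or compose the fields). The names `upsample_dvf` and `combine_coarse_to_fine` are hypothetical.

```python
import numpy as np


def upsample_dvf(dvf, factor=2):
    """Nearest-neighbour upsampling of a (3, D, H, W) DVF.

    Displacements are in voxel units, so they are multiplied by the
    same factor to stay valid on the finer grid.
    """
    up = dvf
    for axis in (1, 2, 3):
        up = np.repeat(up, factor, axis=axis)
    return up * factor


def combine_coarse_to_fine(dvfs):
    """Combine per-resolution DVFs, ordered coarsest first.

    Each entry must have twice the spatial size of the previous one.
    Combination by addition is an assumption made for this sketch.
    """
    total = dvfs[0]
    for finer in dvfs[1:]:
        total = upsample_dvf(total) + finer
    return total
```

For example, a displacement of one voxel predicted at 1/4 resolution corresponds to four voxels at full resolution, with the finer sub-networks only needing to add the remaining residual displacement.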

Predictions are performed in 260±4 ms and 24±4 ms for the first and second architectures, respectively. This is faster than other deep learning methods found in the literature, and significantly faster than traditional registration methods.