When Weak Becomes Strong

Robust Quantification of White Matter Hyperintensities on Brain MRIs

In clinical practice, as a first approximation, the severity of an abnormality on an image is often quantified by its volume. Researchers typically first segment the abnormality with a neural network trained on voxel-wise labels and then extract the volume from the segmentation. Instead of this indirect, two-step approach, we propose to train neural networks directly on the volumes as image-level labels and to predict the volume directly. Training abnormality prediction with image-level labels could reduce the labeling burden on clinical experts, for whom voxel-wise annotation is both expensive and time-consuming. In this report, a neural network consisting of a segmentation part with an appended regression part was compared with the indirect segmentation approach. We investigated whether networks trained with image-level labels reach the same image-level prediction performance as networks trained with voxel-wise labels. The networks were trained on a large local dataset to quantify white matter hyperintensity (WMH) burden from brain MRI, and their performance was evaluated on a held-out test set. Generalization was additionally assessed by applying the trained networks to four independent public datasets. On the held-out test set, the networks trained with image-level labels achieved slightly better volume quantification than their voxel-wise counterparts. Their attention maps showed that these networks focused on the surroundings of the WMH and hence learned meaningful image features; nevertheless, the attention maps were not suitable for producing a comparable segmentation. In terms of generalization to the external datasets, the advantage of weak labels for volume quantification did not hold: there was no significant difference between the performance of the two label types.
The results suggest that neural networks optimized with image-level labels can predict WMH volume directly as well as neural networks trained with voxel-wise labels. We also studied networks optimized on both image-level and voxel-wise labels; these reached a lower performance, suggesting that the two tasks and the image features they learn are not sufficiently similar.
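The image-level training signal described above can be sketched in a few lines: the network's per-voxel output is reduced to a single differentiable volume estimate, and the only supervision is the error against the reference volume. The sketch below is illustrative, not the paper's implementation; the function names, shapes, and voxel size are assumptions.

```python
import numpy as np


def soft_volume(prob_map: np.ndarray, voxel_volume_ml: float) -> float:
    """Differentiable volume estimate: the sum of the network's voxel
    probabilities times the physical volume of one voxel. No voxel-wise
    label is needed to compute it."""
    return float(prob_map.sum() * voxel_volume_ml)


def image_level_loss(prob_map: np.ndarray, true_volume_ml: float,
                     voxel_volume_ml: float) -> float:
    """Squared error between predicted and reference WMH volume: the
    entire supervision an image-level (weak) label provides."""
    return (soft_volume(prob_map, voxel_volume_ml) - true_volume_ml) ** 2


# Toy example: a 4x4x4 probability map in which 32 voxels are confident WMH.
probs = np.zeros((4, 4, 4))
probs[:2] = 1.0  # 32 voxels predicted as lesion

# Assuming 1 mm^3 voxels (0.001 ml each), the soft volume is 0.032 ml.
vol = soft_volume(probs, voxel_volume_ml=0.001)
loss = image_level_loss(probs, true_volume_ml=0.040, voxel_volume_ml=0.001)
```

Because the loss depends on the probability map only through its sum, gradients flow to every voxel, which is consistent with the attention maps localizing the WMH without yielding a usable segmentation.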