Cross-cohort generalizability of deep and conventional machine learning for MRI-based diagnosis and prediction of Alzheimer's disease

Abstract

This work validates the generalizability of MRI-based classification of Alzheimer's disease (AD) patients and controls (CN) to an external data set and to the task of predicting conversion to AD in individuals with mild cognitive impairment (MCI). We used a conventional support vector machine (SVM) and a deep convolutional neural network (CNN) approach based on structural MRI scans that underwent either minimal pre-processing or more extensive pre-processing into modulated gray matter (GM) maps. Classifiers were optimized and evaluated using cross-validation in the Alzheimer's Disease Neuroimaging Initiative (ADNI; 334 AD, 520 CN). Trained classifiers were subsequently applied to predict conversion to AD in ADNI MCI patients (231 converters, 628 non-converters) and in the independent Health-RI Parelsnoer Neurodegenerative Diseases Biobank data set. From this multi-center study representing a tertiary memory clinic population, we included 199 AD patients, 139 participants with subjective cognitive decline, 48 MCI patients converting to dementia, and 91 MCI patients who did not convert to dementia. AD-CN classification based on modulated GM maps resulted in a similar area-under-the-curve (AUC) for SVM (0.940; 95%CI: 0.924–0.955) and CNN (0.933; 95%CI: 0.918–0.948). Application to conversion prediction in MCI yielded significantly higher performance for SVM (AUC = 0.756; 95%CI: 0.720–0.788) than for CNN (AUC = 0.742; 95%CI: 0.709–0.776) (McNemar's test, p < 0.01). In external validation, performance decreased slightly. AD-CN classification again yielded similar AUCs for SVM (0.896; 95%CI: 0.855–0.932) and CNN (0.876; 95%CI: 0.836–0.913). For conversion prediction in MCI, performance decreased for both SVM (AUC = 0.665; 95%CI: 0.576–0.760) and CNN (AUC = 0.702; 95%CI: 0.624–0.786). With both SVM and CNN, classification based on modulated GM maps significantly outperformed classification based on minimally processed images (p = 0.01).
Deep and conventional classifiers performed equally well for AD classification and their performance decreased only slightly when applied to the external cohort. We expect that this work on external validation contributes towards translation of machine learning to clinical practice.
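
The evaluation quantities reported above (AUC with a 95% confidence interval, and McNemar's test for comparing two classifiers on the same subjects) can be illustrated with a minimal sketch. The abstract does not state how the confidence intervals were obtained or which variant of McNemar's test was used, so the percentile bootstrap and the exact binomial form below are assumptions made for illustration. The function names `auc`, `bootstrap_ci`, and `mcnemar_exact` are hypothetical and are not taken from the study's code.

```python
import math
import random


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:  # resample contains only one class; draw again
            continue
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def mcnemar_exact(labels, pred_a, pred_b):
    """Two-sided exact McNemar test on the discordant pairs of two classifiers."""
    # Subjects classified correctly by A but not by B, and vice versa.
    only_a = sum(1 for y, a, b in zip(labels, pred_a, pred_b) if a == y and b != y)
    only_b = sum(1 for y, a, b in zip(labels, pred_a, pred_b) if a != y and b == y)
    n = only_a + only_b
    if n == 0:
        return 1.0  # the classifiers never disagree in correctness
    k = min(only_a, only_b)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

In this sketch, `labels` holds the binary ground truth (for example 1 for AD or MCI converter, 0 otherwise), `scores` holds continuous classifier outputs, and `pred_a` and `pred_b` hold the thresholded predictions of the two classifiers being compared. McNemar's test operates on paired correct/incorrect decisions rather than on the AUC itself.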