Machine learning in Alzheimer’s disease genetics
Marc Hulsman (TU Delft - Pattern Recognition and Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam UMC)
Itziar de Rojas (TU Delft - Pattern Recognition and Bioinformatics, National Institute of Health Carlos III, Universitat Internacional de Catalunya)
Sven van der Lee (Amsterdam UMC, Vrije Universiteit Amsterdam, TU Delft - Pattern Recognition and Bioinformatics)
Henne Holstege (TU Delft - Intelligent Systems, Vrije Universiteit Amsterdam, Amsterdam UMC)
Jeroen van Rooij (Erasmus MC)
Jasper Van Dongen (Universiteit Antwerpen, Institute Born-Bunge, VIB)
Niccolo Tesí (Vrije Universiteit Amsterdam, TU Delft - Pattern Recognition and Bioinformatics, Amsterdam UMC)
Marcel J.T. Reinders (TU Delft - Pattern Recognition and Bioinformatics)
Georgios Hadjigeorgiou (University of Cyprus Medical School)
Catarina B. Ferreira (Universidade de Lisboa)
Iris Jansen (Amsterdam UMC, Vrije Universiteit Amsterdam)
Gennady Roshchupkin (Erasmus MC)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Traditional statistical approaches have advanced our understanding of the genetics of complex diseases, yet are limited to linear additive models. Here we applied machine learning (ML) to genome-wide data from 41,686 individuals in the largest European consortium on Alzheimer’s disease (AD) to investigate how effectively various ML algorithms replicate known findings, discover novel loci, and predict individuals at risk. We utilised Gradient Boosting Machines (GBMs), biological pathway-informed Neural Networks (NNs), and Model-based Multifactor Dimensionality Reduction (MB-MDR) models. The ML approaches successfully captured all genome-wide significant genetic variants identified in the training set and 22% of associations from larger meta-analyses. They highlighted 6 novel loci that replicated in an external dataset, including variants that map to ARHGAP25, LY6H, COG7, SOD1 and ZNF597. They further identified a novel association in AP4E1, refining the genetic landscape of the known SPPL2A locus. Our results demonstrate that ML methods can achieve predictive performance comparable to classical approaches in genetic epidemiology and have the potential to uncover novel loci that remain undetected by traditional genome-wide association studies (GWAS). These insights provide a complementary avenue for advancing the understanding of AD genetics.