Improving and Interpreting Epigenetic Age Predictors
A Machine Learning Approach to Improving Epigenetic Age Predictors and Understanding How DNA Methylation Affects Aging
E.W.J. Langens (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Bram Pronk – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
I.C. den Hond – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
Gerard A. Bouland – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
Marcel J.T. Reinders – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Kaitai Liang – Graduation committee member (TU Delft - Cyber Security)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Understanding the mechanisms of aging can help us live longer and healthier lives. Epigenetic age predictors are machine learning models that use DNA methylation levels at CpG sites to predict biological age. Horvath's linear clock uses 353 CpGs and achieves a median absolute error (MedAE) of 3.530 years, while the deep learning model AltumAge uses 20,318 CpGs to achieve a MedAE of 2.147 years. This study explores how to improve the accuracy of age predictors through model architecture selection, hyperparameter optimization, and feature selection. ElasticNet regression with recursive feature elimination achieved a MedAE of 2.820 years using 341 CpGs, outperforming Horvath's clock. The two models shared 95 CpG sites, and gene enrichment analysis revealed that several of the associated genes are involved in stem cell regulation. Feature importance and model behavior were assessed with SHAP analysis, which indicated that age cannot be predicted accurately from a small subset of CpG sites. It was concluded that epigenetic changes influence stem cells and that this influence is a biomarker of aging. Aging remains a complex process that deep learning models may capture better.
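The pipeline summarized above (ElasticNet regression, recursive feature elimination, MedAE evaluation, and SHAP-based interpretation) can be illustrated with a minimal scikit-learn sketch. It is not the thesis implementation. It assumes synthetic data from `make_regression` standing in for a methylation matrix (rows are samples, columns are CpG sites), and the hyperparameters (`alpha`, `l1_ratio`, `step`, and a target of 50 retained features) are hypothetical placeholders. SHAP values are computed in closed form for a linear model under a feature-independence assumption rather than with the `shap` library.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import ElasticNet
from sklearn.metrics import median_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a methylation matrix: rows = samples, columns = CpG sites.
X, y = make_regression(n_samples=300, n_features=500, n_informative=40,
                       noise=5.0, random_state=0)
# Rescale targets into an age-like range (illustrative only).
y = (y - y.min()) / (y.max() - y.min()) * 80 + 10

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Hypothetical hyperparameters; in practice these would be tuned.
base = ElasticNet(alpha=0.01, l1_ratio=0.5, max_iter=10000)

# Recursive feature elimination: refit the model and drop the 10% of
# remaining features with the smallest |coefficient| each round.
selector = RFE(estimator=base, n_features_to_select=50, step=0.1)
selector.fit(X_train, y_train)

# Evaluate with median absolute error, the metric used in the abstract.
y_pred = selector.predict(X_test)
medae = median_absolute_error(y_test, y_pred)
n_selected = int(selector.support_.sum())
print(f"selected CpGs: {n_selected}, MedAE: {medae:.3f}")

# Closed-form SHAP values for a linear model, assuming independent features:
# phi_ij = coef_j * (x_ij - mean_j), with means taken from the training set.
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)
model = selector.estimator_
shap_values = (X_test_sel - X_train_sel.mean(axis=0)) * model.coef_
expected_value = model.predict(X_train_sel.mean(axis=0, keepdims=True))[0]

# Mean |SHAP| per selected feature: a global importance ranking.
mean_abs_shap = np.abs(shap_values).mean(axis=0)
top5 = np.argsort(mean_abs_shap)[::-1][:5]
print("top-5 features by mean |SHAP|:", top5)
```

Because the SHAP values of a linear model satisfy additivity, each row of `shap_values` plus `expected_value` reconstructs that sample's prediction. If the importance mass is spread over many features rather than concentrated in a few, this mirrors the abstract's observation that age is not captured by a small subset of CpG sites.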