Understanding the mechanisms of aging can help us live longer and healthier lives. Epigenetic age predictors are machine learning models that use methylation levels at CpG sites to predict the biological age of a cell. Horvath’s linear clock uses 353 CpGs and has a median absolute error (MedAE) of 3.530, while the deep learning model AltumAge uses 20,318 CpGs to achieve a MedAE of 2.147. This study explores how model architecture selection, hyperparameter optimization, and feature selection can improve the accuracy of age predictors. ElasticNet regression with recursive feature elimination achieved a MedAE of 2.820 using 341 CpGs, outperforming Horvath’s clock with fewer features. The two models shared 95 CpG sites, and gene enrichment analysis revealed that several of the associated genes are involved in stem cell regulation. SHAP analysis was used to assess feature importance and interpret the model; it indicated that age prediction cannot be captured by a small subset of CpG sites. The study concludes that epigenetics influences stem cell regulation, which was found to be a biomarker of aging. Aging remains a complex process that deep learning models may capture better than linear models.
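The feature-selection approach described above can be illustrated with a minimal sketch, assuming scikit-learn and synthetic data. The sample counts, the hyperparameters (`alpha`, `l1_ratio`, `step`), and the number of retained features are illustrative assumptions, not the study's settings, and the random matrix merely stands in for real methylation beta values.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import ElasticNet
from sklearn.metrics import median_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for methylation data (assumption): beta values in [0, 1]
# for 300 samples and 200 CpG sites, of which only the first 20 carry signal.
n_samples, n_cpgs, n_informative = 300, 200, 20
X = rng.uniform(0.0, 1.0, size=(n_samples, n_cpgs))
weights = np.zeros(n_cpgs)
weights[:n_informative] = rng.normal(0.0, 15.0, size=n_informative)
age = 40.0 + (X - 0.5) @ weights + rng.normal(0.0, 2.0, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, age, test_size=0.25, random_state=0
)

# Recursive feature elimination: refit ElasticNet repeatedly, dropping the
# 10 CpGs with the smallest absolute coefficients each round until 30 remain.
selector = RFE(
    estimator=ElasticNet(alpha=0.01, l1_ratio=0.5, max_iter=10_000),
    n_features_to_select=30,
    step=10,
)
selector.fit(X_train, y_train)

selected = np.flatnonzero(selector.support_)  # column indices of retained CpGs
pred = selector.predict(X_test)               # predicts with the final reduced model
medae = median_absolute_error(y_test, pred)
print(f"kept {selected.size} CpGs, MedAE = {medae:.3f}")
```

On real data, the retained column indices in `selected` would map back to CpG identifiers, which could then be compared with another clock's sites to count the overlap.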