Age-Independent Prediction of Allergy DNA Methylation using Graph Learning
A.C.M. Vletter (TU Delft - Mechanical Engineering)
Micah Prendergast – Mentor (TU Delft - Human-Robot Interaction)
Holger Caesar – Graduation committee member (TU Delft - Intelligent Vehicles)
Merlijn van Breugel – Mentor (Ditto Care)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Differential DNA methylation patterns can serve as biomarkers for allergic diseases such as pediatric asthma and rhinitis, but age-dependent variability in epigenetic profiles undermines the reliability of predictive models. This thesis addresses that challenge by introducing a graph-based deep learning approach for age-independent allergy prediction from DNA methylation data. Each subject’s DNA methylation profile is represented as an individualized graph constructed via an extended Weighted Gene Co-expression Network Analysis (WGCNA) that captures both global co-methylation structure and subject-specific patterns, balancing population-level relationships with individual epigenetic heterogeneity. Edges between CpG sites are weighted using a Gaussian kernel on methylation values, so that the graph reflects personalized similarity while retaining biologically meaningful connections. A Graph Neural Network (GNN) with an Edge Convolution (EdgeConv) architecture is then trained on these subject-specific graphs to predict allergy outcomes. We evaluated this framework on DNA methylation data from three harmonized pediatric cohorts (PIAMA, MAKI, COPSAC) processed with the MEFFIL pipeline for cross-cohort normalization and quality control. An Epigenome-Wide Association Study (EWAS) identified key CpG features associated with asthma, rhinitis and IgE, which were used to guide feature selection for model training. Our graph-based model outperformed conventional methods such as ElasticNet and XGBoost in certain cohorts and maintained robust predictive accuracy between the ages of 6 and 16, indicating resilience to age-related methylation differences. Furthermore, we applied gradient-based saliency analysis to the trained GNN to highlight influential methylation features, providing interpretability and revealing plausible epigenetic markers of allergy.
The proposed pipeline is scalable and interpretable, and its ability to deliver reliable, age-invariant risk predictions from early-life epigenetic data underscores its potential clinical utility for early allergy diagnostics in children.
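To make the Gaussian-kernel edge weighting mentioned in the abstract more concrete, the following is a minimal sketch and not the thesis implementation. It assumes a shared edge list (for example, one derived from a population-level co-methylation network) and a per-subject vector of methylation beta values. The function name `subject_edge_weights`, the bandwidth `sigma`, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def subject_edge_weights(beta, edges, sigma=0.1):
    """Gaussian-kernel weights for one subject's graph (illustrative sketch).

    beta  : (n_cpg,) methylation beta values for one subject, in [0, 1]
    edges : (n_edges, 2) CpG index pairs from a shared, population-level graph
    sigma : kernel bandwidth (hypothetical default, not taken from the thesis)
    """
    beta = np.asarray(beta, dtype=float)
    edges = np.asarray(edges, dtype=int)
    # Difference in methylation between the two CpG sites of each edge
    diff = beta[edges[:, 0]] - beta[edges[:, 1]]
    # Similar methylation values give weights near 1; dissimilar values decay toward 0
    return np.exp(-(diff ** 2) / (2.0 * sigma ** 2))

# Toy example: three CpG sites, fully connected (assumed data)
edges = np.array([[0, 1], [1, 2], [0, 2]])
beta = np.array([0.20, 0.20, 0.50])
weights = subject_edge_weights(beta, edges, sigma=0.1)
```

In this sketch the graph topology is shared across subjects while the weights vary per subject, which is one plausible reading of how population-level structure and individual heterogeneity could be combined.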