Identifying biological markers in the gut microbiome associated with celiac disease using machine learning
P. Persianov (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Thomas Abeel – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
E.A. van der Toorn – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
David Calderon Franco – Mentor (TU Delft - BT/Environmental Biotechnology)
Thomas Hollt – Graduation committee member (TU Delft - Computer Graphics and Visualisation)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Celiac disease is a genetic autoimmune disorder triggered by an adverse reaction to gluten, and it is associated with alterations in the gut microbiome. This study explored whether machine learning models, combined with feature selection methods, can identify biomarkers for celiac disease from gut microbiome data. The performance of several machine learning models was evaluated, and the impact of different feature selection methods, including minimum redundancy maximum relevance (MRMR), analysis of variance (ANOVA), and information gain, was examined. Without feature selection, the models performed comparably. The effect of feature selection, however, varied with the method chosen: logistic regression and support vector machines were more sensitive to that choice than random forest and XGBoost. Notably, several of the identified bacterial species, such as Bacteroides eggerthii, Parabacteroides johnsonii, Faecalibacterium prausnitzii, and Ruminococcus_D bicirculans, have previously been associated with celiac disease, which reinforces their potential as biomarkers.
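To make the idea of filter-based feature selection concrete, the following is a minimal sketch of ANOVA-style ranking, one of the methods named in the abstract. It is not the study's pipeline: the taxon names, abundance values, and the helper functions `anova_f` and `rank_features` are hypothetical and chosen only for illustration. Each feature is scored by the one-way ANOVA F statistic (between-class variance divided by within-class variance), and the highest-scoring features are kept.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for one feature.

    groups: one list of values per class (e.g. celiac, control).
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)      # number of classes
    n = len(all_values)  # number of samples
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        # No within-class spread: perfectly separating or constant feature.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(abundance, labels, top_k):
    """Score every feature and return the top_k names plus all scores."""
    classes = sorted(set(labels))
    scores = {}
    for name, values in abundance.items():
        groups = [[v for v, y in zip(values, labels) if y == c] for c in classes]
        scores[name] = anova_f(groups)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores


# Hypothetical toy data: three celiac and three control samples.
labels = ["celiac"] * 3 + ["control"] * 3
abundance = {
    "taxon_A": [1, 2, 3, 7, 8, 9],  # clearly differs between classes
    "taxon_B": [5, 5, 6, 5, 6, 5],  # same mean in both classes
    "taxon_C": [2, 4, 3, 3, 2, 4],  # same mean in both classes
}

top, scores = rank_features(abundance, labels, top_k=1)
print(top, scores["taxon_A"])
```

In a setup like this, the selected features would then be passed to a classifier, and the same classifier would be retrained with each selection method to compare how much its performance changes.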