Identifying biological markers in the gut microbiome associated with celiac disease using machine learning

Abstract

Celiac disease is an autoimmune disorder with a genetic basis, triggered by an adverse reaction to gluten and associated with alterations in the gut microbiome. This study explored the potential of machine learning models and feature selection methods for identifying celiac disease biomarkers in gut microbiome data. Several machine learning models were evaluated, and the impact of different feature selection methods, including minimum redundancy maximum relevance (MRMR), analysis of variance (ANOVA), and information gain, was examined. Without feature selection, the models performed comparably. The effect of feature selection, however, depended on the model: logistic regression and support vector machines were more sensitive to the choice of method than random forest and XGBoost. Notably, several of the bacterial species identified, such as Bacteroides eggerthii, Parabacteroides johnsonii, Faecalibacterium prausnitzii, and Ruminococcus_D bicirculans, have previously been associated with celiac disease, reinforcing their potential as biomarkers.
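The kind of comparison described above can be illustrated with a minimal sketch. It is not the study's pipeline: the data are synthetic (generated with scikit-learn's `make_classification` as a stand-in for a samples-by-taxa abundance table), the function name `compare_selectors` and the choice of `k=20` features are hypothetical, and MRMR and XGBoost are left out because they are not part of scikit-learn. Information gain is approximated here by mutual information.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_selectors(X, y, k=20, cv=5):
    """Return mean cross-validated ROC AUC for each (selector, model) pair."""
    # Each entry is a factory so every pipeline gets a fresh, unfitted object.
    selectors = {
        "none": lambda: None,
        "anova": lambda: SelectKBest(f_classif, k=k),
        # Mutual information stands in for information gain (an assumption).
        "info_gain": lambda: SelectKBest(
            partial(mutual_info_classif, random_state=0), k=k
        ),
    }
    models = {
        "logistic_regression": lambda: LogisticRegression(max_iter=1000),
        "svm": lambda: SVC(kernel="rbf"),
        "random_forest": lambda: RandomForestClassifier(
            n_estimators=100, random_state=0
        ),
    }

    results = {}
    for sel_name, make_selector in selectors.items():
        for model_name, make_model in models.items():
            steps = [("scale", StandardScaler())]
            selector = make_selector()
            if selector is not None:
                steps.append(("select", selector))
            steps.append(("model", make_model()))
            # Selection sits inside the pipeline, so it is refit on the
            # training portion of every fold and never sees held-out samples.
            scores = cross_val_score(
                Pipeline(steps), X, y, cv=cv, scoring="roc_auc"
            )
            results[(sel_name, model_name)] = scores.mean()
    return results


# Synthetic stand-in data, NOT the study's microbiome data.
X, y = make_classification(
    n_samples=120, n_features=100, n_informative=10, random_state=0
)
results = compare_selectors(X, y)
for (sel_name, model_name), auc in sorted(results.items()):
    print(f"{sel_name:10s} {model_name:20s} AUC={auc:.3f}")
```

Keeping the selector inside the cross-validated pipeline is a deliberate choice in this sketch: selecting features on the full dataset before splitting would leak label information into the held-out folds and inflate the scores.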
