E.A. van der Toorn

5 records found

Using machine learning methods to identify functional biomarkers in the human gut microbiome

Bachelor thesis (2023) - A.J.G. Sloof, Thomas Abeel, E.A. van der Toorn, D. Calderon Franco, T. Höllt
Colorectal cancer (CRC), one of the leading causes of mortality, is challenging to diagnose. By using metagenomic analysis with machine learning methods, this can be done in a non-invasive manner. In this research, a neural network has been trained on relative pathway abundance data, a way to measure the functional potential of a microbiome, in order to find biomarkers for colorectal cancer. The accuracy achieved by the neural network is 57%. The most important features used by the model are compared to established biomarkers in literature. Besides overlapping pathways, this research also found new potential biomarkers for CRC. ...

Using machine learning to analyse metagenomic data

Parkinson's disease (PD) is a neurodegenerative disorder characterized by motor function loss and potential mental and behavioral changes. The identification of biomarkers in the gut microbiota of PD patients can significantly aid in fast and accurate diagnosis. This study investigates the application of machine learning (ML) models, including Logistic Regression (LR), Random Forest (RF), and Support Vector Machines (SVM), to discover biomarkers in the gut metagenomic data of PD patients. The ML models were optimized using various feature selection techniques, and a comparative analysis of the most influential species in sample discrimination was conducted to verify potential PD-associated biomarkers.
The results demonstrate that all three ML models exhibit moderate performance, indicating their limited discriminatory power. However, the comparison of significant species across different classifiers demonstrates substantial overlap and indicates PD-associated species that align with existing literature findings. These outcomes provide promising evidence that LR, RF, and SVM classifiers can effectively identify biomarkers for PD. However, confounding analysis on a small subset of the dataset failed to identify meaningful PD-associated species. Therefore, caution is advised when interpreting the findings of ML models, considering factors such as classifier performance, dataset limitations, potential biases, the influence of feature selection methods, and inherent model differences.
We validate the potential usefulness of ML approaches for biomarker discovery and highlight areas for further investigation into building a sufficiently accurate ML model for PD diagnosis. ...
Celiac disease is a genetic autoimmune disorder caused by a negative reaction to gluten and is associated with alterations in the gut microbiome. This study explored the potential of machine learning models and feature selection methods in identifying biomarkers for celiac disease using gut microbiome data. The performance of several machine learning models was evaluated, and the impact of different feature selection methods, including MRMR, ANOVA, and information gain, was examined. The findings revealed comparable performance among the models without feature selection. However, the choice of feature selection method had varying effects on model performance, with logistic regression and support vector machines being more sensitive than random forest and XGBoost models. Notably, several identified bacterial species, such as Bacteroides eggerthii, Parabacteroides johnsonii, Faecalibacterium prausnitzii, and Ruminococcus_D bicirculans, have been previously associated with celiac disease, reinforcing their potential as biomarkers. ...
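The abstract above names information gain as one of the feature selection methods but does not describe how it was computed. As a minimal sketch only, assuming each species is reduced to a discrete presence/absence value (an assumption made here for illustration, not something stated in the thesis), information gain is the label entropy minus the label entropy conditioned on the feature. The function names and toy data below are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """H(labels) - H(labels | feature) for a discrete feature."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature):
        # Labels of the samples that share this feature value.
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional


# Toy data (hypothetical): presence (1) or absence (0) of a species per sample.
labels = ["celiac", "celiac", "control", "control"]
informative = [1, 1, 0, 0]    # separates the two groups perfectly
uninformative = [1, 0, 1, 0]  # carries no information about the label

ig_informative = information_gain(informative, labels)
ig_uninformative = information_gain(uninformative, labels)
```

Features would then be ranked by this score and only the highest-scoring ones kept before training a classifier. Continuous abundance values would first need to be discretized, and that choice can change the ranking.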
Type 2 Diabetes is a highly prevalent disease that leads to significant adverse effects. Recently, there has been growing interest in the association between the human gut microbiome and chronic diseases like Type 2 Diabetes, with the aim of identifying biomarkers. In this study, we researched the effect of different machine learning and feature selection techniques on identifying biomarkers for Type 2 Diabetes that can later be used for diagnosis and prediction. The main methods that we explored were Random Forests, Linear Regression, Support Vector Machines, and XGBoost, along with mRMR and CMIM as feature selection techniques. These methods were applied to data taken from Europe and China. We found that mRMR improved the performance of the Random Forest classifier compared to CMIM. Apart from finding biomarkers specific to one location, we found that Clostridiales, Clostridium, Roseburia, and Lactobacillus could be of interest in the prediction of Type 2 Diabetes irrespective of location. This study verified biomarkers found in previous literature and evaluated several techniques for the prediction of the disease across different regions. ...

Can Machine Learning algorithms identify schizophrenia-related biomarkers within metagenomic data derived from the human gut microbiome?

Bachelor thesis (2023) - T.M. Bastow, Eric van der Toorn, David Calderon Franco, Thomas Abeel, Thomas Höllt
There is mounting evidence indicating a relationship between the gut microbiome composition and the development of mental diseases, but the mechanisms remain unclear. Shotgun-sequenced data from 90 schizophrenic patients and 81 sex-, age-, weight-, and location-matched controls were used to train three machine learning models: Logistic Regression, Random Forests, and XGBoost. The 20 most relevant species in the decision making of each classifier were retained and the overlap between models recorded. There is a total of 19 overlapping species between the models' top 20 most relevant species, with 10 species overlapping on all three models. Bifidobacterium bifidum, Akkermansia muciniphila, Eubacterium siraeum, Alistipes finegoldii, Intestinibacter bartlettii, Bifidobacterium pseudocatenulatum, and Streptococcus thermophilus are of particular interest as they are reported as enriched in schizophrenia samples in existing literature. Phocaeicola vulgatus has been found to play a significant role in the classifiers' decisions and is enriched in healthy samples in the literature. One species, Ruthenibacterium lactatiformans, and one co-abundant gene group, Eubacterium sp. CAG:180, consistently ranked as the most important features across all three classifiers, despite the absence of reporting in existing literature. This study could be expanded by using genus-level data. Further research should be done to validate the species mentioned above as potential biomarkers for schizophrenia. ...
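The overlap analysis described in this abstract (keep each classifier's top 20 species, then record which species recur across models) can be illustrated with a short sketch. It assumes each model exposes one importance score per feature, such as a coefficient or a tree-based importance. The function names, the toy scores, and the use of absolute values for ranking are illustrative assumptions, not the thesis's actual procedure.

```python
from collections import Counter


def top_k(importances, k=20):
    """Return the set of the k feature names with the largest absolute importance."""
    ranked = sorted(importances, key=lambda name: abs(importances[name]), reverse=True)
    return set(ranked[:k])


def overlap_summary(model_importances, k=20):
    """Return (features in every model's top k, features in at least two top-k sets)."""
    counts = Counter()
    for importances in model_importances.values():
        counts.update(top_k(importances, k))
    shared_by_all = {f for f, c in counts.items() if c == len(model_importances)}
    shared_by_some = {f for f, c in counts.items() if c >= 2}
    return shared_by_all, shared_by_some


# Toy importance scores (hypothetical values and feature names).
toy = {
    "logistic_regression": {"sp_a": 0.9, "sp_b": -0.7, "sp_c": 0.1, "sp_d": 0.05},
    "random_forest": {"sp_a": 0.4, "sp_b": 0.1, "sp_c": 0.3, "sp_d": 0.2},
    "xgboost": {"sp_a": 0.5, "sp_b": 0.3, "sp_c": 0.0, "sp_d": 0.2},
}

all_three, at_least_two = overlap_summary(toy, k=2)
```

With `k=2` on the toy data, only `sp_a` appears in all three top lists, while `sp_a` and `sp_b` each appear in at least two. The absolute value is taken so that strongly negative logistic regression coefficients rank alongside strongly positive ones.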