Finding biological markers for Parkinson's disease

Using machine learning to analyse metagenomic data

Bachelor Thesis (2023)
Author(s)

M.L. Koning (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

E.A. van der Toorn – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

D. Calderon Franco – Mentor (TU Delft - BT/Environmental Biotechnology)

Thomas Abeel – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

T. Höllt – Graduation committee member (TU Delft - Computer Graphics and Visualisation)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Marilotte Koning
Publication Year
2023
Language
English
Graduation Date
29-06-2023
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Parkinson's disease (PD) is a neurodegenerative disorder characterized by motor function loss and potential mental and behavioral changes. The identification of biomarkers in the gut microbiota of PD patients can significantly aid in fast and accurate diagnosis. This study investigates the application of machine learning (ML) models, including Logistic Regression (LR), Random Forest (RF), and Support Vector Machines (SVM), to discover biomarkers in the gut metagenomic data of PD patients. The ML models were optimized using various feature selection techniques, and a comparative analysis of the most influential species in sample discrimination was conducted to verify potential PD-associated biomarkers.
The results show that all three ML models achieve only moderate performance, indicating limited discriminatory power. However, the species that are most influential for the different classifiers overlap substantially and include PD-associated species that align with findings in the existing literature. These outcomes provide promising evidence that LR, RF, and SVM classifiers can effectively identify biomarkers for PD. Nevertheless, a confounding analysis on a small subset of the dataset failed to identify meaningful PD-associated species. Therefore, caution is advised when interpreting the findings of ML models, considering factors such as classifier performance, dataset limitations, potential biases, the influence of feature selection methods, and inherent model differences.
We validate the potential usefulness of ML approaches for biomarker discovery and highlight areas for further investigation into building a sufficiently accurate ML model for PD diagnosis.
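
The cross-classifier comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes each model exposes one importance score per species (for example, coefficients for LR and a linear SVM, or impurity-based importances for RF), ranks species by absolute score, and intersects the top-k sets. The function names, the species labels, and all scores below are hypothetical placeholders; they are not taken from the thesis and do not reproduce its method or results.

```python
from typing import Dict, List, Set


def top_k_features(importances: Dict[str, float], k: int) -> Set[str]:
    """Return the k features with the largest absolute importance."""
    ranked = sorted(importances, key=lambda name: abs(importances[name]), reverse=True)
    return set(ranked[:k])


def shared_features(per_model: Dict[str, Dict[str, float]], k: int) -> List[str]:
    """Return features found in the top-k of every model, sorted by name."""
    top_sets = [top_k_features(scores, k) for scores in per_model.values()]
    if not top_sets:
        return []
    return sorted(set.intersection(*top_sets))


# Placeholder scores for illustration only; NOT results from the thesis.
toy_importances = {
    "LR": {"species_A": -1.2, "species_B": 0.9, "species_C": 0.1, "species_D": 0.4},
    "RF": {"species_A": 0.30, "species_B": 0.05, "species_C": 0.25, "species_D": 0.40},
    "SVM": {"species_A": 0.8, "species_B": -0.2, "species_C": 0.1, "species_D": -0.7},
}

overlap = shared_features(toy_importances, k=2)
print(overlap)  # ['species_A']
```

Ranking by absolute value treats strongly negative and strongly positive coefficients alike, which suits linear models; importance scores from different model families are not on a common scale, so only the per-model rankings are compared here, never the raw values.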
