Learning to Learn from Microbiome Data

Benchmarking Meta-Learning for Disease Classification on Microbiome Abundance Data

Master Thesis (2025)

Author(s)

S. Ramezani (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Thomas Abeel – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

C. Peng – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

B.M. Cosma – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

J.A. Baaijens – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

C. Lofi – Graduation committee member (TU Delft - Web Information Systems)

Gut microbiome, Microbiome, Meta-learning, Microbiome classification, Prototypical networks

To reference this document use

https://resolver.tudelft.nl/uuid:62d934f5-2716-4acc-95d4-91b55af17c3f

More Info

Publication Year

2025

Language

English

Graduation Date

15-07-2025

Awarding Institution

Programme

Computer Science

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The human gut microbiome has emerged as a key player in health and disease, yet machine learning on microbiome data remains challenging due to its high dimensionality, sparsity, compositionality, and inter-study heterogeneity. Although classical and deep learning methods have demonstrated promise, they often require extensive labeled data, which is rarely available in microbiome research. In this thesis, we investigate whether meta-learning can address these challenges by enabling better generalization from small, heterogeneous microbiome datasets. Specifically, we benchmark Prototypical networks (Protonets), a metric-based, few-shot meta-learning algorithm, against strong classical baselines (Random Forests, XGBoost, and Multi-layer Perceptrons) on disease classification tasks across a selection of gut microbiome studies. We introduce a unified benchmarking pipeline that standardizes preprocessing, dimensionality reduction, task construction, and evaluation across studies. A leave-one-study-out cross-validation strategy simulates realistic deployment scenarios in which only a few labeled samples are available from a new cohort. Our experiments explore the impact of support set size and of dimensionality reduction via principal component analysis. Results show that although Protonets offer a conceptually appealing approach to few-shot learning, they consistently underperform Random Forests in classification accuracy. Statistical analyses confirm the significance of this performance gap, and embedding visualizations reveal limited class separation in the learned feature space. These findings suggest that, under the evaluated conditions, classical models like Random Forests remain the more robust choice for microbiome classification in low-data regimes.
By offering a rigorous and reproducible evaluation, this work lays the foundation for further exploration of meta-learning in microbiome research and highlights both the potential and current limitations of learning to learn in this complex domain.
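
The two ideas the abstract combines, nearest-prototype classification from a small support set and leave-one-study-out splitting, can be illustrated with a minimal sketch. This is not the thesis pipeline: it assumes an identity embedding (a Prototypical network would first pass samples through an encoder trained on the other studies), Euclidean distance, and toy feature vectors. The function names `class_prototypes`, `predict`, and `leave_one_study_out` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import dist  # Euclidean distance between two points (Python 3.8+)


def class_prototypes(support_x, support_y):
    """Return one prototype per class: the mean of that class's support vectors.

    Assumption: samples are used as-is (identity embedding). A real
    Prototypical network would average learned embeddings instead.
    """
    grouped = defaultdict(list)
    for x, y in zip(support_x, support_y):
        grouped[y].append(x)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(prototypes, query_x):
    """Assign each query sample the label of its nearest prototype."""
    return [
        min(prototypes, key=lambda label: dist(prototypes[label], x))
        for x in query_x
    ]


def leave_one_study_out(study_ids):
    """Yield (held_out_study, train_indices, test_indices) for each study."""
    for held_out in sorted(set(study_ids)):
        train = [i for i, s in enumerate(study_ids) if s != held_out]
        test = [i for i, s in enumerate(study_ids) if s == held_out]
        yield held_out, train, test


# Toy example: two features per sample, two support samples per class.
support_x = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [0.8, 1.0]]
support_y = ["healthy", "healthy", "disease", "disease"]
prototypes = class_prototypes(support_x, support_y)
predictions = predict(prototypes, [[0.1, 0.1], [0.9, 0.8]])

# Each study is held out once; the remaining studies form the training pool.
splits = list(leave_one_study_out(["A", "A", "B", "C"]))
```

In a few-shot evaluation of the kind described above, the support samples would be drawn from the held-out study and the remaining samples of that study would serve as queries; how the thesis samples these episodes is not specified on this page.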

Files

Learning_to_learn_from_microbi... (pdf)

(pdf | 16.8 Mb)

License info not available