Learning to Learn from Microbiome Data
Benchmarking Meta-Learning for Disease Classification on Microbiome Abundance Data
S. Ramezani (TU Delft - Electrical Engineering, Mathematics and Computer Science)
T.E.P.M.F. Abeel – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Chengyao Peng – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
B.M. Cosma – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Jasmijn A. Baaijens – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Christoph Lofi – Graduation committee member (TU Delft - Web Information Systems)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
The human gut microbiome has emerged as a key player in health and disease, yet machine learning on microbiome data remains challenging due to its high dimensionality, sparsity, compositionality, and inter-study heterogeneity. Although classical and deep learning methods have shown promise, they often require large amounts of labeled data, which are rarely available in microbiome research. In this thesis, we investigate whether meta-learning can address these challenges by enabling better generalization from small, heterogeneous microbiome datasets. Specifically, we benchmark Prototypical networks (Protonets), a metric-based, few-shot meta-learning algorithm, against strong classical baselines (Random Forests, XGBoost, and multilayer perceptrons) on disease classification tasks across a selection of gut microbiome studies. We introduce a unified benchmarking pipeline that standardizes preprocessing, dimensionality reduction, task construction, and evaluation across studies. A leave-one-study-out cross-validation strategy simulates realistic deployment scenarios in which only a few labeled samples are available from a new cohort. Our experiments explore the impact of support set size and of dimensionality reduction via principal component analysis. Results show that although Protonets offer a conceptually appealing approach to few-shot learning, they consistently achieve lower classification accuracy than Random Forests. Statistical analyses confirm the significance of this performance gap, and embedding visualizations reveal limited class separation in the learned feature space. These findings suggest that, under the evaluated conditions, classical models such as Random Forests remain the more robust choice for microbiome classification in low-data regimes.
By offering a rigorous and reproducible evaluation, this work lays the foundation for further exploration of meta-learning in microbiome research and highlights both the potential and current limitations of learning to learn in this complex domain.
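The evaluation loop described in the abstract (hold out one study, reduce dimensionality with PCA, draw a few labeled samples from the held-out cohort as the support set, classify the rest) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis pipeline: all function names are hypothetical, PCA is fitted on the training studies only, and, unlike the Protonets evaluated in the thesis, no embedding network is trained episodically; the PCA projection stands in for the learned embedding. The nearest-prototype rule (prototype = mean support embedding per class, squared Euclidean distance) is the standard Prototypical-network classifier. The data are synthetic toy values, not microbiome abundances.

```python
import numpy as np

def leave_one_study_out(study_ids):
    """Yield (train_idx, test_idx), holding out one study at a time (hypothetical helper)."""
    for study in np.unique(study_ids):
        held_out = study_ids == study
        yield np.flatnonzero(~held_out), np.flatnonzero(held_out)

def pca_fit_transform(X_fit, X_apply, n_components):
    """Fit PCA on X_fit via SVD, then project both X_fit and X_apply."""
    mean = X_fit.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_fit - mean, full_matrices=False)
    W = Vt[:n_components].T  # columns are the top principal directions
    return (X_fit - mean) @ W, (X_apply - mean) @ W

def split_support_query(y, k_shot, rng):
    """Pick k_shot labeled samples per class as support; the remainder are queries."""
    support = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=k_shot, replace=False)
        for c in np.unique(y)
    ])
    query = np.setdiff1d(np.arange(len(y)), support)
    return support, query

def prototype_predict(support_X, support_y, query_X):
    """Nearest-prototype rule: each class prototype is its mean support embedding."""
    classes = np.unique(support_y)
    prototypes = np.stack([support_X[support_y == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every query to every prototype.
    dists = ((query_X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]

# Toy data: 3 "studies", 40 samples each, 200 features (assumed sizes).
rng = np.random.default_rng(0)
n_per_study, n_features = 40, 200
X_parts, y_parts, s_parts = [], [], []
for study in range(3):
    y = np.repeat([0, 1], n_per_study // 2)  # 0 = control, 1 = disease (toy labels)
    # Disease samples are shifted by 2.0 on the first 10 features.
    X = rng.normal(size=(n_per_study, n_features)) + 2.0 * y[:, None] * (np.arange(n_features) < 10)
    X_parts.append(X)
    y_parts.append(y)
    s_parts.append(np.full(n_per_study, study))
X_all, y_all, studies = np.vstack(X_parts), np.concatenate(y_parts), np.concatenate(s_parts)

accuracies = {}
for train_idx, test_idx in leave_one_study_out(studies):
    # PCA is fitted on the training studies only, then applied to the held-out study.
    _, Z_test = pca_fit_transform(X_all[train_idx], X_all[test_idx], n_components=5)
    y_test = y_all[test_idx]
    sup, qry = split_support_query(y_test, k_shot=5, rng=rng)
    pred = prototype_predict(Z_test[sup], y_test[sup], Z_test[qry])
    accuracies[int(studies[test_idx][0])] = float((pred == y_test[qry]).mean())
```

Varying `k_shot` and `n_components` in this loop mirrors the two factors the abstract says were studied (support set size and PCA). A classical baseline such as a Random Forest could be evaluated in the same loop; how the thesis combines training-study data and support samples for the baselines is not stated here, so that part is left out.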