Predicted meta-omics

A potential solution to multi-omics data scarcity in microbiome studies

Journal Article (2026)
Author(s)

Bianca Maria Cosma (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Stephanie Pillay (Radboud University Medical Center)

David Calderón-Franco (Hologenomix Life Sciences)

Thomas Abeel (TU Delft - Electrical Engineering, Mathematics and Computer Science, Broad Institute of MIT and Harvard)

Research Group
Pattern Recognition and Bioinformatics
DOI related publication
https://doi.org/10.1371/journal.pone.0345919 (final published version)
Publication Year
2026
Language
English
Journal title
PLoS ONE
Issue number
4
Volume number
21
Article number
e0345919
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Imbalances in the gut microbiome have been linked to conditions such as inflammatory bowel disease, diabetes, and cancer. While metagenomics and amplicon sequencing are commonly used to study the microbiome, they do not capture all layers of microbial function. Other meta-omics data can provide further insight, but they are more costly and laborious to obtain. The growing availability of paired meta-omics data offers an opportunity to develop machine learning models that learn the relationships between metagenomics data and other meta-omics data types, so that the latter can be predicted from metagenomics. We evaluated several machine learning models for predicting meta-omics features from various meta-omics inputs. Simpler architectures such as elastic net regression and random forests generated reliable predictions of transcript and metabolite abundances, with correlations of up to 0.77 and 0.74, respectively, whereas predicting protein profiles was more challenging. We also identified a core set of well-predicted features for each meta-omics output type, and showed that multi-output regression neural networks performed similarly when trained on fewer output features. Lastly, our experiments demonstrated that predicted features can be used for the downstream task of inflammatory bowel disease classification, with performance comparable to that obtained with experimental data.
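
The idea of scoring predictions per feature and keeping a set of well-predicted features can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the article: the use of Pearson correlation (the abstract does not say which correlation measure was used), the 0.5 cutoff, the function names, and the toy feature names and values.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences (assumed metric)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined when one of the vectors is constant.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def per_feature_correlation(measured, predicted):
    """Score each output feature across samples.

    Both arguments map a feature name to a list of values, one per sample.
    """
    return {name: pearson(measured[name], predicted[name]) for name in measured}


def well_predicted(correlations, threshold=0.5):
    """Return features whose correlation reaches a hypothetical cutoff."""
    return sorted(
        name
        for name, r in correlations.items()
        if not math.isnan(r) and r >= threshold
    )


# Toy values for four samples; the names and numbers are invented.
measured = {
    "metabolite_A": [1.0, 2.0, 3.0, 4.0],
    "metabolite_B": [2.0, 1.0, 4.0, 3.0],
    "metabolite_C": [5.0, 5.0, 5.0, 5.0],
}
predicted = {
    "metabolite_A": [1.1, 1.9, 3.2, 3.8],
    "metabolite_B": [3.0, 3.0, 2.0, 2.0],
    "metabolite_C": [4.0, 5.0, 6.0, 5.0],
}

correlations = per_feature_correlation(measured, predicted)
core = well_predicted(correlations, threshold=0.5)
print(core)
```

In this toy case only `metabolite_A` passes the cutoff: `metabolite_B` is negatively correlated with its prediction, and `metabolite_C` is constant in the measured data, so its correlation is undefined and it is excluded.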