An in-depth comparison of linear and non-linear joint embedding methods for bulk and single-cell multi-omics

Review (2024)

Author(s)

S. Makrodimitris (TU Delft - Pattern Recognition and Bioinformatics, Erasmus MC)

I.B. Pronk (TU Delft - Pattern Recognition and Bioinformatics)

T.R.M. Abdelaal (TU Delft - Pattern Recognition and Bioinformatics, Leiden University Medical Center)

M.J.T. Reinders (Leiden University Medical Center, TU Delft - Pattern Recognition and Bioinformatics)

Research Group

Pattern Recognition and Bioinformatics

DOI related publication

https://doi.org/10.1093/bib/bbad416

Neural networks; Dimensionality reduction; Joint embedding; Multi-omics

To reference this document use:

https://resolver.tudelft.nl/uuid:15f8e22d-492b-4922-96fb-6cb7f9dac406

Publication Year

2024

Language

English

Issue number

1

Volume number

25

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Multi-omic analyses are necessary to understand the complex biological processes taking place at the tissue and cell level, and also to make reliable predictions about, for example, disease outcome. Several linear methods exist that create a joint embedding using paired information per sample, but recently there has been a rise in the popularity of neural architectures that embed paired -omics into the same non-linear manifold. This work describes a head-to-head comparison of linear and non-linear joint embedding methods using both bulk and single-cell multi-modal datasets. We found that non-linear methods have a clear advantage over linear ones for missing modality imputation. Performance comparisons in the downstream tasks of survival analysis for bulk tumor data and cell type classification for single-cell data lead to the following insights. First, concatenating the principal components of each modality is a competitive baseline and hard to beat if all modalities are available at test time. However, if only one modality is available at test time, training a predictive model on the joint space of that modality can lead to performance improvements compared with using just the unimodal principal components. Second, -omic profiles imputed by neural joint embedding methods are realistic enough to be used by a classifier trained on real data with limited performance drops. Taken together, our comparisons give hints as to which joint embedding to use for which downstream task. Overall, product-of-experts performed well in most tasks and was reasonably fast, while early integration (concatenation) of modalities performed quite poorly.
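The baseline named in the abstract, concatenating the principal components of each modality, can be illustrated with a minimal sketch. This is not the authors' implementation: the modality names (`rna`, `meth`), the matrix sizes, the number of components, and the helper functions `fit_pca`, `transform_pca` and `concat_pca_embedding` are all assumptions made for illustration, and the data are random numbers rather than real -omic profiles.

```python
import numpy as np


def fit_pca(X, n_components):
    """Fit PCA on a samples-by-features matrix; return (mean, components)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def transform_pca(X, mean, components):
    """Project samples onto previously fitted principal axes."""
    return (X - mean) @ components.T


def concat_pca_embedding(train_modalities, test_modalities, n_components):
    """Fit one PCA per modality on the training samples, then concatenate
    the per-modality projections column-wise (hypothetical helper)."""
    train_parts, test_parts = [], []
    for X_train, X_test in zip(train_modalities, test_modalities):
        mean, comps = fit_pca(X_train, n_components)
        train_parts.append(transform_pca(X_train, mean, comps))
        test_parts.append(transform_pca(X_test, mean, comps))
    return np.hstack(train_parts), np.hstack(test_parts)


# Synthetic stand-ins for two paired modalities (assumed shapes).
rng = np.random.default_rng(0)
rna_train, rna_test = rng.normal(size=(50, 200)), rng.normal(size=(10, 200))
meth_train, meth_test = rng.normal(size=(50, 80)), rng.normal(size=(10, 80))

Z_train, Z_test = concat_pca_embedding(
    [rna_train, meth_train], [rna_test, meth_test], n_components=5
)
print(Z_train.shape, Z_test.shape)
```

Each PCA is fitted on the training samples only and then applied to the test samples, so the concatenated embedding `Z_train` could feed any downstream predictor (for example, a survival model or a cell type classifier) without leaking test information. Note that this baseline requires every modality to be present at test time, which is the condition the abstract attaches to it.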

Files

Bbad416.pdf

(pdf | 1.45 MB)