Assessing the Capability of Multimodal Variational Auto-Encoders in Combining Information From Biological Layers in Cancer Cells

Abstract

Personalized treatment of a complex disease such as cancer benefits from using multiple data modalities from a patient's cancer cells. Multiple modalities allow for analysis of dependencies between complex biological processes and downstream tasks, such as drug response or expected survival rate. To this end, it is important to understand the relationships between modalities in tumor cells. Multimodal Variational Auto-Encoders (MVAEs) combine generative models trained on different sets of data modalities. This research assesses the ability of MVAEs to capture information shared between different data views of the same tumor cells. The MVAE models discussed here use either a Mixture-of-Experts (MoE) or a Product-of-Experts (PoE) approach to combine the posterior distributions of the generative models into a single common latent space. Performance is assessed by i) comparing the loss of information when reconstructing the training data against MOFA+, a linear method for combining multimodal data, and ii) measuring whether one modality of a tumor cell can generate another modality, based on characteristics of the latent space learned by the MVAE. The biological data modalities considered are RNA-seq, gene-level copy number, and DNA methylation (DNAme), gathered by The Cancer Genome Atlas. PoE reconstructs all data types more accurately than MoE and MOFA+: its average reconstruction loss, measured as mean squared error, is about a quarter of MOFA+'s and less than a seventh of MoE's. In predicting modalities from other modalities, PoE again outperforms MoE on all cross-modal predictions. Additionally, both models have higher losses when predicting DNAme from other modalities, indicating a weaker correlation between this data type and the others.
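
To illustrate the difference between the two ways of combining posteriors, the following is a minimal sketch and not the implementation used in this research. It assumes each modality's encoder outputs a diagonal Gaussian posterior (a mean and a log-variance per latent dimension), that PoE optionally includes a standard normal prior as an extra expert, and that MoE weights all modalities equally; these are common formulations, but none of them is stated in the abstract. The function names `product_of_experts` and `sample_mixture_of_experts` are hypothetical.

```python
import numpy as np


def product_of_experts(means, logvars, include_prior=True):
    """Combine diagonal Gaussian experts into one Gaussian (PoE).

    means, logvars: arrays of shape (n_modalities, latent_dim).
    Assumption: a standard normal prior N(0, 1) is added as an extra expert
    when include_prior is True.
    """
    means = np.asarray(means, dtype=float)
    logvars = np.asarray(logvars, dtype=float)
    if include_prior:
        # Prior expert: mean 0, log-variance 0 (variance 1).
        means = np.concatenate([np.zeros_like(means[:1]), means], axis=0)
        logvars = np.concatenate([np.zeros_like(logvars[:1]), logvars], axis=0)
    precisions = np.exp(-logvars)  # 1 / variance for each expert
    joint_var = 1.0 / precisions.sum(axis=0)
    # Precision-weighted average of the expert means.
    joint_mean = joint_var * (means * precisions).sum(axis=0)
    return joint_mean, joint_var


def sample_mixture_of_experts(means, logvars, rng):
    """Draw one latent sample from an equally weighted mixture (MoE).

    Assumption: uniform mixture weights over modalities.
    """
    means = np.asarray(means, dtype=float)
    logvars = np.asarray(logvars, dtype=float)
    k = rng.integers(means.shape[0])  # pick one modality's posterior
    std = np.exp(0.5 * logvars[k])
    return means[k] + std * rng.standard_normal(means.shape[1:])


if __name__ == "__main__":
    # Two hypothetical modality posteriors over a 1-dimensional latent space.
    mu = [[1.0], [3.0]]
    logvar = [[0.0], [0.0]]
    print(product_of_experts(mu, logvar, include_prior=False))
    print(sample_mixture_of_experts(mu, logvar, np.random.default_rng(0)))
```

Under these assumptions, PoE yields a single posterior whose variance is no larger than that of any individual expert, while MoE keeps each modality's posterior intact and samples from one of them at a time.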