Host-Microbiome Omics Integration for Cancer Analysis and Diagnostics

Investigating the added value of integrating microbial and host omics information for cancer diagnostics using prediction models

Master Thesis (2023)
Author(s)

G. d' Abreu de Paulo (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

T.E.P.M.F. Abeel – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

A. Lukina – Graduation committee member (TU Delft - Algorithmics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Gedeon d' Abreu de Paulo
Publication Year
2023
Language
English
Graduation Date
18-04-2023
Awarding Institution
Delft University of Technology
Programme
Computer Science | Artificial Intelligence
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Cancer is one of the leading causes of death worldwide. Many studies have investigated the development and progression of cancer in human tissues using either host omics data or microbial data, but few combine the two, even though both modalities have been shown to affect cancer morphology and aetiology. Studies that do combine these modalities often use simple methods or do not consider the relation between the two modalities and disease phenotypes. An integrated approach that accounts for these relations could offer additional insights and lead to the discovery of new disease biomarkers and better treatment strategies and therapies.

In this thesis, we investigated whether such a holo-genomic approach offers additional information compared to using the modalities separately. We did so by comparing the performance of prediction models built on the individual modalities with that of models built on the integrated modalities, for various prediction endpoints. We used TCGA gene expression data for the host omics modality and bacterial genus abundance data from the TCGA-mined Cancer Microbiome Atlas (TCMA) for the microbiome modality.
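The comparison described above can be illustrated with a minimal sketch. It is not the thesis's pipeline: the data are synthetic stand-ins (not TCGA or TCMA), the nearest-centroid classifier and the 5-fold split are assumptions chosen to keep the example dependency-free, and plain feature concatenation is only one possible integration strategy. All function names are hypothetical.

```python
import random

def make_synthetic(n=120, n_features=5, seed=0):
    """Synthetic stand-in data (assumption): host features carry the
    class signal, microbial features are pure noise."""
    rng = random.Random(seed)
    host, microbe, labels = [], [], []
    for i in range(n):
        y = i % 2  # alternating labels keep every fold balanced
        labels.append(y)
        host.append([rng.gauss(2.0 * y, 1.0) for _ in range(n_features)])
        microbe.append([rng.gauss(0.0, 1.0) for _ in range(n_features)])
    return host, microbe, labels

def integrate(host, microbe):
    """Integration by concatenating the two feature vectors per sample."""
    return [h + m for h, m in zip(host, microbe)]

def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def cv_accuracy(X, y, k=5):
    """k-fold cross-validated accuracy of a nearest-centroid classifier."""
    n = len(X)
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train = [i for i in range(n) if i not in test_idx]
        centroids = {
            label: centroid([X[i] for i in train if y[i] == label])
            for label in set(y)
        }
        correct += sum(predict(centroids, X[i]) == y[i] for i in test_idx)
    return correct / n

# Same model and folds for each modality, so only the features differ.
host, microbe, labels = make_synthetic()
results = {
    "host only": cv_accuracy(host, labels),
    "microbiome only": cv_accuracy(microbe, labels),
    "integrated": cv_accuracy(integrate(host, microbe), labels),
}
for name, acc in results.items():
    print(f"{name}: {acc:.2f}")
```

Holding the model and the cross-validation folds fixed across the three feature sets means any difference in accuracy can be attributed to the features themselves, which is the core of the individual-versus-integrated comparison.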

We found no improvement when integrating host gene expression with microbial abundance information compared to using the gene expression data alone, and the microbial data provided the least diagnostic information of the modalities tested. This is likely due to the information density of gene expression data, the high variation of the microbiome, and the quantity, specificity and validation of the TCMA data. These results suggest that the holo-omics approach might not provide additional utility in certain contexts and that additional considerations have to be made when choosing microbial and host omics datasets for holo-omic integration. They also provide insight into the usability of the TCMA dataset.

Files

MSc_Thesis_Gedeon.pdf
(pdf | 1.92 Mb)
License info not available