Benchmarking VAE latent features in downstream tasks for cancer related predictions

Bachelor Thesis (2021)
Author(s)

B. van Groeningen (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Mostafa Eltager – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Tamim R. Abdelaal – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Mohammed Charrout – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

S. Makrodimitris – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Marcel J.T. Reinders – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

E. Isufi – Coach (TU Delft - Multimedia Computing)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2021 Boris van Groeningen
Publication Year
2021
Language
English
Graduation Date
02-07-2021
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Using RNA sequencing data to predict patient properties is now fairly common. In this paper, Variational Auto-Encoders (VAEs) are used to assist in this process. A VAE is a type of neural network that encodes data into a lower-dimensional representation called the latent space. These latent features are then used for downstream analyses such as predicting cancer type, survival time and cancer stage, with the help of an MLP classifier. Furthermore, the training process itself is analyzed using UMAPs. The purpose of this paper is to compare different VAE models on how effective their latent features are as training data for these predictions. The predictions amount to little more than guessing when any of the latent spaces constructed by the VAE models is used as input for the MLP classifier. The NoVAE model is the only model with slightly better performance in terms of mean accuracy and standard deviation. The guessing issue is further analyzed with the help of UMAPs: the VAEs are able to classify the input data during training, but when faced with new data this turns out not to be the case. Both the learning rate and the β term yield interesting results regarding, respectively, the modification of the input data and the variational property. A lower learning rate leads to better classification, but this is because the model deviates less from the original input data. When a small β term is used with the β-VAE, its output is similar to that of the VanillaVAE, meaning the VanillaVAE does not perform better than a regular autoencoder.
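To make the pipeline described in the abstract concrete, the sketch below shows a minimal β-VAE whose latent means are used as input features for an MLP classifier on a downstream label. It is illustrative only and not the configuration used in the thesis: the layer sizes, input_dim, latent_dim, the β value, the optimizer settings and the synthetic stand-in data are all assumptions made for demonstration.

    # Minimal sketch (assumed setup, not the thesis implementation):
    # a beta-VAE encodes data into a latent space, and the posterior means
    # are then fed to an MLP classifier for a downstream prediction task.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from sklearn.neural_network import MLPClassifier

    class BetaVAE(nn.Module):
        def __init__(self, input_dim=5000, latent_dim=64, beta=1.0):
            super().__init__()
            self.beta = beta  # beta = 1.0 recovers the vanilla VAE objective
            self.encoder = nn.Sequential(nn.Linear(input_dim, 512), nn.ReLU())
            self.fc_mu = nn.Linear(512, latent_dim)      # mean of q(z|x)
            self.fc_logvar = nn.Linear(512, latent_dim)  # log-variance of q(z|x)
            self.decoder = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                         nn.Linear(512, input_dim))

        def encode(self, x):
            h = self.encoder(x)
            return self.fc_mu(h), self.fc_logvar(h)

        def forward(self, x):
            mu, logvar = self.encode(x)
            std = torch.exp(0.5 * logvar)
            z = mu + std * torch.randn_like(std)  # reparameterization trick
            return self.decoder(z), mu, logvar

        def loss(self, x):
            recon, mu, logvar = self(x)
            recon_loss = F.mse_loss(recon, x, reduction="mean")
            # KL divergence between q(z|x) and the standard normal prior
            kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
            return recon_loss + self.beta * kld

    # Synthetic stand-in for preprocessed RNA-seq data and labels
    # (the real data and labels used in the thesis are not reproduced here).
    x_train, x_test = torch.randn(200, 5000), torch.randn(50, 5000)
    y_train = torch.randint(0, 3, (200,)).numpy()
    y_test = torch.randint(0, 3, (50,)).numpy()

    vae = BetaVAE(input_dim=5000, latent_dim=64, beta=1.0)
    opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
    for epoch in range(5):  # short loop for illustration only
        opt.zero_grad()
        loss = vae.loss(x_train)
        loss.backward()
        opt.step()

    # Use the latent means as features for a downstream MLP classifier.
    with torch.no_grad():
        z_train, _ = vae.encode(x_train)
        z_test, _ = vae.encode(x_test)

    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
    clf.fit(z_train.numpy(), y_train)
    print("test accuracy:", clf.score(z_test.numpy(), y_test))

In this sketch, setting beta close to 0 turns the objective into plain reconstruction (a regular autoencoder), which mirrors the abstract's observation that a small β makes the β-VAE behave like the VanillaVAE.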

Files

Research_Paper_BvG.pdf
(pdf | 0.755 MB)
License info not available