Benchmarking VAE latent features in downstream tasks for cancer related predictions

Bachelor Thesis (2021)
Author(s)

B. van Groeningen (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Mostafa Eltager – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Tamim R. Abdelaal – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Mohammed Charrout – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

S. Makrodimitris – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Marcel J.T. Reinders – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

E. Isufi – Coach (TU Delft - Multimedia Computing)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2021 Boris van Groeningen
Publication Year
2021
Language
English
Graduation Date
02-07-2021
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Using RNA sequencing data to predict patient properties is now fairly common. In this paper, Variational Auto-Encoders (VAEs) are used to assist in this process. A VAE is a type of neural network that encodes data into a lower-dimensional representation called the latent space. These latent features are then used for downstream analyses such as predicting cancer type, survival time and cancer stage, with the help of an MLP classifier. Furthermore, the training process itself is analyzed using UMAPs. The purpose of this paper is to compare different VAE models on how effective their latent features are as training data for these predictions. The predictions amount to little more than guessing when any of the latent spaces constructed by the VAE models is used as input for the MLP classifier. The NoVAE model is the only model with slightly better performance in terms of mean accuracy and standard deviation. The guessing issue is further analyzed with the help of UMAPs: the VAEs are able to classify the input data during training, but when faced with new data this turns out not to be the case. Both the learning rate and the β term yield interesting results regarding, respectively, the modification of the input data and the variational property. A lower learning rate leads to better classification, but this is because the model deviates less from the original input data. When a small β term is used with the β-VAE, its output is similar to that of the VanillaVAE, meaning the VanillaVAE does not perform better than a regular autoencoder.
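To make the pipeline described in the abstract concrete, the sketch below shows a minimal β-VAE whose latent means are used as input features for an MLP classifier on a downstream label. It is illustrative only and not the configuration used in the thesis: the layer sizes, input_dim, latent_dim, the β value, the optimizer settings and the synthetic stand-in data are all assumptions made for demonstration.

    # Minimal sketch (assumed setup, not the thesis implementation):
    # a beta-VAE encodes data into a latent space, and the posterior means
    # are then fed to an MLP classifier for a downstream prediction task.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from sklearn.neural_network import MLPClassifier

    class BetaVAE(nn.Module):
        def __init__(self, input_dim=5000, latent_dim=64, beta=1.0):
            super().__init__()
            self.beta = beta  # beta = 1.0 recovers the vanilla VAE objective
            self.encoder = nn.Sequential(nn.Linear(input_dim, 512), nn.ReLU())
            self.fc_mu = nn.Linear(512, latent_dim)      # mean of q(z|x)
            self.fc_logvar = nn.Linear(512, latent_dim)  # log-variance of q(z|x)
            self.decoder = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                         nn.Linear(512, input_dim))

        def encode(self, x):
            h = self.encoder(x)
            return self.fc_mu(h), self.fc_logvar(h)

        def forward(self, x):
            mu, logvar = self.encode(x)
            std = torch.exp(0.5 * logvar)
            z = mu + std * torch.randn_like(std)  # reparameterization trick
            return self.decoder(z), mu, logvar

        def loss(self, x):
            recon, mu, logvar = self(x)
            recon_loss = F.mse_loss(recon, x, reduction="mean")
            # KL divergence between q(z|x) and the standard normal prior
            kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
            return recon_loss + self.beta * kld

    # Synthetic stand-in for preprocessed RNA-seq data and labels
    # (the real data and labels used in the thesis are not reproduced here).
    x_train, x_test = torch.randn(200, 5000), torch.randn(50, 5000)
    y_train = torch.randint(0, 3, (200,)).numpy()
    y_test = torch.randint(0, 3, (50,)).numpy()

    vae = BetaVAE(input_dim=5000, latent_dim=64, beta=1.0)
    opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
    for epoch in range(5):  # short loop for illustration only
        opt.zero_grad()
        loss = vae.loss(x_train)
        loss.backward()
        opt.step()

    # Use the latent means as features for a downstream MLP classifier.
    with torch.no_grad():
        z_train, _ = vae.encode(x_train)
        z_test, _ = vae.encode(x_test)

    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
    clf.fit(z_train.numpy(), y_train)
    print("test accuracy:", clf.score(z_test.numpy(), y_test))

In this sketch, setting beta close to 0 turns the objective into plain reconstruction (a regular autoencoder), which mirrors the abstract's observation that a small β makes the β-VAE behave like the VanillaVAE.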

Files

Research_Paper_BvG.pdf
(pdf | 0.755 MB)
License info not available