Benchmarking VAE latent features in downstream tasks for cancer-related predictions


Abstract

Using RNA sequence data to predict patient properties is by now fairly common. In this paper, Variational Auto-Encoders (VAEs) are used to assist in this process. A VAE is a type of neural network that encodes data into a lower-dimensional representation called the latent space. These latent features are then used for downstream task analysis, namely predicting cancer type, survival time, and cancer stage with an MLP classifier. The training process itself is also analyzed using UMAPs. The purpose of this paper is to compare different VAE models on how effective the training data they provide is for these predictions. The predictions amount to little more than guessing when any of the latent spaces constructed by the VAE models is used as input data for the MLP classifier. The NoVAE model is the only model with slightly better performance in terms of mean accuracy and standard deviation. This guessing behaviour is analyzed further with UMAPs: the VAEs are able to separate the input data by class during training, but this no longer holds when they are faced with new data. Both the learning rate and the β term yield interesting results regarding the modification of the input data and the variational property, respectively. A lower learning rate leads to better classification, but only because the model then deviates less from the original input data. When a small β term is used with the β-VAE, the output is similar to that of the VanillaVAE, meaning the VanillaVAE does not perform better than a regular autoencoder.
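The downstream setup described above can be sketched roughly as follows. This is an illustrative toy, not the paper's pipeline: the trained VAE encoder is replaced by a random linear projection as a stand-in for producing latent features, the data is synthetic rather than RNA-seq, and all names, dimensions, and hyperparameters are assumptions.

```python
# Toy sketch of the latent-feature -> MLP-classifier setup (illustrative only).
# A trained VAE encoder would normally produce Z; here a random linear
# projection stands in for it, and the data is synthetic.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic expression-like data: 600 samples, 200 genes, 3 "cancer types".
n_samples, n_genes, n_classes, latent_dim = 600, 200, 3, 16
y = rng.integers(0, n_classes, size=n_samples)
X = rng.normal(size=(n_samples, n_genes)) + 0.5 * y[:, None]  # class-shifted noise

# Stand-in "encoder": project the data into a low-dimensional latent space.
W = rng.normal(size=(n_genes, latent_dim)) / np.sqrt(n_genes)
Z = X @ W  # latent features used as input for the MLP classifier

Z_train, Z_test, y_train, y_test = train_test_split(Z, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(Z_train, y_train)
acc = accuracy_score(y_test, clf.predict(Z_test))
print(f"held-out accuracy: {acc:.2f}")
```

Comparing such held-out accuracies (and their standard deviations over repeated runs) across encoders is the kind of benchmark the paper performs with its VAE variants.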