Representation Learning for High-Dimensional Single-Cell Genomics with Variational Autoencoders

Using Associations Between Latent Factors and SNPs to Discover new eQTLs

Bachelor Thesis (2026)
Author(s)

X.L.D. van der Ham (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

I.C. den Hond – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

K. Biharie – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

M.J.T. Reinders – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2026
Language
English
Graduation Date
25-06-2026
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Single-cell expression quantitative trait loci (eQTL) studies link genetic variants to changes in gene expression within individual cells. This allows us to study the effect of genetics on disease at the level of individual cells rather than in aggregate, since effects can differ per cell type. Traditional linking of single nucleotide polymorphisms (SNPs) to gene expression at the single-cell level suffers from a multiple testing burden because of the large number of SNPs and genes. To address this, a deep learning framework called Latent Interaction Variational Inference (LIVI) was recently developed. It compresses gene expression into low-dimensional encodings and reconstructs the gene expression linearly from these encodings, which enables direct interpretation of the latent space. Here, we determine whether the latent factors of this model can serve as a quantitative trait for SNPs that are associated with rheumatoid arthritis (RA), using a dataset of RA patients. RA is a chronic disease characterized by progressive damage to the joints. Using a linear mixed model, we found that 617 of 700 latent factors were associated with at least one SNP. We also found that genes associated with RA in a genome-wide association study (GWAS) have higher loadings for associated SNP-latent factor pairs than for non-associated ones. In addition, we identified genes affected by GWAS-identified risk SNPs for which the original GWAS did not identify a functionally associated gene. We conclude that the latent factors of the LIVI model can be used as a quantitative trait for SNPs, and we used these latent factors to discover trans-eQTLs.
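
The multiple testing burden mentioned in the abstract can be made concrete with a small back-of-the-envelope calculation. The sketch below is illustrative only: the SNP and gene counts and the significance level are hypothetical values not taken from the thesis, and the Bonferroni correction is an assumption used for simplicity, since the abstract does not state which correction was applied. Only the figure of 700 latent factors comes from the abstract.

```python
# Hypothetical sizes, chosen only for illustration.
# Only the 700 latent factors come from the abstract.
N_SNPS = 100_000   # assumed number of tested SNPs
N_GENES = 20_000   # assumed number of measured genes
N_FACTORS = 700    # latent factors reported in the abstract
ALPHA = 0.05       # assumed family-wise error rate


def n_tests(n_snps: int, n_traits: int) -> int:
    """Number of association tests when every SNP-trait pair is tested."""
    return n_snps * n_traits


def bonferroni_threshold(alpha: float, tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / tests


# Testing SNPs against every gene versus against every latent factor.
gene_tests = n_tests(N_SNPS, N_GENES)
factor_tests = n_tests(N_SNPS, N_FACTORS)

print(f"SNP-gene tests:   {gene_tests:,}")
print(f"SNP-factor tests: {factor_tests:,}")
print(f"Reduction:        {gene_tests / factor_tests:.1f}x fewer tests")
print(f"Threshold (genes):   {bonferroni_threshold(ALPHA, gene_tests):.2e}")
print(f"Threshold (factors): {bonferroni_threshold(ALPHA, factor_tests):.2e}")
```

Under these assumed counts, replacing genes with latent factors as the quantitative trait reduces the number of tests in proportion to the ratio of genes to factors, which in turn relaxes the per-test significance threshold by the same ratio.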

Files

BepPaper-FINAL.pdf
(pdf | 0.63 Mb)