
I.C. den Hond



9 records found

Prediction with a classifier on top of the LIVI model

Single-cell RNA sequencing (scRNA-seq) now makes it possible to build large datasets that measure RNA expression in individual cells. In 2026, a new model, Latent Interaction Variational Inference (LIVI), was proposed to analyze these data. LIVI is novel in that it uses a variational autoencoder (VAE) to capture both cell- and donor-specific variation in its latent space. The research by Vagiaki et al. focused primarily on discovering expression quantitative trait loci (eQTLs), which leaves open how well the cell and donor latent spaces capture other characteristics, such as treatment success or failure. This research investigates whether the latent spaces of the LIVI model capture major cell types, cell subtypes, and treatment response. To evaluate this, a simple classifier (MLP, SVM, or Random Forest) was trained on top of the latent spaces. Major cell types are clearly distinguishable in the cell latent space C, and cell subtypes with relatively large class sizes can also be distinguished well in C. Treatment response is partially captured in the DxC space but is not fully separable by a simple classifier; SVM and MLP outperform Random Forest in classifying it. These findings indicate that biologically and clinically relevant information is preserved within the LIVI latent representations. ...
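
Because the abstract notes that subtype separability depends on class size, overall accuracy alone can hide poor performance on small classes. The sketch below is plain Python with hypothetical labels, and the thesis may have used different metrics. It contrasts overall accuracy with per-class recall and balanced accuracy:

```python
from collections import Counter, defaultdict


def accuracy(y_true, y_pred):
    """Fraction of all cells predicted correctly (dominated by large classes)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each true class, the fraction of its cells that were predicted correctly."""
    totals = Counter(y_true)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            hits[t] += 1
    return {c: hits[c] / n for c, n in totals.items()}


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so a small subtype counts as much as a large one."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical subtype labels: a large class classified perfectly, a small one poorly.
y_true = ["T_naive"] * 8 + ["T_reg"] * 2
y_pred = ["T_naive"] * 8 + ["T_naive", "T_reg"]
```

Here overall accuracy is 0.9, while balanced accuracy is 0.75 because only half of the small class is recovered.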

Using Associations Between Latent Factors and SNPs to Discover new eQTLs

Single-cell expression quantitative trait loci (eQTL) studies link genetic variants to changes in gene expression within individual cells. This makes it possible to study the effect of genetics on disease at the cell level rather than in aggregate, since effects can differ between cell types. Traditional SNP-to-gene association testing at the single-cell level suffers from a heavy multiple-testing burden because of the large number of SNPs and genes. To address this, a deep learning framework was recently developed that compresses gene expression into low-dimensional encodings and reconstructs the expression linearly from these encodings, enabling direct interpretation of the latent space. This model is called Latent Interaction Variational Inference (LIVI). Here, we determine whether the latent factors of this model can serve as quantitative traits for single nucleotide polymorphisms (SNPs) associated with rheumatoid arthritis (RA), using a dataset of RA patients. RA is a chronic disease characterized by progressive damage to the joints. Using a linear mixed model, we found that 617 of the 700 latent factors correlate with at least one SNP. We also found that genes associated with RA in a genome-wide association study (GWAS) have higher loadings for associated SNP-latent factor pairs than for non-associated ones. In addition, we identified genes affected by GWAS-identified risk SNPs for which the original GWAS did not identify a functionally associated gene. We conclude that the latent factors of the LIVI model can be used as quantitative traits for SNPs, and we used these latent factors to discover trans-eQTLs. ...
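
A common way to write a single-SNP, single-factor association test is the linear mixed model below. The exact covariates and random effects used in this study are not stated, so the kinship term and covariates are assumptions. Here y_k holds the values of latent factor k, g_s the genotype dosages of SNP s, X the fixed covariates, K a relatedness matrix, S the number of SNPs tested, and G the number of genes.

```latex
\mathbf{y}_k = \mathbf{X}\boldsymbol{\gamma} + \mathbf{g}_s\,\beta_{sk} + \mathbf{u} + \boldsymbol{\varepsilon},
\qquad \mathbf{u} \sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_g^2 \mathbf{K}\right),
\qquad \boldsymbol{\varepsilon} \sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_e^2 \mathbf{I}\right)

H_0:\ \beta_{sk} = 0,
\qquad
\alpha_{\text{Bonf}}^{\text{factors}} = \frac{\alpha}{S \times 700}
\quad\text{versus}\quad
\alpha_{\text{Bonf}}^{\text{genes}} = \frac{\alpha}{S \times G}
```

Testing factors instead of genes shrinks the correction denominator from S × G to S × 700. This is the source of the reduced multiple-testing burden whenever the number of genes exceeds the number of factors.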

Evaluating the LIVI Latent Space using Gene Expression Data

Rheumatoid arthritis (RA) is a heterogeneous autoimmune disease: patients who share the same diagnosis respond differently to the same therapy. Zhang et al. stratified the RA synovium into six cell-type abundance phenotypes (CTAPs) by clustering the counted abundances of pre-annotated cell types. The LIVI model, a variational-autoencoder-based model developed to map trans-eQTLs in donors without a diagnosis, instead learns donor structure directly from gene expression, separating donor-level variation from cell-state variation. The developers of the model left two questions open: whether LIVI can also capture disease status in a diagnosed cohort, and what the optimal number of donor-level embeddings is for a given dataset. We address these questions by applying LIVI to a CITE-seq dataset of 314,011 cells from 70 RA and 9 OA donors, using four different numbers of donor embeddings. We show that although LIVI is given no cell-type or diagnostic labels, its donor space recovers the underlying relationships between the six cell types that define the CTAPs: a lymphoid (T, B, NK) versus non-lymphoid (myeloid, endothelial, fibroblast) block, which is consistent across all four dimensionalities. The CTAPs themselves do not form discretely separable groups in the donor space, but at lower dimensionalities individual donor factors begin to distinguish them along the same axis. Interpreting the genes behind these factors proved difficult. We hypothesize that this is due to LIVI's sparsity penalty, which was tuned for detecting trans-eQTLs in a much larger cohort and leaves ribosomal pathways dominating the loadings. Overall, LIVI's donor space captures disease-state information, but at a coarser scale than Zhang et al.'s discretely defined CTAPs. For this particular dataset, the signal becomes stronger at lower dimensionalities, but its interpretability is limited. ...
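
Zhang et al.'s CTAPs start from counted abundances of pre-annotated cell types. The following is a minimal sketch that assumes per-cell labels are available for each donor. The helper names and labels are hypothetical and are not taken from the thesis. The sketch computes the per-donor abundance profile that such a clustering would use, along with the lymphoid versus non-lymphoid split that LIVI's donor space recovered:

```python
from collections import Counter

# The six cell types that define the CTAPs, as listed in the abstract.
CTAP_CELL_TYPES = ["T", "B", "NK", "myeloid", "endothelial", "fibroblast"]
LYMPHOID = {"T", "B", "NK"}


def abundance_profile(cell_labels):
    """Fraction of a donor's cells (restricted to the six CTAP types) in each type."""
    counts = Counter(cell_labels)
    total = sum(counts[c] for c in CTAP_CELL_TYPES)
    if total == 0:
        raise ValueError("donor has no cells of the CTAP-defining types")
    return {c: counts[c] / total for c in CTAP_CELL_TYPES}


def lymphoid_fraction(profile):
    """Share of the profile made up of lymphoid cell types (T, B, NK)."""
    return sum(v for c, v in profile.items() if c in LYMPHOID)


# Hypothetical donor with 100 annotated cells.
donor_cells = ["T"] * 40 + ["B"] * 10 + ["myeloid"] * 30 + ["fibroblast"] * 20
profile = abundance_profile(donor_cells)
```

Stacking one such profile per donor gives the donor-by-cell-type matrix that a CTAP-style clustering operates on, whereas LIVI learns its donor space from expression without these labels.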

An analysis on a clinical patient cohort

Rheumatoid arthritis (RA) is a highly heritable disease, yet how its genetic risk translates into cell-type-specific mechanisms remains poorly understood. LIVI decomposes single-cell expression into donor and cell-state latent spaces, which allow the original data to be reconstructed while leaving the retained latent information open to further analysis. The model has been shown to recover polygenic risk signals in healthy cohorts, but whether this transfers to cohorts with active disease has not been tested. In this work, we apply LIVI to a predominantly RA cohort, with osteoarthritis (OA) patients as controls, and ask whether the latent factors carry information about polygenic risk. After confirming that the cell-state space of the clinical cohort recovers known immune cell populations, we test each of the 700 donor factors against the polygenic risk scores (PRSs) for 21 diseases under different testing conditions, and find one donor factor (D462) significantly associated with the RA PRS. This association survives ancestry correction and changes to the cohort. The factor localises to NK and T cells and drives an antigen-presentation program whose expression appears to be inversely related to RA risk. ...
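
The factor-versus-PRS screen can be illustrated with a minimal sketch. The abstract does not state which test was used, so this example assumes the following: Pearson correlation, a Fisher z normal approximation for the p-value, and a Bonferroni correction over factors times diseases. It omits the ancestry covariates mentioned above. The data and factor names are synthetic:

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_p_value(r, n):
    """Approximate two-sided p-value for r via the Fisher z-transform."""
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite for |r| near 1
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def screen_factors(factors, prs, n_diseases=21, alpha=0.05):
    """Return factors whose correlation with the PRS passes a Bonferroni threshold
    over len(factors) x n_diseases tests (700 x 21 in the setting described above)."""
    threshold = alpha / (len(factors) * n_diseases)
    hits = {}
    for name, values in factors.items():
        r = pearson_r(values, prs)
        p = fisher_p_value(r, len(prs))
        if p < threshold:
            hits[name] = (r, p)
    return hits


# Synthetic per-donor values: one factor tracks the PRS, one does not.
prs = [float(i) for i in range(50)]
factors = {
    "D_strong": [i + (1 if i % 2 else -1) for i in range(50)],
    "D_null": [(-1) ** i for i in range(50)],
}
hits = screen_factors(factors, prs)
```

A real analysis would regress out ancestry principal components (or include them in a regression model) before testing, which this sketch does not do.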

Extending Latent Interaction Variational Inference (LIVI) Model with Protein Modality

Single-cell RNA sequencing enables the study of biological processes at high resolution, but the high dimensionality and sparsity of its measurements make downstream analyses such as expression quantitative trait locus (eQTL) mapping difficult. The Latent Interaction Variational Inference (LIVI) model addresses this challenge by learning low-dimensional, interpretable embeddings for the cell state, the donor, and the donor-cell-state interaction, which can be used as phenotypes for association testing. However, LIVI models only gene-expression measurements and does not exploit information from other modalities, such as the surface-protein counts measured by widely used protocols like CITE-seq. In this work, we investigate how LIVI can be extended to jointly model paired RNA and protein data and whether such an extension improves the biological interpretability of its latent representations. We introduce two architectures. Multimodal Shared-space Latent Interaction Variational Inference (MultiSLIVI) is a conservative extension in which RNA and protein measurements share the original cell-state latent space while being reconstructed through modality-specific decoders. Disentangled Multimodal Latent Interaction Variational Inference (DMLIVI) instead separates the cell-state representation into shared and modality-specific components, incorporating disentanglement principles from multimodal variational autoencoders. The models are evaluated using reconstruction performance, cell-type and donor predictability, latent-space structure, and downstream analysis. Most notably, both MultiSLIVI and DMLIVI recover fewer SNP-factor associations than the original LIVI model, indicating that the current multimodal extensions do not improve the donor-factor phenotypes used for eQTL mapping. 
Nevertheless, the proposed models provide a first step toward multimodal extensions of LIVI and highlight the importance of separating shared and modality-specific variation in future model designs. ...
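
To make the difference between the two architectures concrete, here is a simplified NumPy sketch of the decoding side only. It assumes linear decoders, which is consistent with LIVI's linear reconstruction from its encodings, and uses random placeholder weights. It omits the encoder, the donor and interaction latents, and the count likelihoods a real model would use. All names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_cells, n_genes, n_proteins = 5, 8, 4


def decode(z, weights):
    """Linear decoder: map latent codes (cells x k) to one modality (cells x features)."""
    return z @ weights


# MultiSLIVI-style: a single shared cell-state code, one decoder per modality.
k_shared = 3
z_cell = rng.normal(size=(n_cells, k_shared))
rna_hat = decode(z_cell, rng.normal(size=(k_shared, n_genes)))
protein_hat = decode(z_cell, rng.normal(size=(k_shared, n_proteins)))

# DMLIVI-style: a shared code plus a private code per modality; each decoder
# sees the shared code concatenated with its own modality-specific code.
k_private = 2
z_shared = rng.normal(size=(n_cells, k_shared))
z_rna_private = rng.normal(size=(n_cells, k_private))
z_protein_private = rng.normal(size=(n_cells, k_private))
rna_hat_d = decode(np.hstack([z_shared, z_rna_private]),
                   rng.normal(size=(k_shared + k_private, n_genes)))
protein_hat_d = decode(np.hstack([z_shared, z_protein_private]),
                       rng.normal(size=(k_shared + k_private, n_proteins)))
```

The design difference is visible in the inputs. In the shared-space variant, every source of protein-specific variation must live in the same code that explains RNA. In the disentangled variant, that variation has its own private code.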

Enhancing Accuracy and Biological Interpretability

Biological aging clocks estimate age from molecular data and provide insights into age-related functional decline. While aging clocks based on bulk transcriptomic data are well-studied, their single-cell counterparts remain limited and underexplored. In this study, we replicate and enhance a recent single-cell RNA-seq aging clock for human immune cells using ElasticNet, improving its performance through refined preprocessing, feature selection, and regularization. We also explore LightGBM to assess nonlinear modeling potential. Our enhanced models reduce prediction error, generalize better across external datasets, and identify biologically relevant genes through SHAP analysis. These findings support the development of accurate, interpretable, cell-type-specific aging clocks using single-cell data. ...
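
For reference, the elastic-net objective is shown below in the parameterization used by scikit-learn's ElasticNet, with regularization strength α (`alpha`) and mixing parameter ρ (`l1_ratio`). The abstract does not state which implementation or hyperparameters the replicated clock used. Here X holds the per-cell gene expression features, y the ages, n the number of cells, and **1** a vector of ones.

```latex
(\hat{\mathbf{w}}, \hat{b}) = \arg\min_{\mathbf{w},\, b}\;
\frac{1}{2n}\left\lVert \mathbf{y} - \mathbf{X}\mathbf{w} - b\mathbf{1} \right\rVert_2^2
+ \alpha\rho \left\lVert \mathbf{w} \right\rVert_1
+ \frac{\alpha(1-\rho)}{2} \left\lVert \mathbf{w} \right\rVert_2^2,
\qquad
\hat{y}_i = \mathbf{x}_i^{\top}\hat{\mathbf{w}} + \hat{b}
```

The L1 term drives many gene weights to exactly zero, which acts as built-in feature selection. The L2 term stabilizes the fit when correlated genes compete for the same signal.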

The aim of this research is to investigate whether physical gene characteristics can predict age-related changes in gene expression. Specifically, we analyze gene length, GC content, distance to the chromosome ends, and similar features to determine how they relate to differential expression between young and old individuals. Among these features, gene length consistently shows a strong correlation with age-related expression patterns. However, even when combined, the selected features do not provide enough predictive power to train a classifier that exceeds a modest 66% accuracy. These findings highlight the limitations of the current feature set and point to the need for more sophisticated feature preprocessing or more biologically relevant features in future predictive models. ...
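
The physical features named above are simple to compute. The following is a minimal sketch that assumes the gene sequence and 1-based, inclusive coordinates are already available. The function names are hypothetical:

```python
def gc_content(sequence):
    """Fraction of G/C among unambiguous bases; ambiguous symbols such as N are skipped."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence has no A/C/G/T bases")
    return sum(b in "GC" for b in bases) / len(bases)


def gene_length(start, end):
    """Gene length in base pairs for 1-based, inclusive genomic coordinates."""
    return end - start + 1


example_gc = gc_content("atgcgcNNat")  # 4 of 8 unambiguous bases are G/C
example_length = gene_length(1_000, 5_000)
```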

Aging is the biological process that changes the body over time. As we age, our bodies become more prone to disease and other health risks. However, not everyone experiences these changes at the same age, because the age of our cells (biological age) does not always match our chronological age (time since birth). Predicting someone's biological age and comparing it with their chronological age can therefore indicate whether that person is more prone to disease or other health risks.

Other studies have been able to predict the age of cells from gene expression. They compare expression levels between young and old individuals to identify genes that are affected by age. What has not yet been explored is how the correlation between gene pairs is affected by age. How genes cooperate can change with age, and this can be captured by looking at how gene-gene correlations change between age groups. This paper explores these correlations to answer the following question: by performing a correlation analysis of features in young individuals, and of the same features in old individuals, can we interpret the differences and use them to improve current age prediction models?
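
One common way to test whether a gene pair's correlation differs between two independent groups is Fisher's z-test on the transformed correlations. The abstract does not say which test the study used, so treat the following as an illustrative sketch with made-up correlations and group sizes:

```python
import math


def correlation_difference_test(r_young, n_young, r_old, n_old):
    """Fisher z-test for a difference between two correlations estimated in
    independent groups. Returns (z statistic, approximate two-sided p-value)."""
    se = math.sqrt(1 / (n_young - 3) + 1 / (n_old - 3))
    z = (math.atanh(r_young) - math.atanh(r_old)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# Made-up example: a pair whose correlation is unchanged, and one that weakens with age.
z_same, p_same = correlation_difference_test(0.6, 103, 0.6, 103)
z_diff, p_diff = correlation_difference_test(0.7, 103, 0.1, 103)
```

Because this test is run once per gene pair, the resulting p-values would still need a multiple-testing correction across all pairs.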

In this study, we found many gene pairs whose correlation differs significantly between younger and older individuals. We also identified hub genes whose correlation changes with many other genes. Using these genes to train a linear regression model, we predicted the age of cells with a mean absolute error (MAE) of 9.7835.
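
Hub genes and the reported error metric can be sketched as follows. The sketch assumes that a "hub" is a gene appearing in at least a chosen number of significantly changed pairs. The threshold, gene names, and ages are all hypothetical:

```python
from collections import Counter


def hub_genes(significant_pairs, min_degree):
    """Genes that appear in at least `min_degree` significantly changed pairs."""
    degree = Counter()
    for gene_a, gene_b in significant_pairs:
        degree[gene_a] += 1
        degree[gene_b] += 1
    return sorted(g for g, d in degree.items() if d >= min_degree)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted ages."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical significant pairs: GENE_A takes part in three of them.
pairs = [("GENE_A", "GENE_B"), ("GENE_A", "GENE_C"),
         ("GENE_A", "GENE_D"), ("GENE_B", "GENE_C")]
hubs = hub_genes(pairs, min_degree=3)
mae = mean_absolute_error([30, 50, 70], [35, 45, 80])
```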

Using the hub genes, we were not able to improve on the existing linear regression model. We did, however, identify genes previously linked to aging, such as LIMD2, as well as many ribosomal and mitochondrial genes, both of which lose functionality with age. ...