Learning Signature Exposures from Gene Expression at Single-Cell Resolution

Regular vs. Multitask Learning of Individual Regression Models

Bachelor Thesis (2025)

Author(s)

A. Potolski Eilat (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Joana P. Gonçalves – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

S. Costa – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

I. Stresec – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Catherine Oertel – Graduation committee member (TU Delft - Interactive Intelligence)

Faculty

Electrical Engineering, Mathematics and Computer Science

Cancer, Single cell analysis, Gene Expression, Mutational signature

To reference this document use:

https://resolver.tudelft.nl/uuid:20421d34-f8db-48bf-ae7f-710a1d037ec6

Publication Year

2025

Language

English

Graduation Date

25-06-2025

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Understanding the mutational processes active within cancer cells is essential for improving diagnosis and treatment strategies. This study investigates whether the activity levels of these processes, quantified as mutational signature exposures, can be predicted from single-cell gene expression data. Two regression-based learning paradigms are compared: regular independent modelling, where the model for each mutational signature selects its own regularisation parameter and its own set of genes, and multitask modelling, where the models agree on a single set of genes used to predict every signature and share one regularisation parameter. We evaluate their predictive performance and interpretability using biologically informed metrics. Furthermore, we assess the models' robustness on unseen data by simulating real-world distribution shifts through clustering-based data splits. Our results show that while both approaches achieve reasonable predictive accuracy, independently trained models offer greater flexibility and interpretability by identifying signature-specific genes and regularisation strengths. These findings suggest that gene expression carries meaningful information about a cell's mutational history and that signature-specific modelling may offer better biological insight into tumour heterogeneity.
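
The contrast between the two paradigms described in the abstract can be illustrated with a minimal sketch. It assumes sparse linear (Lasso-type) regression as the model family, which the abstract does not state, and uses scikit-learn's `LassoCV` and `MultiTaskLassoCV` as stand-ins. The data are synthetic and all variable names are hypothetical; nothing here reproduces the thesis's data, models, or results.

```python
import numpy as np
from sklearn.linear_model import LassoCV, MultiTaskLassoCV

rng = np.random.default_rng(0)

# Synthetic stand-in data (an assumption, not the thesis data):
# 120 cells, 30 genes, 3 mutational signature exposures.
n_cells, n_genes, n_signatures = 120, 30, 3
X = rng.normal(size=(n_cells, n_genes))

# Each simulated signature depends on its own three genes.
true_coef = np.zeros((n_genes, n_signatures))
true_coef[0:3, 0] = 1.5
true_coef[3:6, 1] = -2.0
true_coef[6:9, 2] = 1.0
Y = X @ true_coef + 0.1 * rng.normal(size=(n_cells, n_signatures))

# Independent modelling: one model per signature, each choosing its own
# regularisation strength (alpha) and its own set of genes.
independent = [
    LassoCV(cv=5, random_state=0).fit(X, Y[:, k]) for k in range(n_signatures)
]
independent_alphas = [float(m.alpha_) for m in independent]
independent_genes = [
    {int(g) for g in np.flatnonzero(m.coef_)} for m in independent
]

# Multitask modelling: one alpha is shared, and the group penalty keeps or
# drops each gene for all signatures at once.
multitask = MultiTaskLassoCV(cv=5, random_state=0).fit(X, Y)
shared_alpha = float(multitask.alpha_)
support = multitask.coef_ != 0  # shape: (n_signatures, n_genes)
shared_genes = {int(g) for g in np.flatnonzero(support.any(axis=0))}

print("independent alphas:", independent_alphas)
print("genes per signature:", [sorted(g) for g in independent_genes])
print("shared alpha:", shared_alpha)
print("shared genes:", sorted(shared_genes))
```

In this sketch the independent models can return a different gene set and a different alpha for each signature, whereas the multitask model returns one alpha and one gene set that applies to every signature. That is the trade-off between flexibility and shared structure that the abstract describes.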

Files

Research_paper_4_.pdf

(pdf | 3.23 Mb)

License info not available