Learning Signature Exposures from Gene Expression at Single-Cell Resolution

Regular vs. Multitask Learning of Individual Regression Models

Bachelor Thesis (2025)
Author(s)

A. Potolski Eilat (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Joana P. Gonçalves – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

S. Costa – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

I. Stresec – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Catherine Oertel – Graduation committee member (TU Delft - Interactive Intelligence)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
25-06-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Understanding the mutational processes active within cancer cells is essential for improving diagnosis and treatment strategies. This study investigates whether the activity levels of these processes, quantified as mutational signature exposures, can be predicted from single-cell gene expression data. Two regression-based learning paradigms are compared: regular independent modelling, where the model for each mutational signature selects its own regularisation parameter and its own set of genes, and multitask modelling, where the models share a single regularisation parameter and agree on one set of genes that is used to predict every signature. We evaluate their predictive performance and interpretability using biologically informed metrics. Furthermore, we assess the models’ robustness on unseen data by simulating real-world distribution shifts through clustering-based data splits. Our results show that while both paradigms achieve reasonable predictive accuracy, independently trained models offer greater flexibility and interpretability by identifying signature-specific genes and regularisation strengths. These findings suggest that gene expression carries meaningful information about a cell’s mutational history and that signature-specific modelling may offer better biological insight into tumour heterogeneity.
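
The difference between the two paradigms can be illustrated with a small NumPy sketch. It is not the thesis implementation: it assumes a squared-error loss with an L1 penalty for the independent models and an L2,1 (row-wise group) penalty for the multitask model, both fitted by proximal gradient descent. The synthetic data, the function names and the alpha values are hypothetical and chosen only for illustration; in practice the regularisation strengths would typically be selected by cross-validation.

```python
import numpy as np


def soft_threshold(W, t):
    """Element-wise soft-thresholding (proximal operator of the L1 penalty).
    t may hold one threshold per signature (one per column of W)."""
    return np.sign(W) * np.maximum(np.abs(W) - t, 0.0)


def group_soft_threshold(W, t):
    """Row-wise soft-thresholding (proximal operator of the L2,1 penalty).
    Each row holds one gene's coefficients across all signatures, so a gene
    is kept for every signature or dropped for every signature."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return W * scale


def fit(X, Y, alphas, prox, n_iter=500):
    """Proximal gradient descent on (1 / 2n) * ||Y - XW||_F^2 + penalty."""
    n, p = X.shape
    W = np.zeros((p, Y.shape[1]))
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Y) / n
        W = prox(W - step * grad, step * alphas)
    return W


# Hypothetical toy data: 200 cells, 30 genes, 3 signatures.
# Each signature depends on its own three genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
W_true = np.zeros((30, 3))
for k in range(3):
    W_true[3 * k:3 * k + 3, k] = 2.0
Y = X @ W_true + 0.05 * rng.normal(size=(200, 3))

# Independent modelling: one regularisation strength and gene set per signature.
# With an element-wise penalty the columns of W decouple, so this is equivalent
# to fitting three separate models.
W_indep = fit(X, Y, np.array([0.1, 0.2, 0.3]), soft_threshold)

# Multitask modelling: one shared regularisation strength and one shared gene set.
W_multi = fit(X, Y, 0.2, group_soft_threshold)

genes_per_signature = (W_indep != 0).sum(axis=0)   # may differ per signature
shared_genes = np.flatnonzero((W_multi != 0).any(axis=1))
print("genes selected per signature (independent):", genes_per_signature)
print("genes shared by all signatures (multitask):", shared_genes)
```

In this sketch the independent fit is free to assign a different gene set to each signature, whereas the multitask fit keeps or discards each gene for all signatures at once, which mirrors the interpretability trade-off discussed in the abstract.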

Files

Research_paper_4_.pdf
(pdf | 3.23 Mb)
License info not available