S.M.C. Mourragui | TU Delft Repository

Percolate

An Exponential Family JIVE Model to Design DNA-Based Predictors of Drug Response

Conference paper (2023) - Soufiane M.C. Mourragui, Marco Loog, Mirrelijn van Nee, Mark A. van de Wiel, Marcel J.T. Reinders, Lodewyk F.A. Wessels

Motivation: Anti-cancer drugs may elicit resistance or sensitivity through mechanisms which involve several genomic layers. Nevertheless, we have demonstrated that gene expression contains most of the predictive capacity compared to the remaining omic data types. Unfortunately, this comes at a price: gene expression biomarkers are often hard to interpret and show poor robustness. Results: To capture the best of both worlds, i.e. the accuracy of gene expression and the robustness of other genomic levels, such as mutations, copy-number or methylation, we developed Percolate, a computational approach which extracts the joint signal between gene expression and the other omic data types. We developed an out-of-sample extension of Percolate which allows predictions on unseen samples without the need to recompute the joint signal on all data. We employed Percolate to extract the joint signal between gene expression and either mutations, copy-number or methylation, and used the out-of-sample extension to perform response prediction on unseen samples. We showed that the joint signal recapitulates, and sometimes exceeds, the predictive performance achieved with each data type individually. Importantly, molecular signatures created by Percolate do not require gene expression to be evaluated, rendering them suitable for clinical applications where only one data type is available. Availability: Percolate is available as a Python 3.7 package and the scripts to reproduce the results are available here. ...

Computational models for clinical drug response prediction

Aligning transcriptomic data of patients and pre-clinical models

Doctoral thesis (2023) - S.M.C. Mourragui, L.F.A. Wessels, M.J.T. Reinders, M. Loog

Extensive efforts in cancer research over the past decades have markedly improved diagnosis and treatments, leading to better outcomes for cancer patients. Paradoxically, however, these discoveries have begun to shed light on a level of complexity that rules out the emergence of a universal cancer treatment. As any tumor is now known to be essentially a unique disease, clinicians and researchers are moving towards a new paradigm, termed “precision medicine”, which consists of designing bespoke lines of treatment for each patient.

This paradigm-shift has been fueled by international consortia that have characterized large collections of tumors, thereby providing a vast reference for cancer heterogeneity. Two main strategies have been employed: sequencing of tumor biopsies directly extracted from patients or studying pre-clinical models, i.e., tumor cells cultured in artificial environments. While the first strategy generates clinically faithful data, the second strategy is flexible and cost-effective, and allows for the study of effects of various drugs at different concentrations.

Based on the large amount of data generated from pre-clinical models, computer scientists have developed various machine learning algorithms to model drug response based on these data. However, these models do not take into account the complexity of human tumors and the differences between model systems and human tumors, and are therefore not directly applicable in a clinical setting. In this thesis, we aim at bridging this gap. Specifically, we develop algorithms to integrate and align data generated from the two aforementioned strategies with a goal to predict drug response in patients from datasets generated using pre-clinical models. ...

Predicting patient response with models trained on cell lines and patient-derived xenografts by nonlinear transfer learning

Journal article (2021) - Soufiane M.C. Mourragui, Marco Loog, Daniel J. Vis, Kat Moore, Anna G. Manjon, Mark A. van de Wiel, Marcel J.T. Reinders, Lodewyk F.A. Wessels

Preclinical models have been the workhorse of cancer research, producing massive amounts of drug response data. Unfortunately, translating response biomarkers derived from these datasets to human tumors has proven to be particularly challenging. To address this challenge, we developed TRANSACT, a computational framework that builds a consensus space to capture biological processes common to preclinical models and human tumors and exploits this space to construct drug response predictors that robustly transfer from preclinical models to human tumors. TRANSACT performs favorably compared to four competing approaches, including two deep learning approaches, on a set of 23 drug prediction challenges on The Cancer Genome Atlas and 226 metastatic tumors from the Hartwig Medical Foundation. We demonstrate that response predictions deliver a robust performance for a number of therapies of high clinical importance: platinum-based chemotherapies, gemcitabine, and paclitaxel. In contrast to other approaches, we demonstrate the interpretability of the TRANSACT predictors by correctly identifying known biomarkers of targeted therapies, and we propose potential mechanisms that mediate the resistance to two chemotherapeutic agents. ...

Few-shot learning creates predictive models of drug response that translate from high-throughput screens to individual patients

Journal article (2021) - Jianzhu Ma, Samson H. Fong, Trey Ideker, Yunan Luo, Christopher J. Bakkenist, John Paul Shen, Soufiane Mourragui, Lodewyk F.A. Wessels, Marc Hafner, Roded Sharan, Peng Jiang

Cell-line screens create expansive datasets for learning predictive markers of drug response, but these models do not readily translate to the clinic with its diverse contexts and limited data. In the present study, we apply a recently developed technique, few-shot machine learning, to train a versatile neural network model in cell lines that can be tuned to new contexts using few additional samples. The model quickly adapts when switching among different tissue types and in moving from cell-line models to clinical contexts, including patient-derived tumor cells and patient-derived xenografts. It can also be interpreted to identify the molecular features most important to a drug response, highlighting critical roles for RB1 and SMAD4 in the response to CDK inhibition and RNF8 and CHD4 in the response to ATM inhibition. The few-shot learning framework provides a bridge from the many samples surveyed in high-throughput screens (n-of-many) to the distinctive contexts of individual patients (n-of-one). ...

SpaGE

Spatial Gene Enhancement using scRNA-seq

Journal article (2020) - Tamim Abdelaal, Soufiane Mourragui, Ahmed Mahfouz, Marcel J.T. Reinders

Single-cell technologies are emerging fast due to their ability to unravel the heterogeneity of biological systems. While scRNA-seq is a powerful tool that measures whole-transcriptome expression of single cells, it lacks their spatial localization. Novel spatial transcriptomics methods do retain cells' spatial information, but some methods can only measure tens to hundreds of transcripts. To resolve this discrepancy, we developed SpaGE, a method that integrates spatial and scRNA-seq datasets to predict whole-transcriptome expressions in their spatial configuration. Using five dataset-pairs, SpaGE outperformed previously published methods and showed scalability to large datasets. Moreover, SpaGE predicted new spatial gene patterns that are confirmed independently using in situ hybridization data from the Allen Mouse Brain Atlas. ...

PRECISE

A domain adaptation approach to transfer predictors of drug response from pre-clinical models to tumors

Journal article (2019) - Soufiane Mourragui, Marco Loog, Mark A. van de Wiel, Marcel Reinders, Lodewyk Wessels

Motivation: Cell lines and patient-derived xenografts (PDXs) have been used extensively to understand the molecular underpinnings of cancer. While core biological processes are typically conserved, these models also show important differences compared to human tumors, hampering the translation of findings from pre-clinical models to the human setting. In particular, employing drug response predictors generated on data derived from pre-clinical models to predict patient response remains a challenging task. As very large drug response datasets have been collected for pre-clinical models, and patient drug response data are often lacking, there is an urgent need for methods that efficiently transfer drug response predictors from pre-clinical models to the human setting. Results: We show that cell lines and PDXs share common characteristics and processes with human tumors. We quantify this similarity and show that a regression model cannot simply be trained on cell lines or PDXs and then applied on tumors. We developed PRECISE, a novel methodology based on domain adaptation that captures the common information shared amongst pre-clinical models and human tumors in a consensus representation. Employing this representation, we train predictors of drug response on pre-clinical data and apply these predictors to stratify human tumors. We show that the resulting domain-invariant predictors show a small reduction in predictive performance in the pre-clinical domain but, importantly, reliably recover known associations between independent biomarkers and their companion drugs on human tumors. ...