N.N. Aben | TU Delft Repository

Community assessment to advance computational prediction of cancer drug combinations in a pharmacogenomic screen

Journal article (2019) - Michael P. Menden, Dennis Wang, Mike J. Mason, Bence Szalai, Krishna C. Bulusu, Yuanfang Guan, Thomas Yu, Nanne Aben, Lodewyk Wessels, More Authors...

The effectiveness of most cancer targeted therapies is short-lived. Tumors often develop resistance that might be overcome with drug combinations. However, the number of possible combinations is vast, necessitating data-driven approaches to find optimal patient-specific treatments. Here we report AstraZeneca’s large drug combination dataset, consisting of 11,576 experiments from 910 combinations across 85 molecularly characterized cancer cell lines, and results of a DREAM Challenge to evaluate computational strategies for predicting synergistic drug pairs and biomarkers. 160 teams participated, providing comprehensive methodological development and benchmarking. Winning methods incorporate prior knowledge of drug-target interactions. Synergy is predicted with an accuracy matching biological replicates for >60% of combinations. However, 20% of drug combinations are poorly predicted by all methods. Genomic rationales for synergy predictions are identified, including ADAM17 inhibitor antagonism when combined with PIK3CB/D inhibition, in contrast to synergy when combined with other PI3K-pathway inhibitors in PIK3CA mutant cells. ...
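
The abstract benchmarks predictions against the agreement between biological replicates. The challenge's scoring metric is not given here, so the sketch below assumes plain Pearson correlation between predicted and observed synergy scores for one combination across cell lines. The function names and the replicate criterion are hypothetical, chosen only to illustrate the comparison.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def matches_replicates(predicted, observed, replicate_a, replicate_b):
    """Hypothetical criterion: the prediction tracks the observed synergy
    scores at least as well as two biological replicates track each other."""
    return pearson(predicted, observed) >= pearson(replicate_a, replicate_b)
```

For example, `matches_replicates(pred, obs, rep1, rep2)` returns `True` for a combination whose predicted scores correlate with the observed scores at least as strongly as one replicate correlates with the other.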

Predicting drug (combination) response through data integration

The whole is greater than the sum of its parts

Doctoral thesis (2019) - Nanne Aben

In order to improve anti-cancer treatment, we need to better understand why some patients respond to a given anti-cancer treatment, while others do not. To this end, several large-scale drug response screens have been performed in recent years, in which hundreds of tumor cell lines have been characterized for many molecular features (e.g. mutations, CNAs, methylation and gene expression), as well as for response to hundreds of anti-cancer drugs. By statistically associating these molecular features with the drug response, we can identify biomarkers of drug response: markers that (after thorough testing) can ultimately be used to help identify which treatment should be given to which patient. While performing such statistical analyses, we found that there are strong relationships between the different molecular datasets (e.g. mutations, CNAs, methylation and gene expression) and that these relationships can negatively affect our ability to identify biomarkers. Following these results, we have developed TANDEM, a method to identify biomarkers while taking into account these relationships between datasets, and iTOP, a method to infer how different datasets are related to each other. For difficult cases where the number of cell lines is very small, we have developed a method that predicts drug response simultaneously for all drugs in the screen, thereby gaining statistical power. We based this method on a machine learning methodology called multi-task learning. In contrast to other multi-task learning methods, our approach provides insight into which features are important for a given treatment, thereby allowing us to identify biomarkers from these models. Finally, we analyzed a screen of 54 drug combinations across 765 cell lines. We report which combinations show synergy (i.e. where the effect of the combination was larger than one would expect based on the individual drug effects) most frequently, hence making them broadly applicable.
In addition, for each drug combination, we statistically associated molecular features (i.e. mutations, copy number aberrations, gene expression and proteomics) with the synergy, from which the strongest associations may be good candidate biomarkers. ...
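
The thesis defines synergy as a combination effect larger than one would expect from the individual drug effects, but the abstract does not say which reference model defines "expected". As an assumption, the sketch below uses Bliss independence, one common choice: for fractional inhibitions e_A and e_B in [0, 1], the expected combined inhibition is e_A + e_B - e_A * e_B, and the excess over that value is the synergy score. The threshold used to count a cell line as synergistic is hypothetical.

```python
def bliss_expected(effect_a: float, effect_b: float) -> float:
    """Expected fractional inhibition of A+B under Bliss independence."""
    for e in (effect_a, effect_b):
        if not 0.0 <= e <= 1.0:
            raise ValueError("effects must be fractional inhibitions in [0, 1]")
    return effect_a + effect_b - effect_a * effect_b


def bliss_excess(effect_a: float, effect_b: float, effect_combo: float) -> float:
    """Observed minus expected inhibition: > 0 suggests synergy, < 0 antagonism."""
    return effect_combo - bliss_expected(effect_a, effect_b)


def synergy_frequency(excess_scores, threshold: float = 0.1) -> float:
    """Fraction of cell lines whose excess exceeds a (hypothetical) threshold,
    a simple way to rank combinations by how broadly they are synergistic."""
    scores = list(excess_scores)
    if not scores:
        raise ValueError("no scores given")
    return sum(s > threshold for s in scores) / len(scores)
```

With single-drug inhibitions of 0.3 and 0.4, Bliss predicts 0.58 for the pair, so an observed inhibition of 0.75 gives an excess of about 0.17. Ranking combinations by `synergy_frequency` across cell lines mirrors the idea of finding combinations that are synergistic most frequently.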

iTOP

Inferring the topology of omics data

Journal article (2018) - Nanne Aben, Johan A. Westerhuis, Yipeng Song, Henk A.L. Kiers, Magali Michaut, Age K. Smilde, Lodewyk F.A. Wessels

Motivation: In biology, we are often faced with multiple datasets recorded on the same set of objects, such as multi-omics and phenotypic data of the same tumors. These datasets are typically not independent from each other. For example, methylation may influence gene expression, which may, in turn, influence drug response. Such relationships can strongly affect analyses performed on the data, as we have previously shown for the identification of biomarkers of drug response. Therefore, it is important to be able to chart the relationships between datasets.
Results: We present iTOP, a methodology to infer a topology of relationships between datasets. We base this methodology on the RV coefficient, a measure of matrix correlation, which can be used to determine how much information is shared between two datasets. We extended the RV coefficient for partial matrix correlations, which allows the use of graph reconstruction algorithms, such as the PC algorithm, to infer the topologies. In addition, since multi-omics data often contain binary data (e.g. mutations), we also extended the RV coefficient for binary data. Applying iTOP to pharmacogenomics data, we found that gene expression acts as a mediator between most other datasets and drug response: only proteomics clearly shares information with drug response that is not present in gene expression. Based on this result, we used TANDEM, a method for drug response prediction, to identify which variables predictive of drug response were distinct to either gene expression or proteomics.
Availability and implementation: An implementation of our methodology is available in the R package iTOP on CRAN. Additionally, an R Markdown document with code to reproduce all figures is provided as Supplementary Material.
Supplementary information: Supplementary data are available at Bioinformatics online. ...
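
The RV coefficient that iTOP builds on can be written, for column-centered matrices X and Y measured on the same samples (rows), as RV(X, Y) = tr(XXᵀYYᵀ) / sqrt(tr((XXᵀ)²) tr((YYᵀ)²)). The sketch below computes only this classic RV in Python; the partial and binary extensions described above are not reproduced, and the R package iTOP on CRAN remains the reference implementation.

```python
import numpy as np


def rv_coefficient(x, y) -> float:
    """Classic RV coefficient between two data matrices that share rows (samples).

    Returns a value in [0, 1]: 1 when the two datasets have the same
    sample configuration up to scaling and rotation, near 0 when they share
    little information.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 2 or y.ndim != 2 or x.shape[0] != y.shape[0]:
        raise ValueError("x and y must be 2-D with the same number of rows")
    # Column-center each dataset so RV reflects shared covariance structure.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    # Sample-by-sample configuration (cross-product) matrices.
    sx = x @ x.T
    sy = y @ y.T
    numerator = np.trace(sx @ sy)
    denominator = np.sqrt(np.trace(sx @ sx) * np.trace(sy @ sy))
    return float(numerator / denominator)
```

Because RV compares sample configurations rather than individual variables, two datasets with different numbers of features (say, methylation probes and expressed genes) can be compared directly, which is what makes it usable as an edge weight between whole datasets.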

Genomic Determinants of Protein Abundance Variation in Colorectal Cancer Cells

Journal article (2017) - Theodoros I. Roumeliotis, Steven P. Williams, Lodewyk Wessels, More authors..., Emanuel Gonçalves, Clara Alsinet, Martin Del Castillo Velasco-Herrera, Nanne Aben, Fatemeh Zamanzad Ghavidel, Magali Michaut, Michael Schubert, Stacey Price

Assessing the impact of genomic alterations on protein networks is fundamental in identifying the mechanisms that shape cancer heterogeneity. We have used isobaric labeling to characterize the proteomic landscapes of 50 colorectal cancer cell lines and to decipher the functional consequences of somatic genomic variants. The robust quantification of over 9,000 proteins and 11,000 phosphopeptides on average enabled the de novo construction of a functional protein correlation network, which ultimately exposed the collateral effects of mutations on protein complexes. CRISPR-Cas9 deletion of key chromatin modifiers confirmed that the consequences of genomic alterations can propagate through protein interactions in a transcript-independent manner. Lastly, we leveraged the quantified proteome to perform unsupervised classification of the cell lines and to build predictive models of drug response in colorectal cancer. Overall, we provide a deep integrative view of the functional network and the molecular structure underlying the heterogeneity of colorectal cancer cells. ...
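
The abstract does not describe how the protein correlation network was built. As a generic illustration only, a co-abundance network can link two proteins whenever their abundance profiles across cell lines correlate strongly; the sketch below uses Pearson correlation and a hypothetical cut-off on |r|, which may differ substantially from the paper's procedure.

```python
import numpy as np


def correlation_network(abundance, names, threshold=0.7):
    """Edges between proteins whose abundance profiles correlate strongly.

    abundance: array of shape (n_cell_lines, n_proteins).
    names: one label per protein column.
    threshold: hypothetical cut-off on the absolute Pearson correlation.
    Returns a list of (protein_a, protein_b, r) tuples.
    """
    abundance = np.asarray(abundance, dtype=float)
    if abundance.ndim != 2 or abundance.shape[1] != len(names):
        raise ValueError("one name per protein column is required")
    # Columns are variables (proteins), rows are observations (cell lines).
    r = np.corrcoef(abundance, rowvar=False)
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) >= threshold:
                edges.append((names[i], names[j], float(r[i, j])))
    return edges
```

Members of the same protein complex tend to co-vary in abundance, so in a network like this a mutation that destabilizes one subunit can show up as a coordinated shift across its neighbors.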

A Landscape of Pharmacogenomic Interactions in Cancer

Journal article (2016) - Francesco Iorio, Theo A. Knijnenburg, More Authors..., Daniel J. Vis, Graham R. Bignell, Michael P. Menden, Michael Schubert, Nanne Aben, Emanuel Gonçalves, Syd Barthorpe, Lodewyk Wessels

Systematic studies of cancer genomes have provided unprecedented insights into the molecular nature of cancer. Using this information to guide the development and application of therapies in the clinic is challenging. Here, we report how cancer-driven alterations identified in 11,289 tumors from 29 tissues (integrating somatic mutations, copy number alterations, DNA methylation, and gene expression) can be mapped onto 1,001 molecularly annotated human cancer cell lines and correlated with sensitivity to 265 drugs. We find that cell lines faithfully recapitulate oncogenic alterations identified in tumors, find that many of these associate with drug sensitivity/resistance, and highlight the importance of tissue lineage in mediating drug response. Logic-based modeling uncovers combinations of alterations that sensitize to drugs, while machine learning demonstrates the relative importance of different data types in predicting drug response. Our analysis and datasets are rich resources to link genotypes with cellular phenotypes and to identify therapeutic options for selected cancer sub-populations. ...
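
The abstract does not name the statistical test used to associate alterations with drug sensitivity. As an assumption, the sketch below uses Welch's t statistic, one simple way to ask whether cell lines carrying an alteration respond differently (for example, in log IC50) from those without it; the helper names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def split_by_status(response, status):
    """Split per-cell-line responses by a boolean alteration status."""
    altered = [r for r, s in zip(response, status) if s]
    wild_type = [r for r, s in zip(response, status) if not s]
    return altered, wild_type


def welch_t(altered, wild_type):
    """Welch's t statistic for the difference in mean response.

    With log IC50 values, a negative t means altered cell lines need less
    drug, i.e. the alteration is associated with sensitivity.
    """
    if len(altered) < 2 or len(wild_type) < 2:
        raise ValueError("each group needs at least two cell lines")
    se = sqrt(variance(altered) / len(altered)
              + variance(wild_type) / len(wild_type))
    return (mean(altered) - mean(wild_type)) / se
```

Welch's variant does not assume equal variances in the two groups, which matters when an alteration is rare and the altered group is small.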

TANDEM

A two-stage approach to maximize interpretability of drug response models based on multiple molecular data types

Journal article (2016) - Nanne Aben, Daniel J. Vis, Magali Michaut, Lodewyk Wessels

Motivation: Clinical response to anti-cancer drugs varies between patients. A large portion of this variation can be explained by differences in molecular features, such as mutation status, copy number alterations, methylation and gene expression profiles. We show that the classic approach for combining these molecular features (Elastic Net regression on all molecular features simultaneously) results in models that are almost exclusively based on gene expression. The gene expression features selected by the classic approach are difficult to interpret as they often represent poorly studied combinations of genes, activated by aberrations in upstream signaling pathways.
Results: To utilize all data types in a more balanced way, we developed TANDEM, a two-stage approach in which the first stage explains response using upstream features (mutations, copy number, methylation and cancer type) and the second stage explains the remainder using downstream features (gene expression). Applying TANDEM to 934 cell lines profiled across 265 drugs (GDSC1000), we show that the resulting models are more interpretable, while retaining the same predictive performance as the classic approach. Using the more balanced contributions per data type as determined with TANDEM, we find that response to MAPK pathway inhibitors is largely predicted by mutation data, while predicting response to DNA damaging agents requires gene expression data, in particular SLFN11 expression.
...
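
The two-stage idea described above can be sketched as follows: stage 1 regresses response on upstream features, and stage 2 regresses the stage-1 residuals on downstream features. TANDEM itself uses Elastic Net; to keep the sketch self-contained with NumPy only, it swaps in closed-form ridge regression, and details such as how regularization strength is chosen are assumptions that may differ from the published method.

```python
import numpy as np


def ridge_fit(x, y, alpha):
    """Ridge regression with an unpenalized intercept (closed form)."""
    x_mean, y_mean = x.mean(axis=0), y.mean()
    xc, yc = x - x_mean, y - y_mean
    coef = np.linalg.solve(xc.T @ xc + alpha * np.eye(x.shape[1]), xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return intercept, coef


def two_stage_fit(upstream, downstream, response, alpha=1.0):
    """Stage 1: explain response with upstream features (mutations, copy
    number, methylation, cancer type). Stage 2: explain what is left with
    downstream features (gene expression)."""
    upstream = np.asarray(upstream, dtype=float)
    downstream = np.asarray(downstream, dtype=float)
    response = np.asarray(response, dtype=float)
    b0_up, b_up = ridge_fit(upstream, response, alpha)
    residual = response - (b0_up + upstream @ b_up)
    b0_down, b_down = ridge_fit(downstream, residual, alpha)
    return (b0_up, b_up), (b0_down, b_down)


def two_stage_predict(model, upstream, downstream):
    """Sum the contributions of both stages."""
    (b0_up, b_up), (b0_down, b_down) = model
    upstream = np.asarray(upstream, dtype=float)
    downstream = np.asarray(downstream, dtype=float)
    return b0_up + upstream @ b_up + b0_down + downstream @ b_down
```

Because gene expression only sees the residuals, it can explain what the upstream features leave unexplained but cannot crowd them out, which is the source of the more balanced per-data-type contributions.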