TANDEM

A two-stage approach to maximize interpretability of drug response models based on multiple molecular data types

Journal Article (2016)
Authors

Nanne Aben (Nederlands Kanker Instituut - Antoni van Leeuwenhoek ziekenhuis, TU Delft - Pattern Recognition and Bioinformatics)

Daniel J. Vis (Nederlands Kanker Instituut - Antoni van Leeuwenhoek ziekenhuis)

Magali Michaut (Nederlands Kanker Instituut - Antoni van Leeuwenhoek ziekenhuis)

L.F.A. Wessels (TU Delft - Pattern Recognition and Bioinformatics, Nederlands Kanker Instituut - Antoni van Leeuwenhoek ziekenhuis)

Research Group
Pattern Recognition and Bioinformatics
Copyright
© 2016 N.N. Aben, Daniel J. Vis, Magali Michaut, L.F.A. Wessels
Publication Year
2016
Language
English
Issue number
17
Volume number
32
Pages (from-to)
i413-i420
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Motivation: Clinical response to anti-cancer drugs varies between patients. A large portion of this variation can be explained by differences in molecular features, such as mutation status, copy number alterations, methylation and gene expression profiles. We show that the classic approach for combining these molecular features (Elastic Net regression on all molecular features simultaneously) results in models that are almost exclusively based on gene expression. The gene expression features selected by the classic approach are difficult to interpret as they often represent poorly studied combinations of genes, activated by aberrations in upstream signaling pathways.
Results: To utilize all data types in a more balanced way, we developed TANDEM, a two-stage approach in which the first stage explains response using upstream features (mutations, copy number, methylation and cancer type) and the second stage explains the remainder using downstream features (gene expression). Applying TANDEM to 934 cell lines profiled across 265 drugs (GDSC1000), we show that the resulting models are more interpretable, while retaining the same predictive performance as the classic approach. Using the more balanced contributions per data type as determined with TANDEM, we find that response to MAPK pathway inhibitors is largely predicted by mutation data, while predicting response to DNA damaging agents requires gene expression data, in particular SLFN11 expression.
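
The two-stage idea described in the Results can be illustrated with a short sketch. This is not the authors' implementation. It is a minimal example that assumes NumPy is available and uses a small hand-written coordinate-descent Elastic Net. The function names (`elastic_net`, `tandem_fit`, `tandem_predict`, `make_toy_data`), the penalty settings and the synthetic data are all hypothetical and chosen only for illustration. Stage 1 fits the response on upstream features. Stage 2 fits the stage 1 residual on gene expression.

```python
import numpy as np


def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Elastic Net via cyclic coordinate descent; returns (coef, intercept).

    Minimizes (1/2n)*||y - b0 - X b||^2 + alpha*l1_ratio*||b||_1
              + 0.5*alpha*(1 - l1_ratio)*||b||^2.
    """
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean           # center so the intercept drops out
    col_sq = (Xc ** 2).sum(axis=0) / n        # per-feature variance
    l1, l2 = alpha * l1_ratio, alpha * (1.0 - l1_ratio)
    coef = np.zeros(p)
    resid = yc.copy()                         # residual for the current coef
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                      # skip constant features
            # Covariance of feature j with the residual that excludes feature j
            rho = Xc[:, j] @ resid / n + col_sq[j] * coef[j]
            new = np.sign(rho) * max(abs(rho) - l1, 0.0) / (col_sq[j] + l2)
            resid += Xc[:, j] * (coef[j] - new)
            coef[j] = new
    return coef, y_mean - x_mean @ coef


def tandem_fit(X_up, X_down, y, **kw):
    """Two-stage fit: upstream features first, gene expression on the residual."""
    b_up, b0_up = elastic_net(X_up, y, **kw)              # stage 1
    residual = y - (X_up @ b_up + b0_up)                  # what stage 1 missed
    b_down, b0_down = elastic_net(X_down, residual, **kw)  # stage 2
    return (b_up, b0_up), (b_down, b0_down)


def tandem_predict(model, X_up, X_down):
    """Predicted response is the sum of the two stage predictions."""
    (b_up, b0_up), (b_down, b0_down) = model
    return X_up @ b_up + b0_up + X_down @ b_down + b0_down


def make_toy_data(n=200, seed=0):
    """Synthetic data (illustrative only): one mutation drives both the
    response and the expression of one gene."""
    rng = np.random.default_rng(seed)
    X_up = (rng.random((n, 5)) < 0.3).astype(float)       # binary "mutations"
    X_down = rng.normal(size=(n, 10))                      # "gene expression"
    X_down[:, 0] = 2.0 * X_up[:, 0] + 0.5 * rng.normal(size=n)
    y = 3.0 * X_up[:, 0] + 1.5 * X_down[:, 1] + 0.1 * rng.normal(size=n)
    return X_up, X_down, y


X_up, X_down, y = make_toy_data()
model = tandem_fit(X_up, X_down, y, alpha=0.05)
pred = tandem_predict(model, X_up, X_down)
```

Because the stages are fitted in sequence, any signal that the upstream features can explain is assigned to them first, and gene expression only accounts for the remainder. Inspecting the stage 1 and stage 2 coefficients separately is one simple way to compare contributions per data type in such a sketch.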
