Explainable Survival Analysis for Urothelial Cancer

Master Thesis (2021)

Author(s)

S. Kaur (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Joana Gonçalves – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

A. Csala – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

M.J.T. Reinders – Coach (TU Delft - Pattern Recognition and Bioinformatics)

T. Höllt – Coach (TU Delft - Computer Graphics and Visualisation)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

PCA, Explainability, PFI, Survival analysis, Urothelial Cancer, Generalizability, Model-agnostic, Coxnet, Rank SVM, RSF, GBoost, C-index

To reference this document use:

https://resolver.tudelft.nl/uuid:17352c9c-0835-4d4c-b34d-d556a92d3c3c

Publication Year

2021

Language

English

Graduation Date

26-08-2021

Awarding Institution

Delft University of Technology

Programme

Computer Science | Bioinformatics

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Survival analysis is a statistical approach for predicting when an event will occur. Machine learning survival models have been used in many cancer studies, but they are not always interpretable. The lack of research on explainable survival analysis for urothelial cancer prompted this study, which offers insight into the generalizability and explainability of machine learning models for urothelial cancer and examines how the models can be made interpretable in the presence of collinearity. We compared the performance of five models: Rank Linear Support Vector Machine (SVM), Rank Kernel SVM, Coxnet, Random Survival Forest (RSF), and Gradient Boosting (GBoost). We trained the models on gene expression and clinical variables from the Memorial Sloan Kettering (MSK) and The Cancer Genome Atlas (TCGA) datasets and evaluated them using the concordance index (C-index). To explain the models we used Permutation Feature Importance (PFI), a model-agnostic method, and to deal with collinearity we used Principal Component Analysis (PCA). The best linear model was Rank Linear SVM (C-index = 0.58) and the best non-linear model was RSF (C-index = 0.63). PFI showed that some of the most important genes are expressed in urothelial cancer, and one of them is a prognostic marker. PCA allowed us to deal with collinearity, and models using PCA performed comparably to models without it. PFI with PCA showed that the processes associated with the top genes are prevalent in cancer.
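The following is a minimal, illustrative sketch of how PFI can be scored with the C-index. It is not the thesis's implementation. The function names, the toy data, and the fixed risk score standing in for a trained survival model are all assumptions made for this example. The C-index here is Harrell's pairwise formulation, which may differ in detail from the estimator used in the thesis.

```python
import random


def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs ranked correctly.

    A pair (i, j) is comparable when subject i had an observed event
    (events[i] == 1) and a shorter time than subject j. The pair is
    concordant when subject i also has the higher predicted risk.
    Tied risk scores count as half a concordant pair.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def permutation_importance(predict, X, times, events, n_repeats=20, seed=0):
    """Mean drop in C-index after shuffling one feature column at a time."""
    rng = random.Random(seed)
    baseline = concordance_index(times, events, predict(X))
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:]
                      for row, v in zip(X, shuffled)]
            permuted = concordance_index(times, events, predict(X_perm))
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Hypothetical toy data: feature 0 drives risk, feature 1 is noise.
X = [[5.0, 0.3], [4.0, 0.9], [3.0, 0.1], [2.0, 0.7], [1.0, 0.5], [0.0, 0.2]]
times = [1, 2, 3, 4, 5, 6]    # follow-up times
events = [1, 1, 0, 1, 1, 0]   # 1 = event observed, 0 = censored


def predict(rows):
    # Stand-in for a fitted model: risk equals feature 0, feature 1 is ignored.
    return [row[0] for row in rows]


baseline, importances = permutation_importance(predict, X, times, events)
print("baseline C-index:", baseline)
print("importance per feature:", importances)
```

Because the stand-in model ignores feature 1, shuffling that column leaves the C-index unchanged and its importance is zero, while shuffling feature 0 lowers the C-index. PFI only needs predictions, which is why it applies equally to any of the models compared in the abstract.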

Files

MscThesis_SukhleenKaur.pdf

(pdf | 2.21 MB)

License info not available