Mitigating selection bias in synthetic lethality prediction using metric learning

Abstract

Synthetic lethality (SL) is a relationship between two genes, exploited for targeted anti-cancer therapy, whereby functional loss of both genes induces cell death, but functional loss of either gene alone is non-lethal. Computational prediction of SL gene pairs is sought after because laboratory screening for SL is expensive. Existing SL-labeled pairs from wet-lab experiments often focus on specific genes or pathways, resulting in notable selection bias. Current SL prediction methods ignore this bias when training on available SL labels and fail to generalize when test sets follow a different selection bias. One way to mitigate the bias is to incorporate unlabeled pairs during model learning. However, conventional semi-supervised methods such as self-training can reinforce the bias, because the confidently pseudolabeled pairs they add tend to be the ones most similar to previously included samples. We present DBST, a self-training strategy that addresses this issue by promoting diversity in the selection of pseudolabeled samples. DBST uses metric learning to obtain a class-contrastive representation of the feature space and, within that representation, selects diverse (i.e., mutually dissimilar) pseudolabeled pairs. In results for five cancer types, semi-supervised models, including DBST, delivered better SL prediction performance than the supervised model. Moreover, the unlabeled samples DBST incorporated were more dissimilar to one another than those incorporated by standard self-training. In experiments where the train and test sets had differing biases, DBST showed a slight improvement in performance over the supervised model.
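
For illustration, below is a minimal sketch of one diversity-based self-training round in the spirit described above, assuming a generic tabular feature matrix for gene pairs. It is not the paper's implementation: neighborhood components analysis (NCA) stands in for the class-contrastive metric learner, a greedy farthest-point heuristic stands in for the diversity-based selection, and the function name dbst_round and the parameters conf_threshold and k are hypothetical.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.metrics import pairwise_distances

def dbst_round(X_lab, y_lab, X_unlab, conf_threshold=0.9, k=20):
    """One illustrative round of diversity-based self-training.

    Assumptions (not from the paper): a logistic-regression base
    classifier, NCA as the class-contrastive metric learner, and a
    greedy max-min (farthest-point) rule for diverse selection.
    """
    # 1. Fit a classifier on the current labeled gene pairs.
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    proba = clf.predict_proba(X_unlab)
    conf = proba.max(axis=1)
    pseudo = proba.argmax(axis=1)

    # 2. Keep only confidently pseudolabeled candidates.
    cand = np.where(conf >= conf_threshold)[0]
    if cand.size == 0:
        return X_lab, y_lab, X_unlab

    # 3. Learn a class-contrastive embedding of the feature space
    #    (NCA here as a stand-in metric learner).
    nca = NeighborhoodComponentsAnalysis(
        n_components=min(10, X_lab.shape[1]), random_state=0)
    nca.fit(X_lab, y_lab)
    Z_lab = nca.transform(X_lab)
    Z_cand = nca.transform(X_unlab[cand])

    # 4. Greedily pick candidates far from the labeled set and from
    #    each other, so the round adds diverse pseudolabeled pairs
    #    rather than near-duplicates of already-included samples.
    selected = []
    ref = Z_lab
    for _ in range(min(k, cand.size)):
        d = pairwise_distances(Z_cand, ref).min(axis=1)
        d[selected] = -np.inf  # never re-pick a selected candidate
        i = int(d.argmax())
        selected.append(i)
        ref = np.vstack([ref, Z_cand[i:i + 1]])

    # 5. Move the selected pairs, with their pseudolabels, into the
    #    labeled pool.
    picked = cand[selected]
    X_lab = np.vstack([X_lab, X_unlab[picked]])
    y_lab = np.concatenate([y_lab, pseudo[picked]])
    X_unlab = np.delete(X_unlab, picked, axis=0)
    return X_lab, y_lab, X_unlab

if __name__ == "__main__":
    # Synthetic demo data only; real inputs would be gene-pair features.
    rng = np.random.default_rng(0)
    X_lab = rng.normal(size=(100, 12))
    y_lab = (X_lab[:, 0] > 0).astype(int)
    X_unlab = rng.normal(size=(500, 12))
    X_lab, y_lab, X_unlab = dbst_round(X_lab, y_lab, X_unlab)
    print(X_lab.shape, y_lab.shape, X_unlab.shape)

The max-min rule in step 4 is one simple way to operationalize "diverse (i.e., mutually dissimilar)": each pick maximizes the embedded distance to everything already included, which directly counteracts self-training's tendency to add points closest to the existing labeled set.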