Y.I. Tepeli

6 records found

Doctoral thesis (2025) - Y.I. Tepeli, M.J.T. Reinders, Joana Gonçalves
The shift to precision medicine in cancer focuses on providing therapies that target the vulnerabilities of each individual patient tumor. This approach involves identifying cancer subtypes and discovering targets, such as genetic interactions, to treat patients who lack effective therapy. While computational tools, especially machine learning methods, are essential to analyze complex high-dimensional molecular data and suggest new candidate treatment strategies, their effectiveness is often questioned due to data-related challenges. Specifically, limitations in data collection result in sparse or biased biological data, hindering accurate decision-making and the identification of correct patterns. This thesis proposes state-of-the-art solutions to learn improved prediction models for precision medicine and beyond by leveraging relevant data that was previously ignored, and by addressing issues of data sparsity and bias.

Prediction of gene synthetic lethalities to identify novel therapeutic targets has overlooked sequence similarity, which is both a notable indicator of functional relation and available for every gene pair, unlike sparser data sources often used for this prediction task. Existing models also struggle to generalize beyond known synthetic lethalities due to an overreliance on data affected by prominent biases. Similarly, the stratification of cancer cohorts without effective treatments is challenging due to the small sample sizes of cancer (sub)cohorts such as oncogene-driven cohorts. In addition, stratification might not directly uncover an actionable treatment opportunity. The integration of dense protein sequence similarity and of comprehensive drug response data, each combined with methodological advances, led to significant improvements and revealed promising therapeutic opportunities.

Although these integrations improved the performance of computational methods, selection bias, a nonrandom sampling of training data, remained a significant issue affecting fair evaluation and generalizability. Thus, this thesis also introduces strategies to evaluate and mitigate the impact on model generalizability and fairness when the selected training data is not representative of the underlying population. We first artificially induce multivariate selection bias by favoring the selection of specific clusters of samples to study the fair evaluation of model generalizability. Then, to mitigate selection bias, we advance semi-supervised learning methods that use unlabeled data to gain insight into the distribution of the population beyond the labeled training data and promote sample diversity to counter confirmation bias typical of existing approaches. Our approaches include bias mitigation designed for specific machine learning models, such as forest ensembles and neural networks, and model-agnostic methods that operate under fewer assumptions. We show that diversity-guided semi-supervised learning strategies outperform existing domain adaptation techniques in the presence of various selection biases.
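As an illustration of the idea of inducing selection bias by favoring specific clusters of samples, the following is a minimal sketch and not the procedure used in the thesis. It assumes cluster labels have already been computed by some clustering method; the function name `cluster_biased_sample`, the `bias` weight, and the toy population are all hypothetical.

```python
import random


def cluster_biased_sample(cluster_ids, favored, n_select, bias=10.0, seed=0):
    """Pick n_select sample indices without replacement, favoring some clusters.

    Samples whose cluster is in `favored` get weight `bias`; all other
    samples get weight 1. (Hypothetical helper, for illustration only.)
    """
    rng = random.Random(seed)
    keyed = []
    for idx, cluster in enumerate(cluster_ids):
        weight = bias if cluster in favored else 1.0
        # Efraimidis-Spirakis key: larger weights tend to yield larger keys,
        # so taking the top keys gives weighted sampling without replacement.
        keyed.append((rng.random() ** (1.0 / weight), idx))
    keyed.sort(reverse=True)
    return sorted(idx for _, idx in keyed[:n_select])


# Toy population: 300 samples spread evenly over clusters 0, 1 and 2.
cluster_ids = [i % 3 for i in range(300)]

# Biased "training set": cluster 0 is strongly over-represented.
selected = cluster_biased_sample(cluster_ids, favored={0}, n_select=60, bias=10.0)
share_favored = sum(cluster_ids[i] == 0 for i in selected) / len(selected)
```

A model trained on `selected` and evaluated on the remaining samples would then be tested on a population whose cluster composition differs from that of its training data, which is the kind of mismatch the paragraph above describes.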

The computational methods proposed in this thesis enhance therapeutic target discovery in cancer and address selection bias in machine learning to advance precision medicine in cancer and improve the generalizability and fairness of bioinformatics models. ...
Journal article (2023) - Y.I. Tepeli, C.F. Seale, Joana P. Gonçalves
Motivation

Anti-cancer therapies based on synthetic lethality (SL) exploit tumour vulnerabilities for treatment with reduced side effects, by targeting a gene that is jointly essential with another whose function is lost. Computational prediction is key to expedite SL screening, yet existing methods are vulnerable to prevalent selection bias in SL data and reliant on cancer or tissue type-specific omics, which can be scarce. Notably, sequence similarity remains underexplored as a proxy for related gene function and joint essentiality.
Results

We propose ELISL, Early–Late Integrated SL prediction with forest ensembles, using context-free protein sequence embeddings and context-specific omics from cell lines and tissue. Across eight cancer types, ELISL showed superior robustness to selection bias and recovery of known SL genes, as well as promising cross-cancer predictions. Co-occurring mutations in a BRCA gene and ELISL-predicted pairs from the HH, FGF, WNT, or NEIL gene families were associated with longer patient survival times, revealing therapeutic potential. ...
Journal article (2022) - Colm Seale, Yasin Tepeli, Joana P. Gonçalves
Motivation
Synthetic lethality (SL) between two genes occurs when simultaneous loss of function leads to cell death. This holds great promise for developing anti-cancer therapeutics that target synthetic lethal pairs of endogenously disrupted genes. Identifying novel SL relationships through exhaustive experimental screens is challenging, due to the vast number of candidate pairs. Computational SL prediction is therefore sought to identify promising SL gene pairs for further experimentation. However, current SL prediction methods lack consideration for generalizability in the presence of selection bias in SL data.
Results
We show that SL data exhibit considerable gene selection bias. Our experiments designed to assess the robustness of SL prediction reveal that models driven by the topology of known SL interactions (e.g. graph, matrix factorization) are especially sensitive to selection bias. We introduce selection bias-resilient synthetic lethality (SBSL) prediction using regularized logistic regression or random forests. Each gene pair is described by 27 molecular features derived from cancer cell line, cancer patient tissue and healthy donor tissue samples. SBSL models are built and tested using approximately 8000 experimentally derived SL pairs across breast, colon, lung and ovarian cancers. Compared to other SL prediction methods, SBSL showed higher predictive performance, better generalizability and robustness to selection bias. Gene dependency, quantifying the essentiality of a gene for cell survival, contributed most to SBSL predictions. Random forests were superior to linear models in the absence of dependency features, highlighting the relevance of mutual exclusivity of somatic mutations, co-expression in healthy tissue and differential expression in tumour samples.
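One way to probe sensitivity to gene selection bias is to evaluate on gene pairs whose genes never appear in training. The sketch below shows such a gene-disjoint split; it is an assumed illustration and not necessarily the evaluation protocol of the paper. The function name `gene_disjoint_split` and the placeholder gene identifiers (`G1` to `G6`) are hypothetical.

```python
def gene_disjoint_split(pairs, held_out_genes):
    """Split gene pairs so that test pairs share no gene with training pairs.

    Pairs with both genes held out go to the test set, pairs with neither
    gene held out go to the training set, and mixed pairs are discarded.
    (Hypothetical helper, for illustration only.)
    """
    held_out = set(held_out_genes)
    train, test = [], []
    for gene_a, gene_b in pairs:
        n_held = (gene_a in held_out) + (gene_b in held_out)
        if n_held == 0:
            train.append((gene_a, gene_b))
        elif n_held == 2:
            test.append((gene_a, gene_b))
        # n_held == 1: the pair links both sides of the split, so drop it.
    return train, test


# Placeholder gene pairs; the identifiers are not real genes.
pairs = [
    ("G1", "G2"), ("G1", "G3"), ("G2", "G3"),
    ("G3", "G4"),
    ("G4", "G5"), ("G5", "G6"), ("G4", "G6"),
]
train, test = gene_disjoint_split(pairs, held_out_genes={"G4", "G5", "G6"})

train_genes = {g for pair in train for g in pair}
test_genes = {g for pair in test for g in pair}
```

Under such a split, a model that mainly memorizes which genes occur in known SL pairs has nothing to rely on at test time, whereas a model driven by per-pair molecular features can still make informed predictions.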
Availability and implementation
https://github.com/joanagoncalveslab/sbsl
Supplementary information
Supplementary data are available at Bioinformatics online. ...