Representations of DNA Sequence Context and Mutational Spectra for Prediction of Repair Deficiencies

Master Thesis (2023)
Author(s)

J. Borg (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Joana Gonçalves – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Jorge Martinez – Graduation committee member (TU Delft - Multimedia Computing)

C.F. Seale – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Jonathan Borg
Publication Year
2023
Language
English
Graduation Date
02-11-2023
Awarding Institution
Delft University of Technology
Programme
Computer Science | Bioinformatics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Double-strand break (DSB) repair is a critical cellular process that repairs breaks in both strands of the DNA double helix. Several repair mechanisms are tasked with repairing such breaks. Predicting deficiencies in these repair mechanisms has been widely used for therapeutic purposes, such as targeting cancer cells that have specific DNA repair deficiencies. DSB repair, however, is not error-free and can result in mutations. These mutations are also influenced by the DNA sequence surrounding the break site. To the best of our knowledge, sequence representations have not been considered when predicting DNA repair deficiencies. We hypothesise that higher-order information can be extracted from sequence representations. In this study, we address the problem of predicting Non-Homologous End Joining (NHEJ) repair deficiencies. First, we evaluate how accurately NHEJ repair deficiency can be predicted using only the mutational outcome frequencies (mutational spectra). We then examine whether combining mutational spectra with representations of the sequence surrounding the break site improves the prediction of NHEJ repair deficiency. We demonstrate that adding DNABERT sequence representations to mutational spectra features significantly improves prediction accuracy from 94.44% to 96.12%. We also show that even simple sequence representations, such as 1-mer frequencies, can lead to significant improvements. Our findings highlight the importance of including sequence representations alongside mutational spectra in repair deficiency prediction.
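
To make the idea of a simple sequence representation concrete, the following Python sketch computes k-mer frequencies (1-mers by default) for the sequence around a break site and appends them to a mutational spectrum vector. It is only an illustration under stated assumptions: the function names, the example sequence, and the three-value spectrum are hypothetical and are not taken from the thesis, whose actual feature pipeline and classifier are not described on this page.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=1, alphabet="ACGT"):
    """Relative k-mer frequencies, in lexicographic order over the alphabet.

    Hypothetical helper for illustration; k-mers containing symbols
    outside the alphabet (e.g. N) are ignored.
    """
    sequence = sequence.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def combine_features(mutational_spectrum, sequence, k=1):
    """Concatenate outcome frequencies with sequence k-mer frequencies.

    Assumption: both inputs describe the same target site, and the
    combined vector is what a downstream classifier would consume.
    """
    return list(mutational_spectrum) + kmer_frequencies(sequence, k)


# Made-up example values, not data from the thesis.
spectrum = [0.7, 0.2, 0.1]   # hypothetical frequencies of three repair outcomes
context = "AACG"             # hypothetical sequence around the break site
features = combine_features(spectrum, context)
```

With these made-up inputs, the combined vector holds the three outcome frequencies followed by the frequencies of A, C, G and T. A learned representation such as a DNABERT embedding would take the place of the k-mer part in the same concatenation.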

Files

MScThesis_Jonathan_Borg.pdf
(pdf | 7.11 MB)
License info not available