Title: Representations of DNA Sequence Context and Mutational Spectra for Prediction of Repair Deficiencies
Author: Borg, Jonathan (TU Delft Electrical Engineering, Mathematics and Computer Science; TU Delft Pattern Recognition and Bioinformatics)
Contributor: P. Gonçalves, Joana (mentor); Martinez, Jorge (graduation committee); Seale, C.F. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science | Bioinformatics
Date: 2023-11-02

Abstract
Double-strand break (DSB) repair is a critical cellular process which repairs breaks in both strands of the DNA double helix. Different repair mechanisms are tasked with repairing such breaks. Predicting deficiencies in repair mechanisms has been widely used for therapeutic purposes, such as targeting cancer cells that have specific DNA repair deficiencies. DSB repair, however, is not error-free, resulting in mutations. These mutations are also influenced by the DNA sequence surrounding the break site. To the best of our knowledge, sequence representations have not been considered when predicting DNA repair deficiencies. We hypothesise that higher-order information can be extracted from sequence representations. In this study, we research the problem of predicting Non-Homologous End Joining (NHEJ) repair deficiencies. Initially, we evaluate how accurately we can predict NHEJ repair deficiency using only the mutational outcome frequencies (mutational spectra). Afterwards, we examine how combining mutational spectra with representations of the sequence surrounding the break site can improve the prediction of NHEJ repair deficiency. We demonstrate that adding DNABERT sequence representations to mutational spectra features significantly improves prediction accuracy from 94.44% to 96.12%.
We also show that even simple sequence representations, such as 1-mer frequencies, can lead to significant improvements. Our findings highlight the importance of including sequence representations with mutational spectra in repair deficiency prediction.

Subject: DNA Double-Strand Break; Repair Pathway Deficiency; Mutational Spectra; DNA Sequence Representation
To reference this document use: http://resolver.tudelft.nl/uuid:3ed86d63-8466-4ed7-a66f-1d7c7ee78003
Part of collection: Student theses
Document type: master thesis
Rights: © 2023 Jonathan Borg
Files: MScThesis_Jonathan_Borg.pdf (PDF, 7.11 MB)
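
The abstract mentions 1-mer frequencies as a simple sequence representation that can be combined with mutational spectra. As a minimal sketch of what such a feature vector might look like, the Python snippet below computes the relative frequency of each base in a sequence and appends it to a vector of outcome frequencies. The function names, the example sequence, and the three-value spectrum are hypothetical and purely illustrative; they are not taken from the thesis, whose actual feature pipeline and classifier are not described in this record.

```python
from collections import Counter

ALPHABET = "ACGT"  # fixed base order for the 1-mer feature vector


def one_mer_frequencies(sequence):
    """Return the relative frequency of A, C, G and T in `sequence`."""
    seq = sequence.upper()
    # Count only canonical bases; ambiguous symbols such as N are ignored.
    counts = Counter(base for base in seq if base in ALPHABET)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(ALPHABET)
    return [counts[base] / total for base in ALPHABET]


def combine_features(mutational_spectrum, sequence):
    """Concatenate outcome frequencies with the 1-mer frequencies."""
    return list(mutational_spectrum) + one_mer_frequencies(sequence)


# Hypothetical sequence context and made-up outcome frequencies.
context = "ACGTTGCAAC"
spectrum = [0.5, 0.3, 0.2]
features = combine_features(spectrum, context)
```

Here `features` holds the three illustrative outcome frequencies followed by four base frequencies, giving one fixed-length row per target site that could be passed to any standard classifier.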