Fragmenting Genome Sequences by Coding Regions to Improve Performance of the AmpliDiff Algorithm for Large Genomes

Bachelor Thesis (2024)
Author(s)

S.M.F. Karskens (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J.A. Baaijens – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

J. van Bemmelen – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

C.A. Raman – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2024
Language
English
Graduation Date
02-02-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Downloads counter
173
Collections
thesis
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

During the SARS-CoV-2 pandemic, abundance estimation from environmental samples was used to identify the abundances of different lineages. AmpliDiff, an algorithm that searches for parts of DNA (amplicons) that can differentiate between the input genomes, was applied to a SARS-CoV-2 dataset to find such amplicons. AmpliDiff was able to run on the SARS-CoV-2 dataset, but it appeared infeasible for datasets containing larger or more complex genomes because of its computational requirements and runtime. We introduce a new pre-processing strategy based on selecting the most differentiable coding regions, and we describe the modifications made to AmpliDiff to support this method. Based on the results, we conclude that the approach is promising but requires more research before it can be used optimally.

Files

CSE3000_Final_Paper.pdf
(pdf | 0.532 Mb)
License info not available