Fragmenting Genome Sequences by Coding Regions to Improve Performance of the AmpliDiff Algorithm for Large Genomes

Bachelor Thesis (2024)
Author(s)

S.M.F. Karskens (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Jasmijn A. Baaijens – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

J. van Bemmelen – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Chirag Raman – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2024 Samuel Karskens
More Info
Publication Year
2024
Language
English
Graduation Date
02-02-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Abundance estimation from environmental samples was used during the SARS-CoV-2 pandemic to estimate the abundances of different lineages. AmpliDiff, an algorithm that searches for amplicons (parts of DNA that can differentiate between the input genomes), was applied to a SARS-CoV-2 dataset for this purpose. AmpliDiff was able to run on the SARS-CoV-2 dataset, but it appears infeasible for datasets that contain larger or more complex genomes because of its computational requirements and runtime. We introduce a new pre-processing strategy based on selecting the most differentiable coding regions, and we describe the modifications made to AmpliDiff to support this method. Based on the results, we conclude that the approach is promising but requires further research before it can be used optimally.

Files

CSE3000_Final_Paper.pdf
(pdf | 0.532 Mb)
License info not available