Fragmenting Genome Sequences by Coding Regions to Improve Performance of the AmpliDiff Algorithm for Large Genomes

Bachelor Thesis (2024)

Author(s)

S.M.F. Karskens (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Jasmijn A. Baaijens – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

J. van Bemmelen – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Chirag Raman – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Scalability AmpliDiff Amplicon Abundance

To reference this document use:

https://resolver.tudelft.nl/uuid:f1041916-a81d-4b26-9639-623adb97efe6

Publication Year

2024

Language

English

Graduation Date

02-02-2024

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Abundance estimation from environmental samples was used during the SARS-CoV-2 pandemic to identify the abundances of different lineages. AmpliDiff, an algorithm that searches for parts of DNA (amplicons) that can differentiate between the input genomes, was applied to a SARS-CoV-2 dataset to find such amplicons. AmpliDiff ran successfully on the SARS-CoV-2 dataset, but appeared infeasible for datasets containing larger or more complex genomes because of its computational requirements and runtime. We introduce a new pre-processing strategy based on selecting the most differentiable coding regions, and describe the modifications made to AmpliDiff to support this method. Based on the results, we conclude that the approach is promising but requires further research before it can be used optimally.

Files

CSE3000_Final_Paper.pdf

(pdf | 0.532 Mb)

License info not available