Kallisto Repurposed

Using sequencing reads from the spike, nucleocapsid, and a middle region of nsp3 in the kallisto pipeline to better predict SARS-CoV-2 variants in wastewater

Bachelor Thesis (2022)

Author(s)

M. Anton (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Jasmijn A. Baaijens – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

K. Hildebrandt – Graduation committee member (TU Delft - Computer Graphics and Visualisation)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Kallisto, Covid-19, SARS-CoV-2

To reference this document use:

https://resolver.tudelft.nl/uuid:990e42c3-e79f-4ff9-85ff-a614554269bb

Publication Year

2022

Language

English

Graduation Date

28-01-2022

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Abstract

During a viral infection, we expel remnants of the virus. This makes it possible to conduct wastewater analysis, which aids efforts to track the evolution of the current Covid-19 pandemic. It has been shown that, by repurposing the kallisto algorithm, the abundance of SARS-CoV-2 variants in wastewater samples can be estimated. Since this application of kallisto is novel, its precision can probably be improved by adjusting certain aspects of the method. In this work, I look at one of those aspects: sequencing particular genomic regions of the virus rather than the entire genome. I found that the regions coding for the spike (S) and nucleocapsid (N) proteins, and a section around the region coding for non-structural protein 3 (nsp3), give particularly accurate results when sequenced on their own. In addition, in at least one case, combining two well-performing regions further improves accuracy at lower simulated variant abundances. This suggests that sequencing depth is preferable to sequencing breadth, as long as the region being sequenced contains enough information to distinguish between variants. These findings are important because they can help improve this method of variant quantification. Moreover, they can help improve other algorithms applied to the SARS-CoV-2 genome by highlighting the genomic sections that contain the most differentiating information between variants.

Files

Dissertation_title_page_combin... (pdf)

(pdf | 2.88 Mb)

License info not available