Kallisto Repurposed

Using sequencing reads from the spike, nucleocapsid, and a middle region of nsp3 in the kallisto pipeline to better predict SARS-CoV-2 variants in wastewater

Bachelor Thesis (2022)
Author(s)

M. Anton (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Jasmijn A. Baaijens – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

K. Hildebrandt – Graduation committee member (TU Delft - Computer Graphics and Visualisation)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2022 Matei Anton
More Info
Publication Year
2022
Language
English
Graduation Date
28-01-2022
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Related content

DOI to code used in scientific work

http://doi.org/10.4121/18532973
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

During a viral infection, we shed remnants of the virus, which end up in wastewater. This makes wastewater analysis possible, which aids efforts to track the evolution of the current COVID-19 pandemic. It has been shown that by repurposing the kallisto algorithm, the abundance of SARS-CoV-2 variants in wastewater samples can be estimated. Since this use of kallisto is novel, its precision can likely be improved by adjusting certain aspects of the method. In this work, I look at one of those aspects: sequencing particular genomic regions of the virus rather than the entire genome. I found that the regions coding for the spike (S) and nucleocapsid (N) proteins, as well as a middle section of the region coding for non-structural protein 3 (nsp3), give particularly accurate results when sequenced on their own. In addition, in at least one case, combining two well-performing regions further improves accuracy at lower simulated variant abundances. This suggests that sequencing depth is preferable to sequencing breadth, as long as the sequenced region contains enough information to distinguish between variants. These findings can aid in improving this method of variant quantification. Moreover, they can help improve other algorithms applied to the SARS-CoV-2 genome by highlighting the genomic sections that contain the most information for differentiating between variants.
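To make the quantification idea more concrete, here is a minimal sketch of the expectation-maximization (EM) step that kallisto-style abundance estimation is built on. It assumes reads have already been pseudoaligned into equivalence classes, meaning sets of variant reference sequences a read is compatible with, each with a read count. The function name `em_abundances`, the variant names, the counts and the lengths are hypothetical illustrations. This is a simplified rendering of the general EM technique and not the code published with this thesis.

```python
def em_abundances(eq_classes, lengths, max_iter=10_000, tol=1e-10):
    """Estimate relative variant abundances with EM over equivalence classes.

    eq_classes: dict mapping a tuple of variant names (an equivalence class)
                to the number of reads compatible with exactly that set.
    lengths:    dict mapping each variant name to its (effective) length.
    Hypothetical helper for illustration only.
    """
    variants = sorted(lengths)
    total_reads = sum(eq_classes.values())
    if total_reads == 0:
        raise ValueError("no reads to quantify")

    # alpha[v]: estimated fraction of reads originating from variant v.
    alpha = {v: 1.0 / len(variants) for v in variants}

    for _ in range(max_iter):
        # E-step: split each class's reads among its variants in proportion
        # to alpha / length (longer references produce more reads per copy).
        expected = {v: 0.0 for v in variants}
        for ec, count in eq_classes.items():
            weights = {v: alpha[v] / lengths[v] for v in ec}
            norm = sum(weights.values())
            if norm == 0.0:
                continue
            for v, w in weights.items():
                expected[v] += count * w / norm

        # M-step: new read fractions are the expected counts over all reads.
        new_alpha = {v: expected[v] / total_reads for v in variants}
        delta = max(abs(new_alpha[v] - alpha[v]) for v in variants)
        alpha = new_alpha
        if delta < tol:
            break

    # Convert read fractions to relative genome abundances (divide by length).
    rho = {v: alpha[v] / lengths[v] for v in variants}
    z = sum(rho.values())
    return {v: rho[v] / z for v in variants}


# Toy input: 30 reads unique to variant_A, 10 unique to variant_B,
# 60 reads that cannot tell the two apart (e.g. from conserved regions).
toy_classes = {("variant_A",): 30, ("variant_B",): 10, ("variant_A", "variant_B"): 60}
toy_lengths = {"variant_A": 1000, "variant_B": 1000}
print(em_abundances(toy_classes, toy_lengths))
# approximately {'variant_A': 0.75, 'variant_B': 0.25}
```

Only the reads in single-variant classes carry direct evidence. The shared reads are divided according to that evidence, which is why regions rich in variant-distinguishing mutations (such as those reported above) give the EM more to work with. In this toy example, the 60 ambiguous reads are split 3:1, following the 30:10 ratio of the unique reads.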
