The impact of sequencing errors and contaminating viruses on SARS-CoV-2 variant detection by sequencing wastewater-sourced viral RNA

Bachelor Thesis (2022)
Author(s)

M.J. van der Lugt (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J.A. Baaijens – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

K.A. Hildebrandt – Graduation committee member (TU Delft - Computer Graphics and Visualisation)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2022 Mart van der Lugt
Publication Year
2022
Language
English
Graduation Date
28-01-2022
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Since the start of the SARS-CoV-2 pandemic, monitoring SARS-CoV-2 by sequencing viral RNA from wastewater has proven to be an efficient and effective way of estimating COVID-19 cases in population groups. A recently developed pipeline also makes it possible to estimate SARS-CoV-2 variant abundance from viral samples in wastewater. It does so by repurposing an RNA-seq quantification algorithm to quantify the reads that belong to each variant in DNA-sequencing data. However, the impact of sequencing errors and contaminating viruses on this process is unknown. Here I show that, in simulated data, the credibility of the prediction results depends on the error rate of the sequencing machines used. I also show that contaminating the simulated dataset with certain human coronaviruses has a significant effect on prediction accuracy, whereas most viruses currently found in wastewater have no effect. Furthermore, adding a reference genome for these human coronaviruses to the reference set removes the impact. The results demonstrate that it is important to assess the credibility of the pipeline on a case-by-case basis and to tailor the testing setup and reference set to this assessment.
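
To make the notion of a sequencing error rate in simulated data more concrete, the sketch below injects substitution errors into a read at a fixed per-base rate. It is an illustration only and not the simulation setup used in the thesis: the function names, the uniform substitution model, the 150-base read length and the 1% error rate are all assumptions chosen for the example.

```python
import random

BASES = "ACGT"


def add_substitution_errors(read: str, error_rate: float, rng: random.Random) -> str:
    """Return a copy of `read` in which each base is independently replaced
    by a different base with probability `error_rate`.

    Assumption: a uniform substitution-only error model. Real sequencers
    also produce insertions and deletions and have position-dependent rates.
    """
    noisy = []
    for base in read:
        if rng.random() < error_rate:
            # Pick one of the other three bases uniformly at random.
            noisy.append(rng.choice([b for b in BASES if b != base]))
        else:
            noisy.append(base)
    return "".join(noisy)


def expected_error_free_fraction(read_length: int, error_rate: float) -> float:
    """Probability that a read of `read_length` bases contains no errors,
    assuming independent per-base errors."""
    return (1.0 - error_rate) ** read_length


# Hypothetical example values: a random 150-base read and a 1% error rate.
rng = random.Random(42)
read = "".join(rng.choice(BASES) for _ in range(150))
noisy = add_substitution_errors(read, 0.01, rng)
mismatches = sum(a != b for a, b in zip(read, noisy))
print(f"mismatches introduced: {mismatches}")
print(f"expected error-free fraction: {expected_error_free_fraction(150, 0.01):.3f}")
```

Under this simplified model the fraction of error-free reads shrinks geometrically with read length, which gives some intuition for why the error rate could matter when reads are assigned to closely related variants.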

Files

Paper_mjvanderlugt.pdf
(pdf | 1.87 Mb)
License info not available