SARS-CoV-2 lineage abundance quantification in wastewater: a benchmark study for the identification of optimal reference set design

Abstract

Estimating the abundance of SARS-CoV-2 lineages in wastewater is a technique for monitoring lineage prevalence in communities and helping to contain the COVID-19 pandemic. A lineage is a collection of closely related mutants of a virus. The genome sequences within a lineage are thought to differ across the globe, owing either to random mutations or to mutations driven by the distinct immune responses of different populations. To estimate lineage abundance in a specific community, wastewater data collected from that community are compared to reference SARS-CoV-2 genome sequences of different lineages. However, such region-related variation in the genome sequences of lineages could affect the abundance estimates. The main aim of this study is to identify a way of sourcing reference genome sequences that improves the lineage abundance estimates. Simulated wastewater data are used to evaluate the performance of different reference sets. We demonstrate that continent-specific reference sets are the most reliable option. A country's overall interactions with other parts of the world could also be considered when constructing an optimal reference set. Additionally, the results show that considering immune-response-related mutations during reference set construction does not influence performance. Finally, the results suggest that a higher number of sequences per lineage and the inclusion of recently sourced sequences in the reference set improve the estimates.