Effect of Output Granularity on SARS-CoV-2 Variant Abundance Estimates using Domestic Wastewater Sequencing

Monitoring SARS-CoV-2 variants is crucial to efforts to combat the COVID-19 pandemic. Lineage-level abundance estimates for SARS-CoV-2 can be obtained from viral material present in domestic wastewater. These abundance predictions can be made at different levels of granularity: at the level of individual lineages (high granularity) or at the level of variants (low granularity). This paper examines to what extent abundance predictions are more accurate at lower granularity. Here we show that when a wastewater sample contains only one lineage, low-granularity predictions are in general more accurate than high-granularity predictions for all lineages across the Alpha, Delta and Mu variants. Variant-level overestimation, which could have made low-granularity predictions less accurate than high-granularity ones, was not observed in this experiment. When several lineages of a variant were combined in one wastewater sample, the prediction error rose because the relative abundance of each genome sequence was smaller. In this setting we observed overestimation caused by the predictions for all lineages being pooled into a single lineage, and the overestimated high-granularity lineage prediction was more accurate than the low-granularity predictions. If samples are expected to contain very few lineages, it is better to make predictions at low granularity. Conversely, as the relative abundances of lineages in a sample decrease because many lineages are present, lineage-level predictions become more likely to have the smaller relative prediction error, making high granularity the better choice for accurate predictions.
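
The difference between the two granularities can be illustrated with a minimal Python sketch. It is not the paper's pipeline: the abundance values are invented for illustration, the function names are hypothetical, and the relative error definition (absolute difference divided by the true abundance) is an assumption, since the abstract does not define its error metric. The sketch sums lineage-level predictions into variant-level predictions and compares the error at each level for a sample that truly contains a single Alpha lineage.

```python
from collections import defaultdict

# Lineage-to-variant assignments following Pango/WHO naming.
LINEAGE_TO_VARIANT = {
    "B.1.1.7": "Alpha",
    "Q.1": "Alpha",
    "B.1.617.2": "Delta",
    "AY.4": "Delta",
    "B.1.621": "Mu",
}


def aggregate_to_variant(lineage_abundance):
    """Sum lineage-level abundances (high granularity) into
    variant-level abundances (low granularity)."""
    totals = defaultdict(float)
    for lineage, abundance in lineage_abundance.items():
        totals[LINEAGE_TO_VARIANT[lineage]] += abundance
    return dict(totals)


def relative_error(predicted, true):
    """Assumed metric: absolute difference relative to the true abundance."""
    return abs(predicted - true) / true


# Invented example: the sample truly contains only B.1.1.7 at 10 %,
# but part of the signal is misassigned to a sibling Alpha lineage.
true_lineage = {"B.1.1.7": 10.0}
predicted_lineage = {"B.1.1.7": 6.0, "Q.1": 3.0}

true_variant = aggregate_to_variant(true_lineage)
predicted_variant = aggregate_to_variant(predicted_lineage)

lineage_error = relative_error(predicted_lineage["B.1.1.7"], true_lineage["B.1.1.7"])
variant_error = relative_error(predicted_variant["Alpha"], true_variant["Alpha"])

print(f"lineage-level relative error: {lineage_error:.2f}")
print(f"variant-level relative error: {variant_error:.2f}")
```

With these invented numbers, abundance misassigned to a sibling lineage counts as error at high granularity but is recovered when predictions are summed per variant, which is consistent with the single-lineage result described in the abstract. The sketch does not model the multi-lineage case, where the abstract reports the opposite tendency.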