Benchmarking of viral quasispecies assembly algorithms

Abstract

A viral quasispecies is a viral population comprising numerous closely related viral strains that arise from within-host evolution or co-infection. Reconstructing strain-specific genomes from sequencing reads is referred to as viral quasispecies assembly; determining the relative abundances of the strains in the mixture is also crucial for various treatments. Many software tools are currently available to transform NGS reads into haplotypes, but earlier benchmarks of viral quasispecies reconstruction tools relied only on simulated datasets, which do not closely reflect real-world scenarios or virus evolution. In this research, we assessed six viral quasispecies assembly tools using realistic, evolutionary viral populations. The existing real mixture dataset still used for experiments is a decade old, so it has become important to create broader, more complex, high-quality real datasets as a new standard for future haplotype caller experiments. We introduce a new high-quality benchmarking dataset for viral quasispecies assembly derived from real samples. The aim of this research is to extensively evaluate the performance of six tools that reconstruct unique viral haplotypes, which are necessary for studying complex and heterogeneous virus communities thoroughly, and we present a comparative study of their performance. Based on the results, to improve the generated haplotypes, we applied an existing de novo method that reconstructs full-length haplotypes from pre-assembled contigs of challenging mixed samples. In general, this improved the overall accuracy of the assembly and of the abundance estimates.