Assessment of detection limits in viral diversity studies using 454 amplicon sequencing

Abstract

Background: Next-generation sequencing makes it possible to detect sequence diversity in populations of viruses, an essential step in the development of drug mixtures to combat viral infections. Sequencing patient samples with bidirectional Roche 454 amplicon sequencing technology, as implemented at the Delft Diagnostic Laboratory (DDL), introduces specific errors. Various cleaning algorithms can remove these errors. However, they may also remove low-frequency mutations, because such mutations are assumed to be likely errors and errors are assumed to be independently distributed throughout the sequence. We tested the performance of three algorithms, AmpliconNoise, KEC and ShoRAH, in detecting low-frequency mutations.

Results: Through various experiments we show that some types of errors are not independent but direction-dependent, and are caused by homopolymeric regions. Such errors can occur in up to 80% of the reads. As the methods tested could not remove these errors, we developed novel algorithms (MSAR, AFKnn and AFC) to correct these specific errors. Our algorithms combine the information present in forward and reverse reads to detect direction-dependent errors. MASR, AFKnn and AFC improved the detection limit of viral sequences when applied before KEC. In our experiments we found that mutations with a frequency below 1.0% could still be detected.

Conclusion: Bidirectional sequencing is essential for 454 sequencing to detect and remove direction-dependent errors, and thereby to improve the detection of low-frequency mutations.
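The idea of combining forward and reverse reads to spot direction-dependent errors can be illustrated with a small sketch. It is not the MSAR, AFKnn or AFC algorithm, whose details are not given in this abstract. It is a simple, hypothetical strand-comparison check that assumes per-position variant counts are already available for each read direction. The names `StrandCounts` and `is_direction_dependent`, and the thresholds `min_freq` and `max_ratio`, are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class StrandCounts:
    """Read counts at one alignment position, split by read direction."""
    fwd_variant: int  # forward reads carrying the variant
    fwd_total: int    # all forward reads covering the position
    rev_variant: int  # reverse reads carrying the variant
    rev_total: int    # all reverse reads covering the position


def strand_frequencies(c: StrandCounts) -> tuple[float, float]:
    """Return the variant frequency in forward reads and in reverse reads."""
    return c.fwd_variant / c.fwd_total, c.rev_variant / c.rev_total


def is_direction_dependent(c: StrandCounts,
                           min_freq: float = 0.01,
                           max_ratio: float = 10.0) -> bool:
    """Flag a variant that is far more frequent in one read direction.

    min_freq and max_ratio are illustrative thresholds, not values
    taken from the study.
    """
    if c.fwd_total == 0 or c.rev_total == 0:
        return False  # only one direction covered: nothing to compare
    fwd, rev = strand_frequencies(c)
    high, low = max(fwd, rev), min(fwd, rev)
    if high < min_freq:
        return False  # too rare in both directions to judge
    if low == 0.0:
        return True   # present in one direction, absent in the other
    return high / low > max_ratio


# Hypothetical counts: a variant in 80% of forward reads but 0.5% of reverse reads
artefact = StrandCounts(fwd_variant=800, fwd_total=1000, rev_variant=5, rev_total=1000)
# Hypothetical counts: a variant near 1% in both directions
balanced = StrandCounts(fwd_variant=12, fwd_total=1000, rev_variant=10, rev_total=1000)

print(is_direction_dependent(artefact))
print(is_direction_dependent(balanced))
```

Under these assumed thresholds, the first variant would be flagged as a likely direction-dependent artefact, while the second, which is supported about equally by both directions, would be kept as a candidate low-frequency mutation for a downstream cleaning step.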