A Comparison of Seed-and-Extend Techniques in Modern DNA Read Alignment Algorithms

Conference Paper (2016)
Authors

Nauman Ahmed (TU Delft - Computer Engineering)

Koen Bertels (FTQC/Bertels Lab, TU Delft - Quantum & Computer Engineering)

Zaid Al-Ars (TU Delft - Computer Engineering)

Research Group
Computer Engineering
Copyright
© 2016 N. Ahmed, K.L.M. Bertels, Z. Al-Ars
To reference this document use:
https://doi.org/10.1109/BIBM.2016.7822731
Publication Year
2016
Language
English
Pages (from-to)
1421-1428
ISBN (print)
978-1-5090-1612-9
ISBN (electronic)
978-1-5090-1611-2
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

DNA read alignment is a major step in genome analysis. As DNA reads continue to grow longer, new approaches are needed to use them effectively in the alignment process. Modern aligners commonly use a two-step approach to read alignment: (1) seeding and (2) extension. In this paper, we investigate the seeding and extension techniques used in modern DNA read alignment algorithms to find the best seeding-extension combinations. We developed an open-source generic DNA read aligner that can be used to compare the alignment accuracy and total execution time of different combinations of seeding and extension algorithms. For extension, our results show that local alignment is the best approach, achieving up to 3.6x higher accuracy than other extension techniques for longer reads. For seeding, if BLAST-like seed extension is used, the best approach is to identify all supermaximal exact matches (SMEMs) in the DNA read (e.g., the approach used by BWA-MEM). This combination is up to 6x more accurate than other seeding techniques for longer reads. With local alignment, we observed that the seeding technique does not affect alignment accuracy. Furthermore, we show that an optimized implementation of local alignment using vector instructions, which yields a 4.5x speedup, makes it the fastest of all extension techniques. Overall, we show that local alignment with non-overlapping maximal exact matching seeds is the best seeding-extension combination, owing to its high accuracy and greater potential for optimization and acceleration for future DNA reads.
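
To make the two-step approach concrete, the following is a minimal illustrative sketch in Python, not the authors' aligner. It assumes fixed-length exact k-mer seeds taken at non-overlapping read offsets (a simpler stand-in for SMEM or maximal-exact-match seeding) and a plain Smith-Waterman local alignment with linear gap penalties as the extension step. All function names, the scoring values, and the `pad` window parameter are hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def find_seeds(read, index, k):
    """Seeding: exact matches of non-overlapping read k-mers.

    Returns (read_pos, ref_pos) pairs. Assumption: fixed-length k-mers
    replace the SMEM/MEM seeds discussed in the abstract.
    """
    seeds = []
    for r in range(0, len(read) - k + 1, k):
        for p in index.get(read[r:r + k], []):
            seeds.append((r, p))
    return seeds


def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Extension: best local alignment score (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def align(read, reference, k=4, pad=2):
    """Seed, then locally align the read to a window around each seed.

    Returns (best_score, window_start), or (0, None) if no seed is found.
    """
    index = build_kmer_index(reference, k)
    best = (0, None)
    for read_pos, ref_pos in find_seeds(read, index, k):
        diagonal = ref_pos - read_pos  # implied start of the read in the reference
        start = max(0, diagonal - pad)
        end = min(len(reference), diagonal + len(read) + pad)
        score = smith_waterman(read, reference[start:end])
        if score > best[0]:
            best = (score, start)
    return best


reference = "TTGACCATGCAGGTACCTTAGA"
read = "ATGCAGCTACC"  # one substitution relative to reference[6:17]
score, window_start = align(read, reference)
```

In this sketch the single seed `ATGC` is enough to locate the candidate region, and the local alignment then tolerates the substitution that no exact seed could cover. This mirrors, at toy scale, why the choice of extension technique can matter more than the choice of seeding technique.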

Files

Comparison_seed_extend.pdf
(pdf | 0.46 Mb)
License info not available