CHOP

Haplotype-aware path indexing in population graphs

Journal Article (2020)
Author(s)

Tom Mokveld (TU Delft - Computer Engineering)

J. Linthorst (TU Delft - Pattern Recognition and Bioinformatics, VU University Medical Centre)

Zaid Al-Ars (TU Delft - Computer Engineering)

H. Holstege (VU University Medical Centre, TU Delft - Intelligent Systems)

Marcel J.T. Reinders (TU Delft - Pattern Recognition and Bioinformatics)

Research Group
Pattern Recognition and Bioinformatics
Copyright
© 2020 T.O. Mokveld, J. Linthorst, Z. Al-Ars, H. Holstege, M.J.T. Reinders
DOI related publication
https://doi.org/10.1186/s13059-020-01963-y
Publication Year
2020
Language
English
Issue number
1
Volume number
21
Pages (from-to)
1-16
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The practical use of graph-based reference genomes depends on the ability to align reads to them. Substring queries on paths through these graphs lie at the core of this task. The combination of increasing pattern length and encoded variation inevitably leads to a combinatorial explosion of the search space. Instead of relying on heuristic filtering or pruning steps to reduce this complexity, we propose CHOP, a method that constrains the search space by exploiting haplotype information. Because the search space is bounded by the number of haplotypes, a combinatorial explosion is prevented. We show that CHOP scales to large and complex datasets by applying it to a graph-based representation of the human genome that encodes all 80 million variants reported by the 1000 Genomes Project.
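
The bounding argument in the abstract can be illustrated with a toy example. The sketch below is not the CHOP implementation. It assumes a simplified graph modelled as a list of sites, each with one or more single-base alleles, and haplotypes given as one allele index per site. The names `sites`, `haplotypes`, `all_path_kmers`, and `haplotype_kmers` are hypothetical and chosen only for illustration. It compares the k-length substrings spelled by every possible path with those spelled only by paths that an observed haplotype actually follows.

```python
from itertools import product

# Hypothetical toy data: each site lists its alleles (single bases here).
sites = [["A"], ["C", "G"], ["T"], ["A", "T"], ["G"], ["C", "A"]]

# Each haplotype selects one allele index per site (assumed input format).
haplotypes = [
    (0, 0, 0, 0, 0, 0),
    (0, 1, 0, 1, 0, 1),
    (0, 0, 0, 1, 0, 0),
]


def all_path_kmers(sites, k):
    """Collect (start, k-mer) pairs spelled by any path through the sites.

    The count per window is the product of the allele counts, so it grows
    combinatorially with k and with the number of variant sites.
    """
    kmers = set()
    for start in range(len(sites) - k + 1):
        for combo in product(*sites[start:start + k]):
            kmers.add((start, "".join(combo)))
    return kmers


def haplotype_kmers(sites, haplotypes, k):
    """Collect (start, k-mer) pairs spelled by at least one haplotype.

    Each window contributes at most len(haplotypes) distinct k-mers.
    """
    kmers = set()
    for start in range(len(sites) - k + 1):
        for hap in haplotypes:
            kmer = "".join(sites[i][hap[i]] for i in range(start, start + k))
            kmers.add((start, kmer))
    return kmers


if __name__ == "__main__":
    k = 6
    print(len(all_path_kmers(sites, k)))               # 8 unconstrained paths
    print(len(haplotype_kmers(sites, haplotypes, k)))  # 3 haplotype-supported
```

In this toy setting, three variant sites with two alleles each give 2 × 2 × 2 = 8 candidate substrings for a window spanning all sites, whereas only three are supported by the haplotypes. Adding more variant sites multiplies the first number but leaves the second bounded by the number of haplotypes.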