Characterizing tandem repeat complexities across long-read sequencing platforms with TREAT and otter

Journal Article (2024)
Authors

Niccoló Tesi (Vrije Universiteit Amsterdam, TU Delft - Pattern Recognition and Bioinformatics)

Alex Salazar (Vrije Universiteit Amsterdam, VIB)

Yaran Zhang (Vrije Universiteit Amsterdam)

Sven van der Lee (Vrije Universiteit Amsterdam)

Marc Hulsman (TU Delft - Pattern Recognition and Bioinformatics, Vrije Universiteit Amsterdam)

Lydian Knoop (Vrije Universiteit Amsterdam)

Sanduni Wijesekera (Vrije Universiteit Amsterdam)

Marcel JT Reinders (TU Delft - Pattern Recognition and Bioinformatics)

Henne Holstege (Vrije Universiteit Amsterdam, TU Delft - Intelligent Systems)

G.B. More Authors

Research Group
Pattern Recognition and Bioinformatics
To reference this document use:
https://doi.org/10.1101/gr.279351.124
Publication Year
2024
Language
English
Issue number
11
Volume number
34
Pages (from-to)
1942-1953

Abstract

Tandem repeats (TRs) play important roles in genomic variation and disease risk in humans. Long-read sequencing allows for the accurate characterization of TRs; however, the underlying bioinformatic analysis remains challenging. We present otter and TREAT: otter is a fast targeted local assembler, cross-compatible across different sequencing platforms. It is integrated in TREAT, an end-to-end workflow for TR characterization, visualization, and analysis across multiple genomes. In a comparison with existing tools based on long-read sequencing data from both Oxford Nanopore Technologies (ONT, Simplex and Duplex) and Pacific Biosciences (PacBio, Sequel II and Revio), otter and TREAT achieve state-of-the-art genotyping and motif characterization accuracy. Applied to clinically relevant TRs, TREAT/otter significantly identify individuals with pathogenic TR expansions. When applied to a case-control setting, we replicate previously reported associations of TRs with Alzheimer's disease, including those near or within the APOC1 (P = 2.63 × 10⁻⁹), SPI1 (P = 6.5 × 10⁻³), and ABCA7 (P = 0.04) genes. Finally, we use TREAT/otter to systematically evaluate potential biases when genotyping TRs using diverse ONT and PacBio long-read sequencing data sets. We show that, in rare cases (0.06%), long-read sequencing suffers from coverage drops in TRs, including the disease-associated TRs in the ABCA7 and RFC1 genes. Such coverage drops can lead to TR misgenotyping, hampering the accurate characterization of TR alleles. Taken together, our tools can accurately genotype TRs across different sequencing technologies and with minimal requirements, allowing end-to-end analysis and comparisons of TRs in human genomes, with broad applications in research and clinical fields.