Assessing the Impact of Ligated Chimeric Artefacts on Viral Diversity Estimation

Master Thesis (2025)
Author(s)

W.J. Sung (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J.A. Baaijens – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Michiel Weber – Mentor (Cerba Research NL)

Thomas Abeel – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

M. Khosla – Graduation committee member (TU Delft - Multimedia Computing)

Publication Year
2025
Language
English
Graduation Date
11-07-2025
Awarding Institution
Programme
Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Reliable estimation of intra-host viral diversity is essential for understanding viral evolution, treatment resistance, and outbreak dynamics. However, technical artefacts introduced during sample preparation and sequencing can distort variant frequencies and lead to incorrect conclusions. One such group of artefacts is ligated chimeric reads, also referred to as ligation chimeras, formed when full-length DNA molecules are erroneously joined during library preparation. Ligation chimeras are currently poorly characterized and their impact on downstream analyses is largely unknown. In this thesis, we developed a modular and reproducible computational pipeline to detect, quantify, and analyze ligated chimeras in amplicon-based viral sequencing datasets. We applied this pipeline to both public and internal datasets, evaluating the prevalence and structural patterns of chimeras and their impact on viral diversity estimates. Our results show that ligated chimeras are widespread, disproportionately affect specific amplicons, and can introduce substantial allele frequency shifts and spurious variants. This means that common filtering strategies in current pipelines risk discarding true low-frequency variants or failing to remove artefactual ones. These findings highlight the importance of chimera-aware preprocessing to ensure accurate viral diversity estimation from long-read sequencing data.
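To make the claim about allele frequency shifts concrete, the sketch below shows how a few chimeric reads could inflate a low-frequency variant at a single position. It is only an illustration and not the pipeline developed in the thesis: the `allele_frequencies` helper, the `pileup` layout, and all read counts are hypothetical and chosen for readability.

```python
from collections import Counter


def allele_frequencies(bases):
    """Return the fraction of reads supporting each base at one position."""
    counts = Counter(bases)
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()}


# Hypothetical pileup at one genome position: (base, read_is_chimeric).
# The counts are invented for illustration, not taken from the thesis.
pileup = [("A", False)] * 95 + [("G", False)] * 2 + [("G", True)] * 3

# Frequencies computed from all reads, chimeric ones included.
with_chimeras = allele_frequencies([base for base, _ in pileup])

# Frequencies after discarding reads flagged as chimeric.
without_chimeras = allele_frequencies(
    [base for base, is_chimeric in pileup if not is_chimeric]
)

print(f"G frequency with chimeras:    {with_chimeras['G']:.3f}")
print(f"G frequency without chimeras: {without_chimeras['G']:.3f}")
```

In this toy case the G allele is supported by 5 of 100 reads when chimeras are kept and by 2 of 97 once they are removed, so a fixed frequency cutoff between those two values would call the variant in one case and drop it in the other.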

Files

Sung_Thesis_2025.pdf
(pdf | 6.39 Mb)
License info not available