Bounds on the Sequence Length Sufficient to Reconstruct Binary Level-1 Phylogenetic Networks Under the CFN Model

Journal Article (2026)
Author(s)

Martin Frohn (Maastricht University)

Niels Holtgrefe (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Leo van Iersel (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Mark Jones (Middlesex University)

Steven Kelk (Maastricht University)

Research Group
Discrete Mathematics and Optimization
DOI related publication
https://doi.org/10.1007/s00026-026-00830-0 (final published version)
Publication Year
2026
Language
English
Journal title
Annals of Combinatorics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Phylogenetic trees and networks are graphs used to model evolutionary relationships, with trees representing strictly branching histories and networks allowing for events in which lineages merge, called reticulation events. While the question of data sufficiency has been studied extensively in the context of trees, it remains largely unexplored for networks. In this work, we take a first step in this direction by establishing bounds on the amount of genomic data required to reconstruct binary level-1 semi-directed phylogenetic networks, which are binary networks in which reticulation events are indicated by directed edges, all other edges are undirected, and cycles are vertex disjoint. For this class, statistically consistent methods have been developed recently; roughly speaking, such methods are guaranteed to reconstruct the correct network given infinitely long genomic sequences. Here we consider whether networks from this class can be uniquely and correctly reconstructed from finite sequences. Specifically, we present an inference algorithm that takes genetic sequence data as input, and demonstrate that the sequence length sufficient to reconstruct the correct network with high probability, under the CFN model of evolution, scales logarithmically, polynomially, or polylogarithmically with the number of taxa, depending on the parameter regime. As part of our contribution, we also present novel inference rules for quartet data in the semi-directed phylogenetic network setting.
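
To make the setting concrete, the following minimal sketch simulates binary sequences under the CFN model on a single four-taxon tree and infers the quartet split from estimated pairwise distances. It is not the inference algorithm of the article: it uses the classical four-point method for trees, contains no reticulation events, and assumes one shared flip probability on all edges. The function names (`simulate_cfn_quartet`, `cfn_distance`, `four_point_split`) and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def simulate_cfn_quartet(p_edge, n_sites, rng):
    """Simulate binary sequences at the leaves of the quartet tree ab|cd.

    Assumption: all five edges share the same flip probability p_edge < 0.5.
    """
    def flip(state):
        # Under CFN, the state changes along an edge with probability p_edge.
        return state ^ 1 if rng.random() < p_edge else state

    seqs = {leaf: [] for leaf in "abcd"}
    for _ in range(n_sites):
        u = rng.randrange(2)   # uniform state at the internal node next to a, b
        v = flip(u)            # internal node next to c, d
        seqs["a"].append(flip(u))
        seqs["b"].append(flip(u))
        seqs["c"].append(flip(v))
        seqs["d"].append(flip(v))
    return seqs


def cfn_distance(x, y):
    """Estimate the additive CFN distance -0.5 * ln(1 - 2h) from the mismatch fraction h."""
    h = sum(s != t for s, t in zip(x, y)) / len(x)
    return math.inf if h >= 0.5 else -0.5 * math.log(1 - 2 * h)


def four_point_split(seqs):
    """Return the quartet split whose two within-pair distances have the smallest sum."""
    def d(x, y):
        return cfn_distance(seqs[x], seqs[y])

    scores = {
        "ab|cd": d("a", "b") + d("c", "d"),
        "ac|bd": d("a", "c") + d("b", "d"),
        "ad|bc": d("a", "d") + d("b", "c"),
    }
    return min(scores, key=scores.get)


seqs = simulate_cfn_quartet(p_edge=0.1, n_sites=2000, rng=random.Random(0))
print(four_point_split(seqs))
```

Reducing `n_sites` makes the distance estimates noisier, so the inferred split is more likely to be wrong; this is the finite-sequence effect that the bounds in the article quantify for level-1 networks.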