Lv

L.J.J. van Iersel

info

Please Note

77 records found

Journal article (2026) - Christopher Reichling, Leo van Iersel, Yukihiro Murakami
Phylogenetic networks are useful in representing the evolutionary history of taxa. In certain scenarios, one requires a way to compare different networks. In practice, this can be rather difficult, except within specific classes of networks. In this paper, we derive metrics for the class of orchard networks and the class of strongly reticulation-visible networks, from variants of so-called μ-representations, which are vector representations of networks. For both network classes, we impose degree constraints on the vertices, by considering semi-binary networks. ...
Journal article (2026) - Niels Holtgrefe, Katharina T. Huber, Leo van Iersel, Mark Jones, Vincent Moulton
In evolutionary biology, phylogenetic networks are graphs that provide a flexible framework for representing complex evolutionary histories that involve reticulate evolutionary events. Recently, phylogenetic studies have started to focus on a special class of such networks called semi-directed networks. These graphs are defined as mixed graphs that can be obtained by de-orienting some of the arcs in some rooted phylogenetic network, that is, a directed acyclic graph whose leaves correspond to a collection of species and that has a single source or root vertex. However, this definition of semi-directed networks is implicit in nature since it is not clear when a mixed-graph enjoys this property or not. In this paper, we introduce novel, explicit mathematical characterizations of semi-directed networks, and also multi-semi-directed networks, that is mixed graphs that can be obtained from directed phylogenetic networks that may have more than one root. In addition, through extending foundational tools from the theory of rooted networks into the semi-directed setting—such as cherry picking sequences, omnians, and path partitions—we characterize when a (multi-)semi-directed network can be obtained by de-orienting some rooted network that is contained in one of the well-known classes of tree-child, orchard, tree-based or forest-based networks. These results address structural aspects of (multi-)semi-directed networks and pave the way to improved theoretical and computational analyses of such networks, for example, within the development of algebraic evolutionary models that are based on such networks. ...

Correction: "Distinguishing Phylogenetic Level-2 Networks with Quartets and Inter-Taxon Quartet Distances" (Bulletin of mathematical biology (2025) 87 12 DOI: 10.1007/s11538-025-01549-4.)

Journal article (2026) - Niels Holtgrefe, Elizabeth S. Allman, Hector Baños, Leo van Iersel, Vincent Moulton, John A. Rhodes, Kristina Wicke
Journal article (2026) - Niels Holtgrefe, Leo van Iersel, Mark Jones
To measure the tree-likeness of a directed acyclic graph (DAG), a new width parameter that considers the directions of the arcs was recently introduced: scanwidth. We present the first algorithm that efficiently computes the exact scanwidth of general DAGs. For DAGs with one root and scanwidth k it runs in O(k⋅nk⋅m) time. The algorithm also functions as an FPT algorithm with complexity O(24ℓ−1⋅ℓ⋅n+n2) for phylogenetic networks of level-ℓ, a type of DAG used to depict evolutionary relationships among species. Our algorithm performs well in practice, being able to compute the scanwidth of synthetic networks up to 30 reticulations and 100 leaves within 500 seconds. Furthermore, we propose a heuristic that obtains an average practical approximation ratio of 1.5 on these networks. While we prove that the scanwidth is bounded from below by the treewidth of the underlying undirected graph, experiments suggest that for networks the parameters are close in practice. ...
Journal article (2026) - Martin Frohn, Niels Holtgrefe, Leo van Iersel, Mark Jones, Steven Kelk
Phylogenetic trees and networks are graphs used to model evolutionary relationships, with trees representing strictly branching histories and networks allowing for events in which lineages merge, called reticulation events. While the question of data sufficiency has been studied extensively in the context of trees, it remains largely unexplored for networks. In this work, we take a first step in this direction by establishing bounds on the amount of genomic data required to reconstruct binary level-1 semi-directed phylogenetic networks, which are binary networks in which reticulation events are indicated by directed edges, all other edges are undirected, and cycles are vertex disjoint. For this class, methods have been developed recently that are statistically consistent. Roughly speaking, such methods are guaranteed to reconstruct the correct network assuming infinitely long genomic sequences. Here we consider the question whether networks from this class can be uniquely and correctly reconstructed from finite sequences. Specifically, we present an inference algorithm that takes as input genetic sequence data, and demonstrate that the sequence length sufficient to reconstruct the correct network with high probability, under the CFN model of evolution, scales logarithmically, polynomially, or polylogarithmically with the number of taxa, depending on the parameter regime. As part of our contribution, we also present novel inference rules for quartet data in the semi-directed phylogenetic network setting. ...
Preprint (2025) - Aviva K. Englander, Martin Frohn, Elizabeth Gross, Niels Holtgrefe, Leo van Iersel, Mark Jones, Seth Sullivant
Journal article (2025) - Jordan Dempsey, Leo van Iersel, Mark Jones, Norbert Zeh
Maximum agreement forests have been used as a measure of dissimilarity of two or more phylogenetic trees on a given set of taxa. An agreement forest is a set of trees that can be obtained from each of the input trees by deleting edges and suppressing degree-2 vertices. A maximum agreement forest is such a forest with the minimum number of components. We present a simple 4-approximation algorithm for computing a maximum agreement forest of multiple unrooted binary trees. This algorithm applies LP rounding to an extension of a recent ILP formulation of the maximum agreement forest problem on two trees by Van Wersch et al. [13]. We achieve the same approximation ratio as the algorithm by Chen et al. [3], but our algorithm is extremely simple. We also prove that no algorithm based on the ILP formulation by Van Wersch et al. can achieve an approximation ratio of 4−ε, for any ε>0, even on two trees. To this end, we prove that the integrality gap of the ILP approaches 4 as the size of the two input trees grows. ...
Journal article (2025) - J. C. Pons, P. Vives Lopez, Y. Murakami, L. v. Iersel
Reticulate evolution can be modelled using phylogenetic networks. Tree-based networks, which are one of the more general classes of phylogenetic networks, have recently gained eminence for its ability to represent evolutionary histories with an underlying tree structure. To better understand tree-based networks, numerous characterizations have been proposed, based on tree embeddings, matchings, and arc partitions. Here, we build a bridge between two arc partition characterizations, namely maximal fence decompositions and cherry covers. Results on cherry covers have been found for general phylogenetic networks. We first show that the number of cherry covers is the same as the number of support trees (underlying tree structure of tree-based networks) for a given semi-binary network. Maximal fence decompositions have only been defined thus far for binary networks (constraints on vertex degrees). We remedy this by generalizing fence decompositions to non-binary networks, and using this, we characterize semi-binary tree-based networks in terms of forbidden structures. Furthermore, we give an explicit enumeration of cherry covers of semi-binary networks, by studying its fence decomposition. Finally, we prove that it is possible to characterize semi-binary tree-child networks, a subclass of tree-based networks, in terms of the number of their cherry covers. ...
Journal article (2025) - Katharina T Huber, Leo van Iersel, Vincent Moulton, Guillaume E. Scholz
In evolutionary biology, networks are becoming increasingly used to represent evolutionary histories for species that have undergone non-treelike or reticulate evolution. Such networks are essentially directed acyclic graphs with a leaf set that corresponds to a collection of species, and in which non-leaf vertices with indegree 1 correspond to speciation events and vertices with indegree greater than 1 correspond to reticulate events such as gene transfer. Recently forest-based networks have been introduced, which are essentially (multi-rooted) networks that can be formed by adding some arcs to a collection of phylogenetic trees (or phylogenetic forest), where each arc is added in such a way that its ends always lie in two different trees in the forest. In this paper, we consider the complexity of deciding whether a given network is proper forest-based, that is, whether it can be formed by adding arcs to some underlying phylogenetic forest which contains the same number of trees as there are roots in the network. More specifically, we show that it is NP-complete to decide whether a tree-child network with m roots is proper forest-based, for each m≥2. Moreover, for binary networks the problem remains NP-complete when m≥3 but becomes polynomial-time solvable for m=2. We also give a fixed parameter tractable (FPT) algorithm, with parameters the maximum outdegree of a vertex, the number of roots, and the number of indegree 2 vertices, for deciding if a semi-binary network is proper forest-based. A key element in proving our results is a new characterization for when a network with m roots is proper forest-based in terms of certain m-colorings. ...
Journal article (2025) - Martin Frohn, Niels Holtgrefe, Leo van Iersel, Mark Jones, Steven Kelk
Semi-directed networks are partially directed graphs that model evolution where the directed edges represent reticulate evolutionary events. We present an algorithm that reconstructs binary n-leaf semi-directed level-1 networks in O(n 2) time from its quarnets (4-leaf subnetworks). Our method assumes we have direct access to all quarnets, yet uses only an asymptotically optimal number of O(nlog⁡n) quarnets. When the network is assumed to contain no triangles, our method instead relies only on four-cycle quarnets and the splits of the other quarnets. A variant of our algorithm works with quartets rather than quarnets and we show that it reconstructs most of a semi-directed level-1 network from an asymptotically optimal O(nlog⁡n) of the quartets it displays. Additionally, we provide an O(n 3) time algorithm that reconstructs the tree-of-blobs of any binary n-leaf semi-directed network with unbounded level from O(n 3) splits of its quarnets. ...
Conference paper (2025) - Leo van Iersel, Mark Jones, Jannik Schestag, Celine Scornavacca, Mathias Weller
Network Phylogenetic Diversity (Network-PD) is a measure for the diversity of a set of species based on a rooted phylogenetic network (with branch lengths and inheritance probabilities on the reticulation edges) describing the evolution of those species. We consider the Max-Network-PD problem: given such a network, find k species with maximum Network-PD score. We show that this problem is fixed-parameter tractable (FPT) for binary networks, by describing an optimal algorithm running in O(2rlog(k)(n+r)) time, with n the total number of species in the network and r its reticulation number. Furthermore, we show that Max-Network-PD is NP-hard for level-1 networks, proving that, unless P = NP, the FPT approach cannot be extended by using the level as parameter instead of the reticulation number. ...
Journal article (2025) - Katharina T. Huber, Leo van Iersel, Mark Jones, Vincent Moulton, Leonie Veenema - Nipius
Phylogenetic networks are graphs that are used to represent evolutionary relationships between different taxa. They generalize phylogenetic trees since for example, unlike trees, they permit lineages to combine. Recently, there has been rising interest in semi-directed phylogenetic networks, which are mixed graphs in which certain lineage combination events are represented by directed edges coming together, whereas the remaining edges are left undirected. One reason to consider such networks is that it can be difficult to root a network using real data. In this paper, we consider the problem of when a semi-directed phylogenetic network is defined or encoded by the smaller networks that it induces on the 4-leaf subsets of its leaf set. These smaller networks are called quarnets. We prove that semi-directed binary level-2 phylogenetic networks are encoded by their quarnets, but that this is not the case for level-3. In addition, we prove that the so-called blob tree of a semi-directed binary network, a tree that gives the coarse-grained structure of the network, is always encoded by the quarnets of the network. These results are relevant for proving the statistical consistency of programs that are currently being developed for reconstructing phylogenetic networks from practical data, such as the recently developed Squirrel software tool. ...
Journal article (2025) - Niels Holtgrefe, Elizabeth S. Allman, Hector Baños, Leo van Iersel, Vincent Moulton, John A. Rhodes, Kristina Wicke
The inference of phylogenetic networks, which model complex evolutionary processes including hybridization and gene flow, remains a central challenge in evolutionary biology. Until now, statistically consistent inference methods have been limited to phylogenetic level-1 networks, which allow no interdependence between reticulate events. In this work, we establish the theoretical foundations for a statistically consistent inference method for a much broader class: semi-directed level-2 networks that are outer-labeled planar and galled. We precisely characterize the features of these networks that are distinguishable from the topologies of their displayed quartet trees. Moreover, we prove that an inter-taxon distance derived from these quartets is circular decomposable, enabling future robust inference of these networks from quartet data, such as concordance factors obtained from gene tree distributions under the Network Multispecies Coalescent model. Our results also have novel identifiability implications across different data types and evolutionary models, applying to any setting in which displayed quartets can be distinguished. ...
Phylogenetic diversity plays an important role in biodiversity, conservation, and evolutionary studies by measuring the diversity of a set of taxa based on their phylogenetic relationships. In phylogenetic trees, a subset of k taxa with maximum phylogenetic diversity can be found by a simple and efficient greedy algorithm. However, this algorithmic tractability is lost when considering phylogenetic networks, which incorporate reticulate evolutionary events such as hybridization and horizontal gene transfer. To address this challenge, we introduce PaNDA (Phylogenetic Network Diversity Algorithms), the first software package and interactive graphical user-interface for exploring, visualizing and maximizing diversity in phylogenetic networks. PaNDA includes a novel algorithm to find a subset of k taxa with maximum diversity, running in polynomial time for networks of bounded scanwidth, a measure of tree-likeness of a network that grows slower than the well-known level measure. This algorithm considers the variant of phylogenetic diversity on networks in which the branch lengths of all paths from the root to the selected taxa contribute towards their diversity. We demonstrate the scalability of this algorithm on simulated networks, successfully analyzing level-15 networks with up to 200 taxa in seconds. We also provide a proof-of-concept analysis using a phylogenetic network on Xiphophorus species, illustrating how the tool can support diversity studies based on real genomic data. The software is easily installable and freely available at https://github.com/nholtgrefe/panda. Additionally, we extend the definition of phylogenetic diversity to semi-directed phylogenetic networks, which are mixed graphs increasingly used in phylogenetic analysis to model uncertainty of the root location. We prove that finding a subset of k taxa with maximum diversity remains NP-hard on semi-directed networks, but do present a polynomial-time algorithm for networks with bounded level. ...
Journal article (2025) - Niels Holtgrefe, Katharina T Huber, Leo van Iersel, Mark Jones, Samuel Martin, Vincent Moulton
With the increasing availability of genomic data, biologists aim to find more accurate descriptions of evolutionary histories influenced by secondary contact, where diverging lineages reconnect before diverging again. Such reticulate evolutionary events can be more accurately represented in phylogenetic networks than in phylogenetic trees. Since the root location of phylogenetic networks cannot be inferred from biological data under several evolutionary models, we consider semi-directed (phylogenetic) networks: partially directed graphs without a root in which the directed edges represent reticulate evolutionary events. By specifying a known outgroup, the rooted topology can be recovered from such networks. We introduce the algorithm Squirrel (Semi-directed Quarnet-based Inference to Reconstruct Level-1 Networks) which constructs a semi-directed level-1 network from a full set of quarnets (four-leaf semi-directed networks). Our method also includes a heuristic to construct such a quarnet set directly from sequence alignments. We demonstrate Squirrel’s performance through simulations and on real sequence data sets, the largest of which contains 29 aligned sequences close to 1.7 Mb long. The resulting networks are obtained on a standard laptop within a few minutes. Lastly, we prove that Squirrel is combinatorially consistent: given a full set of quarnets coming from a triangle-free semi-directed level-1 network, it is guaranteed to reconstruct the original network. Squirrel is implemented in Python, has an easy-to-use graphical user interface that takes sequence alignments or quarnets as input, and is freely available [...] ...
Journal article (2024) - Leo van Iersel, Mark Jones, Mathias Weller
How many reticulations are needed for a phylogenetic network to display a given set of k phylogenetic trees on n leaves? For k = 2, Baroni et al. [Ann. Comb. 8, 391-408 (2005)] showed that the answer is n − 2. Here, we show that, for k ≥ 3 the answer is at least (3 /2 − ε)n. Concretely, we prove that, for each ε > 0, there is some n ∈ N such that three n-leaf caterpillar trees can be constructed in such a way that any network displaying these caterpillars contains at least (3 /2 − ε)n reticulations. The case of three trees is interesting since it is the easiest case that cannot be equivalently formu-lated in terms of agreement forests. Instead, we base the result on a surprising lower bound for multilabelled trees (MUL-trees) displaying the caterpillars. Indeed, we show that one cannot do (more than an ε) better than the trivial MUL-tree resulting from a simple concatenation of the given caterpillars. The results are relevant for the development of methods for the Hybridization Number problem on more than two trees. This fundamental problem asks to construct a binary phylogenetic network with a minimum number of reticulations displaying a given set of phylogenetic trees. ...
Journal article (2024) - Giulia Bernardini, Leo van Iersel, Esther Julien, Leen Stougie
The Hybridization problem asks to reconcile a set of conflicting phylogenetic trees into a single phylogenetic network with the smallest possible number of reticulation nodes. This problem is computationally hard and previous solutions are limited to small and/or severely restricted data sets, for example, a set of binary trees with the same taxon set or only two non-binary trees with non-equal taxon sets. Building on our previous work on binary trees, we present FHyNCH, the first algorithmic framework to heuristically solve the Hybridization problem for large sets of multifurcating trees whose sets of taxa may differ. Our heuristics combine the cherry-picking technique, recently proposed to solve the same problem for binary trees, with two carefully designed machine-learning models. We demonstrate that our methods are practical and produce qualitatively good solutions through experiments on both synthetic and real data sets. ...
Journal article (2024) - Katharina T. Huber, Leo van Iersel, Remie Janssen, Mark Jones, Vincent Moulton, Yukihiro Murakami, Charles Semple
This paper studies the relationship between undirected (unrooted) and directed (rooted) phylogenetic networks. We describe a polynomial-time algorithm for deciding whether an undirected nonbinary phylogenetic network, given the locations of the root and reticulation vertices, can be oriented as a directed nonbinary phylogenetic network. Moreover, we characterize when this is possible and show that, in such instances, the resulting directed nonbinary phylogenetic network is unique. In addition, without being given the location of the root and the reticulation vertices, we describe an algorithm for deciding whether an undirected binary phylogenetic network N can be oriented as a directed binary phylogenetic network of a certain class. The algorithm is fixed-parameter tractable (FPT) when the parameter is the level of N and is applicable to classes of directed phylogenetic networks that satisfy certain conditions. As an example, we show that the well-studied class of binary tree-child networks satisfies these conditions. ...
Journal article (2024) - Elise Deen, Leo van Iersel, Remie Janssen, Mark Jones, Yukihiro Murakami, Norbert Zeh
The maximum parsimony distance dMP(T1,T2) and the bounded-state maximum parsimony distance dMPt(T1,T2) measure the difference between two phylogenetic trees T1,T2 in terms of the maximum difference between their parsimony scores for any character (with t a bound on the number of states in the character, in the case of dMPt(T1,T2)). While computing dMP(T1,T2) was previously shown to be fixed-parameter tractable with a linear kernel, no such result was known for dMPt(T1,T2). In this paper, we prove that computing dMPt(T1,T2) is fixed-parameter tractable for all t. Specifically, we prove that this problem has a kernel of size O(klg⁡k), where k=dMPt(T1,T2). As the primary analysis tool, we introduce the concept of leg-disjoint incompatible quartets, which may be of independent interest. ...
Journal article (2023) - Leo van Iersel, Vincent Moulton, Yukihiro Murakami
Graph invariants are a useful tool in graph theory. Not only do they encode useful information about the graphs to which they are associated, but complete invariants can be used to distinguish between non-isomorphic graphs. Polynomial invariants for graphs such as the well-known Tutte polynomial have been studied for several years, and recently there has been interest to also define such invariants for phylogenetic networks, a special type of graph that arises in the area of evolutionary biology. Recently Liu gave a complete invariant for (phylogenetic) trees. However, the polynomial invariants defined thus far for phylogenetic networks that are not trees require vertex labels and either contain a large number of variables, or they have exponentially many terms in the number of reticulations. This can make it difficult to compute these polynomials and to use them to analyse unlabelled networks. In this paper, we shall show how to circumvent some of these difficulties for rooted cactuses and cactuses. As well as being important in other areas such as operations research, rooted cactuses contain some common classes of phylogenetic networks such phylogenetic trees and level-1 networks. More specifically, we define a polynomial F that is a complete invariant for the class of rooted cactuses without vertices of indegree 1 and outdegree 1 that has 5 variables, and a polynomial Q that is a complete invariant for the class of rooted cactuses that has 6 variables whose degree can be bounded linearly in terms of the size of the rooted cactus. We also explain how to extend the Q polynomial to define a complete invariant for leaf-labelled rooted cactuses as well as (unrooted) cactuses. ...