L.J.J. van Iersel
Please Note
77 records found
1
Phylogenetic networks are useful in representing the evolutionary history of taxa. In certain scenarios, one requires a way to compare different networks. In practice, this can be rather difficult, except within specific classes of networks. In this paper, we derive metrics for the class of orchard networks and the class of strongly reticulation-visible networks, from variants of so-called μ-representations, which are vector representations of networks. For both network classes, we impose degree constraints on the vertices, by considering semi-binary networks.
In evolutionary biology, phylogenetic networks are graphs that provide a flexible framework for representing complex evolutionary histories that involve reticulate evolutionary events. Recently, phylogenetic studies have started to focus on a special class of such networks called semi-directed networks. These graphs are defined as mixed graphs that can be obtained by de-orienting some of the arcs in some rooted phylogenetic network, that is, a directed acyclic graph whose leaves correspond to a collection of species and that has a single source or root vertex. However, this definition of semi-directed networks is implicit in nature since it is not clear when a mixed-graph enjoys this property or not. In this paper, we introduce novel, explicit mathematical characterizations of semi-directed networks, and also multi-semi-directed networks, that is mixed graphs that can be obtained from directed phylogenetic networks that may have more than one root. In addition, through extending foundational tools from the theory of rooted networks into the semi-directed setting—such as cherry picking sequences, omnians, and path partitions—we characterize when a (multi-)semi-directed network can be obtained by de-orienting some rooted network that is contained in one of the well-known classes of tree-child, orchard, tree-based or forest-based networks. These results address structural aspects of (multi-)semi-directed networks and pave the way to improved theoretical and computational analyses of such networks, for example, within the development of algebraic evolutionary models that are based on such networks.
Erratum
Correction: "Distinguishing Phylogenetic Level-2 Networks with Quartets and Inter-Taxon Quartet Distances" (Bulletin of mathematical biology (2025) 87 12 DOI: 10.1007/s11538-025-01549-4.)
Phylogenetic trees and networks are graphs used to model evolutionary relationships, with trees representing strictly branching histories and networks allowing for events in which lineages merge, called reticulation events. While the question of data sufficiency has been studied extensively in the context of trees, it remains largely unexplored for networks. In this work, we take a first step in this direction by establishing bounds on the amount of genomic data required to reconstruct binary level-1 semi-directed phylogenetic networks, which are binary networks in which reticulation events are indicated by directed edges, all other edges are undirected, and cycles are vertex disjoint. For this class, methods have been developed recently that are statistically consistent. Roughly speaking, such methods are guaranteed to reconstruct the correct network assuming infinitely long genomic sequences. Here we consider the question whether networks from this class can be uniquely and correctly reconstructed from finite sequences. Specifically, we present an inference algorithm that takes as input genetic sequence data, and demonstrate that the sequence length sufficient to reconstruct the correct network with high probability, under the CFN model of evolution, scales logarithmically, polynomially, or polylogarithmically with the number of taxa, depending on the parameter regime. As part of our contribution, we also present novel inference rules for quartet data in the semi-directed phylogenetic network setting.
Maximum agreement forests have been used as a measure of dissimilarity of two or more phylogenetic trees on a given set of taxa. An agreement forest is a set of trees that can be obtained from each of the input trees by deleting edges and suppressing degree-2 vertices. A maximum agreement forest is such a forest with the minimum number of components. We present a simple 4-approximation algorithm for computing a maximum agreement forest of multiple unrooted binary trees. This algorithm applies LP rounding to an extension of a recent ILP formulation of the maximum agreement forest problem on two trees by Van Wersch et al. [13]. We achieve the same approximation ratio as the algorithm by Chen et al. [3], but our algorithm is extremely simple. We also prove that no algorithm based on the ILP formulation by Van Wersch et al. can achieve an approximation ratio of 4−ε, for any ε>0, even on two trees. To this end, we prove that the integrality gap of the ILP approaches 4 as the size of the two input trees grows.
In evolutionary biology, networks are becoming increasingly used to represent evolutionary histories for species that have undergone non-treelike or reticulate evolution. Such networks are essentially directed acyclic graphs with a leaf set that corresponds to a collection of species, and in which non-leaf vertices with indegree 1 correspond to speciation events and vertices with indegree greater than 1 correspond to reticulate events such as gene transfer. Recently forest-based networks have been introduced, which are essentially (multi-rooted) networks that can be formed by adding some arcs to a collection of phylogenetic trees (or phylogenetic forest), where each arc is added in such a way that its ends always lie in two different trees in the forest. In this paper, we consider the complexity of deciding whether a given network is proper forest-based, that is, whether it can be formed by adding arcs to some underlying phylogenetic forest which contains the same number of trees as there are roots in the network. More specifically, we show that it is NP-complete to decide whether a tree-child network with m roots is proper forest-based, for each m≥2. Moreover, for binary networks the problem remains NP-complete when m≥3 but becomes polynomial-time solvable for m=2. We also give a fixed parameter tractable (FPT) algorithm, with parameters the maximum outdegree of a vertex, the number of roots, and the number of indegree 2 vertices, for deciding if a semi-binary network is proper forest-based. A key element in proving our results is a new characterization for when a network with m roots is proper forest-based in terms of certain m-colorings.
Semi-directed networks are partially directed graphs that model evolution where the directed edges represent reticulate evolutionary events. We present an algorithm that reconstructs binary n-leaf semi-directed level-1 networks in O(n 2) time from its quarnets (4-leaf subnetworks). Our method assumes we have direct access to all quarnets, yet uses only an asymptotically optimal number of O(nlogn) quarnets. When the network is assumed to contain no triangles, our method instead relies only on four-cycle quarnets and the splits of the other quarnets. A variant of our algorithm works with quartets rather than quarnets and we show that it reconstructs most of a semi-directed level-1 network from an asymptotically optimal O(nlogn) of the quartets it displays. Additionally, we provide an O(n 3) time algorithm that reconstructs the tree-of-blobs of any binary n-leaf semi-directed network with unbounded level from O(n 3) splits of its quarnets.
Phylogenetic networks are graphs that are used to represent evolutionary relationships between different taxa. They generalize phylogenetic trees since for example, unlike trees, they permit lineages to combine. Recently, there has been rising interest in semi-directed phylogenetic networks, which are mixed graphs in which certain lineage combination events are represented by directed edges coming together, whereas the remaining edges are left undirected. One reason to consider such networks is that it can be difficult to root a network using real data. In this paper, we consider the problem of when a semi-directed phylogenetic network is defined or encoded by the smaller networks that it induces on the 4-leaf subsets of its leaf set. These smaller networks are called quarnets. We prove that semi-directed binary level-2 phylogenetic networks are encoded by their quarnets, but that this is not the case for level-3. In addition, we prove that the so-called blob tree of a semi-directed binary network, a tree that gives the coarse-grained structure of the network, is always encoded by the quarnets of the network. These results are relevant for proving the statistical consistency of programs that are currently being developed for reconstructing phylogenetic networks from practical data, such as the recently developed Squirrel software tool.
The inference of phylogenetic networks, which model complex evolutionary processes including hybridization and gene flow, remains a central challenge in evolutionary biology. Until now, statistically consistent inference methods have been limited to phylogenetic level-1 networks, which allow no interdependence between reticulate events. In this work, we establish the theoretical foundations for a statistically consistent inference method for a much broader class: semi-directed level-2 networks that are outer-labeled planar and galled. We precisely characterize the features of these networks that are distinguishable from the topologies of their displayed quartet trees. Moreover, we prove that an inter-taxon distance derived from these quartets is circular decomposable, enabling future robust inference of these networks from quartet data, such as concordance factors obtained from gene tree distributions under the Network Multispecies Coalescent model. Our results also have novel identifiability implications across different data types and evolutionary models, applying to any setting in which displayed quartets can be distinguished.
How many reticulations are needed for a phylogenetic network to display a given set of k phylogenetic trees on n leaves? For k = 2, Baroni et al. [Ann. Comb. 8, 391-408 (2005)] showed that the answer is n − 2. Here, we show that, for k ≥ 3 the answer is at least (3 /2 − ε)n. Concretely, we prove that, for each ε > 0, there is some n ∈ N such that three n-leaf caterpillar trees can be constructed in such a way that any network displaying these caterpillars contains at least (3 /2 − ε)n reticulations. The case of three trees is interesting since it is the easiest case that cannot be equivalently formu-lated in terms of agreement forests. Instead, we base the result on a surprising lower bound for multilabelled trees (MUL-trees) displaying the caterpillars. Indeed, we show that one cannot do (more than an ε) better than the trivial MUL-tree resulting from a simple concatenation of the given caterpillars. The results are relevant for the development of methods for the Hybridization Number problem on more than two trees. This fundamental problem asks to construct a binary phylogenetic network with a minimum number of reticulations displaying a given set of phylogenetic trees.
The Hybridization problem asks to reconcile a set of conflicting phylogenetic trees into a single phylogenetic network with the smallest possible number of reticulation nodes. This problem is computationally hard and previous solutions are limited to small and/or severely restricted data sets, for example, a set of binary trees with the same taxon set or only two non-binary trees with non-equal taxon sets. Building on our previous work on binary trees, we present FHyNCH, the first algorithmic framework to heuristically solve the Hybridization problem for large sets of multifurcating trees whose sets of taxa may differ. Our heuristics combine the cherry-picking technique, recently proposed to solve the same problem for binary trees, with two carefully designed machine-learning models. We demonstrate that our methods are practical and produce qualitatively good solutions through experiments on both synthetic and real data sets.
This paper studies the relationship between undirected (unrooted) and directed (rooted) phylogenetic networks. We describe a polynomial-time algorithm for deciding whether an undirected nonbinary phylogenetic network, given the locations of the root and reticulation vertices, can be oriented as a directed nonbinary phylogenetic network. Moreover, we characterize when this is possible and show that, in such instances, the resulting directed nonbinary phylogenetic network is unique. In addition, without being given the location of the root and the reticulation vertices, we describe an algorithm for deciding whether an undirected binary phylogenetic network N can be oriented as a directed binary phylogenetic network of a certain class. The algorithm is fixed-parameter tractable (FPT) when the parameter is the level of N and is applicable to classes of directed phylogenetic networks that satisfy certain conditions. As an example, we show that the well-studied class of binary tree-child networks satisfies these conditions.
The maximum parsimony distance dMP(T1,T2) and the bounded-state maximum parsimony distance dMPt(T1,T2) measure the difference between two phylogenetic trees T1,T2 in terms of the maximum difference between their parsimony scores for any character (with t a bound on the number of states in the character, in the case of dMPt(T1,T2)). While computing dMP(T1,T2) was previously shown to be fixed-parameter tractable with a linear kernel, no such result was known for dMPt(T1,T2). In this paper, we prove that computing dMPt(T1,T2) is fixed-parameter tractable for all t. Specifically, we prove that this problem has a kernel of size O(klgk), where k=dMPt(T1,T2). As the primary analysis tool, we introduce the concept of leg-disjoint incompatible quartets, which may be of independent interest.
Graph invariants are a useful tool in graph theory. Not only do they encode useful information about the graphs to which they are associated, but complete invariants can be used to distinguish between non-isomorphic graphs. Polynomial invariants for graphs such as the well-known Tutte polynomial have been studied for several years, and recently there has been interest to also define such invariants for phylogenetic networks, a special type of graph that arises in the area of evolutionary biology. Recently Liu gave a complete invariant for (phylogenetic) trees. However, the polynomial invariants defined thus far for phylogenetic networks that are not trees require vertex labels and either contain a large number of variables, or they have exponentially many terms in the number of reticulations. This can make it difficult to compute these polynomials and to use them to analyse unlabelled networks. In this paper, we shall show how to circumvent some of these difficulties for rooted cactuses and cactuses. As well as being important in other areas such as operations research, rooted cactuses contain some common classes of phylogenetic networks such phylogenetic trees and level-1 networks. More specifically, we define a polynomial F that is a complete invariant for the class of rooted cactuses without vertices of indegree 1 and outdegree 1 that has 5 variables, and a polynomial Q that is a complete invariant for the class of rooted cactuses that has 6 variables whose degree can be bounded linearly in terms of the size of the rooted cactus. We also explain how to extend the Q polynomial to define a complete invariant for leaf-labelled rooted cactuses as well as (unrooted) cactuses.