Y. Murakami | TU Delft Repository

Metrics for classes of semi-binary phylogenetic networks using μ-representations

Journal article (2026) - Christopher Reichling, Leo van Iersel, Yukihiro Murakami

Phylogenetic networks are useful in representing the evolutionary history of taxa. In certain scenarios, one requires a way to compare different networks. In practice, this can be rather difficult, except within specific classes of networks. In this paper, we derive metrics for the class of orchard networks and the class of strongly reticulation-visible networks, from variants of so-called μ-representations, which are vector representations of networks. For both network classes, we impose degree constraints on the vertices, by considering semi-binary networks. ...

PaNDA: Efficient Optimization of Phylogenetic Diversity in Networks

Preprint (2025) - Niels Holtgrefe, Leo van Iersel, Ruben Meuwese, Yukihiro Murakami, Jannik Schestag

Phylogenetic diversity plays an important role in biodiversity, conservation, and evolutionary studies by measuring the diversity of a set of taxa based on their phylogenetic relationships. In phylogenetic trees, a subset of k taxa with maximum phylogenetic diversity can be found by a simple and efficient greedy algorithm. However, this algorithmic tractability is lost when considering phylogenetic networks, which incorporate reticulate evolutionary events such as hybridization and horizontal gene transfer. To address this challenge, we introduce PaNDA (Phylogenetic Network Diversity Algorithms), the first software package and interactive graphical user-interface for exploring, visualizing and maximizing diversity in phylogenetic networks. PaNDA includes a novel algorithm to find a subset of k taxa with maximum diversity, running in polynomial time for networks of bounded scanwidth, a measure of tree-likeness of a network that grows slower than the well-known level measure. This algorithm considers the variant of phylogenetic diversity on networks in which the branch lengths of all paths from the root to the selected taxa contribute towards their diversity. We demonstrate the scalability of this algorithm on simulated networks, successfully analyzing level-15 networks with up to 200 taxa in seconds. We also provide a proof-of-concept analysis using a phylogenetic network on Xiphophorus species, illustrating how the tool can support diversity studies based on real genomic data. The software is easily installable and freely available at https://github.com/nholtgrefe/panda. Additionally, we extend the definition of phylogenetic diversity to semi-directed phylogenetic networks, which are mixed graphs increasingly used in phylogenetic analysis to model uncertainty of the root location. We prove that finding a subset of k taxa with maximum diversity remains NP-hard on semi-directed networks, but do present a polynomial-time algorithm for networks with bounded level. ...

Phylogenetic diversity plays an important role in biodiversity, conservation, and evolutionary studies by measuring the diversity of a set of taxa based on their phylogenetic relationships. In phylogenetic trees, a subset of k taxa with maximum phylogenetic diversity can be found by a simple and efficient greedy algorithm. However, this algorithmic tractability is lost when considering phylogenetic networks, which incorporate reticulate evolutionary events such as hybridization and horizontal gene transfer. To address this challenge, we introduce PaNDA (Phylogenetic Network Diversity Algorithms), the first software package and interactive graphical user-interface for exploring, visualizing and maximizing diversity in phylogenetic networks. PaNDA includes a novel algorithm to find a subset of k taxa with maximum diversity, running in polynomial time for networks of bounded scanwidth, a measure of tree-likeness of a network that grows slower than the well-known level measure. This algorithm considers the variant of phylogenetic diversity on networks in which the branch lengths of all paths from the root to the selected taxa contribute towards their diversity. We demonstrate the scalability of this algorithm on simulated networks, successfully analyzing level-15 networks with up to 200 taxa in seconds. We also provide a proof-of-concept analysis using a phylogenetic network on Xiphophorus species, illustrating how the tool can support diversity studies based on real genomic data. The software is easily installable and freely available at https://github.com/nholtgrefe/panda. Additionally, we extend the definition of phylogenetic diversity to semi-directed phylogenetic networks, which are mixed graphs increasingly used in phylogenetic analysis to model uncertainty of the root location. We prove that finding a subset of k taxa with maximum diversity remains NP-hard on semi-directed networks, but do present a polynomial-time algorithm for networks with bounded level.

Fence Decompositions and Cherry Covers in Non-binary Phylogenetic Networks

Journal article (2025) - J. C. Pons, P. Vives Lopez, Y. Murakami, L. v. Iersel

Reticulate evolution can be modelled using phylogenetic networks. Tree-based networks, which are one of the more general classes of phylogenetic networks, have recently gained eminence for its ability to represent evolutionary histories with an underlying tree structure. To better understand tree-based networks, numerous characterizations have been proposed, based on tree embeddings, matchings, and arc partitions. Here, we build a bridge between two arc partition characterizations, namely maximal fence decompositions and cherry covers. Results on cherry covers have been found for general phylogenetic networks. We first show that the number of cherry covers is the same as the number of support trees (underlying tree structure of tree-based networks) for a given semi-binary network. Maximal fence decompositions have only been defined thus far for binary networks (constraints on vertex degrees). We remedy this by generalizing fence decompositions to non-binary networks, and using this, we characterize semi-binary tree-based networks in terms of forbidden structures. Furthermore, we give an explicit enumeration of cherry covers of semi-binary networks, by studying its fence decomposition. Finally, we prove that it is possible to characterize semi-binary tree-child networks, a subclass of tree-based networks, in terms of the number of their cherry covers. ...

A near-linear kernel for bounded-state parsimony distance

Journal article (2024) - Elise Deen, Leo van Iersel, Remie Janssen, Mark Jones, Yukihiro Murakami, Norbert Zeh

The maximum parsimony distance d_MP(T₁,T₂) and the bounded-state maximum parsimony distance d_MP^t(T₁,T₂) measure the difference between two phylogenetic trees T₁,T₂ in terms of the maximum difference between their parsimony scores for any character (with t a bound on the number of states in the character, in the case of d_MP^t(T₁,T₂)). While computing d_MP(T₁,T₂) was previously shown to be fixed-parameter tractable with a linear kernel, no such result was known for d_MP^t(T₁,T₂). In this paper, we prove that computing d_MP^t(T₁,T₂) is fixed-parameter tractable for all t. Specifically, we prove that this problem has a kernel of size O(klg⁡k), where k=d_MP^t(T₁,T₂). As the primary analysis tool, we introduce the concept of leg-disjoint incompatible quartets, which may be of independent interest. ...

Orienting undirected phylogenetic networks

Journal article (2024) - Katharina T. Huber, Leo van Iersel, Remie Janssen, Mark Jones, Vincent Moulton, Yukihiro Murakami, Charles Semple

This paper studies the relationship between undirected (unrooted) and directed (rooted) phylogenetic networks. We describe a polynomial-time algorithm for deciding whether an undirected nonbinary phylogenetic network, given the locations of the root and reticulation vertices, can be oriented as a directed nonbinary phylogenetic network. Moreover, we characterize when this is possible and show that, in such instances, the resulting directed nonbinary phylogenetic network is unique. In addition, without being given the location of the root and the reticulation vertices, we describe an algorithm for deciding whether an undirected binary phylogenetic network N can be oriented as a directed binary phylogenetic network of a certain class. The algorithm is fixed-parameter tractable (FPT) when the parameter is the level of N and is applicable to classes of directed phylogenetic networks that satisfy certain conditions. As an example, we show that the well-studied class of binary tree-child networks satisfies these conditions. ...

The Burning Number Conjecture is True for Trees without Degree-2 Vertices

Journal article (2024) - Yukihiro Murakami

Graph burning is a discrete time process which can be used to model the spread of social contagion. One is initially given a graph of unburned vertices. At each round (time step), one vertex is burned; unburned vertices with at least one burned neighbour from the previous round also becomes burned. The burning number of a graph is the fewest number of rounds required to burn the graph. It has been conjectured that for a graph on n vertices, the burning number is at most ⌈n⌉. We show that the graph burning conjecture is true for trees without degree-2 vertices. ...

Making a Network Orchard by Adding Leaves

Conference paper (2023) - Leo van Iersel, Mark Jones, Esther Julien, Yukihiro Murakami

Phylogenetic networks are used to represent the evolutionary history of species. Recently, the new class of orchard networks was introduced, which were later shown to be interpretable as trees with additional horizontal arcs. This makes the network class ideal for capturing evolutionary histories that involve horizontal gene transfers. Here, we study the minimum number of additional leaves needed to make a network orchard. We demonstrate that computing this proximity measure for a given network is NP-hard and describe a tight upper bound. We also give an equivalent measure based on vertex labellings to construct a mixed integer linear programming formulation. Our experimental results, which include both real-world and synthetic data, illustrate the efficiency of our implementation. ...

Polynomial invariants for cactuses

Journal article (2023) - Leo van Iersel, Vincent Moulton, Yukihiro Murakami

Graph invariants are a useful tool in graph theory. Not only do they encode useful information about the graphs to which they are associated, but complete invariants can be used to distinguish between non-isomorphic graphs. Polynomial invariants for graphs such as the well-known Tutte polynomial have been studied for several years, and recently there has been interest to also define such invariants for phylogenetic networks, a special type of graph that arises in the area of evolutionary biology. Recently Liu gave a complete invariant for (phylogenetic) trees. However, the polynomial invariants defined thus far for phylogenetic networks that are not trees require vertex labels and either contain a large number of variables, or they have exponentially many terms in the number of reticulations. This can make it difficult to compute these polynomials and to use them to analyse unlabelled networks. In this paper, we shall show how to circumvent some of these difficulties for rooted cactuses and cactuses. As well as being important in other areas such as operations research, rooted cactuses contain some common classes of phylogenetic networks such phylogenetic trees and level-1 networks. More specifically, we define a polynomial F that is a complete invariant for the class of rooted cactuses without vertices of indegree 1 and outdegree 1 that has 5 variables, and a polynomial Q that is a complete invariant for the class of rooted cactuses that has 6 variables whose degree can be bounded linearly in terms of the size of the rooted cactus. We also explain how to extend the Q polynomial to define a complete invariant for leaf-labelled rooted cactuses as well as (unrooted) cactuses. ...

Level-2 networks from shortest and longest distances

Journal article (2022) - Katharina T. Huber, Leo van Iersel, Remie Janssen, Mark Jones, Vincent Moulton, Yukihiro Murakami

Recently it was shown that a certain class of phylogenetic networks, called level-2 networks, cannot be reconstructed from their associated distance matrices. In this paper, we show that they can be reconstructed from their induced shortest and longest distance matrices. That is, if two level-2 networks induce the same shortest and longest distance matrices, then they must be isomorphic. We further show that level-2 networks are reconstructible from their shortest distance matrices if and only if they do not contain a subgraph from a family of graphs. A generator of a network is the graph obtained by deleting all pendant subtrees and suppressing degree-2 vertices. We also show that networks with a leaf on every generator side are reconstructible from their induced shortest distance matrix. ...

Orchard Networks are Trees with Additional Horizontal Arcs

Journal article (2022) - Leo van Iersel, Remie Janssen, Mark Jones, Yukihiro Murakami

Phylogenetic networks are used in biology to represent evolutionary histories. The class of orchard phylogenetic networks was recently introduced for their computational benefits, without any biological justification. Here, we show that orchard networks can be interpreted as trees with additional horizontal arcs. Therefore, they are closely related to tree-based networks, where the difference is that in tree-based networks the additional arcs do not need to be horizontal. Then, we use this new characterization to show that the space of orchard networks on n leaves with k reticulations is connected under the rNNI rearrangement move with diameter O(kn+ nlog (n)). ...

A Practical Fixed-Parameter Algorithm for Constructing Tree-Child Networks from Multiple Binary Trees

Journal article (2022) - Leo van Iersel, Remie Janssen, Mark Jones, Yukihiro Murakami, Norbert Zeh

We present the first fixed-parameter algorithm for constructing a tree-child phylogenetic network that displays an arbitrary number of binary input trees and has the minimum number of reticulations among all such networks. The algorithm uses the recently introduced framework of cherry picking sequences and runs in O((8 k) ^kpoly (n, t)) time, where n is the number of leaves of every tree, t is the number of trees, and k is the reticulation number of the constructed network. Moreover, we provide an efficient parallel implementation of the algorithm and show that it can deal with up to 100 input trees on a standard desktop computer, thereby providing a major improvement over previous phylogenetic network construction methods. ...

A unifying characterization of tree-based networks and orchard networks using cherry covers

Journal article (2021) - Leo van Iersel, Remie Janssen, Mark Jones, Yukihiro Murakami, Norbert Zeh

Phylogenetic networks are used to represent evolutionary relationships between species in biology. Such networks are often categorized into classes by their topological features, which stem from both biological and computational motivations. We study two network classes in this paper: tree-based networks and orchard networks. Tree-based networks are those that can be obtained by inserting edges between the edges of an underlying tree. Orchard networks are a recently introduced generalization of the class of tree-child networks. Structural characterizations have already been discovered for tree-based networks; this is not the case for orchard networks. In this paper, we introduce cherry covers—a unifying characterization of both network classes—in which we decompose the edges of the networks into so-called cherry shapes and reticulated cherry shapes. We show that cherry covers can be used to characterize the class of tree-based networks as well as the class of orchard networks. Moreover, we also generalize these results to non-binary networks. ...

On cherry-picking and network containment

Journal article (2021) - Remie Janssen, Yukihiro Murakami

Phylogenetic networks are used to represent evolutionary scenarios in biology and linguistics. To find the most probable scenario, it may be necessary to compare candidate networks. In particular, one needs to distinguish different networks and determine whether one network is contained in another. In this paper, we introduce cherry-picking networks, a class of networks that can be reduced by a so-called cherry-picking sequence. We then show how to compare such networks using their sequences. We characterize reconstructible cherry-picking networks, which are the networks that are uniquely determined by the sequences that reduce them, making them distinguishable. Furthermore, we show that a cherry-picking network is contained in another cherry picking network if a sequence for the latter network reduces the former network, provided both networks can be reconstructed from their sequences in a similar way (i.e., they are in the same reconstructible class). Lastly, we show that the converse of the above statement holds for tree-child networks, thereby showing that NETWORK CONTAINMENT, the problem of checking whether a network is contained in another, can be solved by computing cherry picking sequences in linear time for tree-child networks. ...

On Phylogenetic Encodings and Orchard Networks

Doctoral thesis (2021) - Yukihiro Murakami

Phylogenetic networks are a type of graph with vertices and edges, used to elucidate the evolutionary history of species. The fundamental goal of phylogenetic research is to infer the true phylogeny of species from raw data such as DNA sequences and morphological data. Most network inference methods require one to solve an NP-hard problem; furthermore, there is generally no guarantee of a unique network. One way of resolving this is to restrict our scope to networks within a certain class and to ask the following question. What input data guarantees a unique network within this class? Such a question brings us to the idea of encodings. A network class is encoded by a certain building block, such as displayed trees, splits, or induced inter-taxa distance matrices, if the building block distinguishes one network in the class from another. More precisely, no two networks in the same class may have the same set of building blocks. Often, encoding results give inspiration for polynomial-time algorithms for inferring networks within certain classes. Assuming to have data that corresponds to a network in that class, one may plausibly construct it as the unique network that is consistent with such information. ...

Distinguishing level-1 phylogenetic networks on the basis of data generated by Markov processes

Journal article (2021) - Elizabeth Gross, Leo van Iersel, Remie Janssen, Mark Jones, Colby Long, Yukihiro Murakami

Phylogenetic networks can represent evolutionary events that cannot be described by phylogenetic trees. These networks are able to incorporate reticulate evolutionary events such as hybridization, introgression, and lateral gene transfer. Recently, network-based Markov models of DNA sequence evolution have been introduced along with model-based methods for reconstructing phylogenetic networks. For these methods to be consistent, the network parameter needs to be identifiable from data generated under the model. Here, we show that the semi-directed network parameter of a triangle-free, level-1 network model with any fixed number of reticulation vertices is generically identifiable under the Jukes–Cantor, Kimura 2-parameter, or Kimura 3-parameter constraints. ...

Recognizing and realizing cactus metrics

Journal article (2020) - Momoko Hayamizu, Katharina T. Huber, Vincent Moulton, Yukihiro Murakami

The problem of realizing finite metric spaces in terms of weighted graphs has many applications. For example, the mathematical and computational properties of metrics that can be realized by trees have been well-studied and such research has laid the foundation of the reconstruction of phylogenetic trees from evolutionary distances. However, as trees may be too restrictive to accurately represent real-world data or phenomena, it is important to understand the relationship between more general graphs and distances. In this paper, we introduce a new type of metric called a cactus metric, that is, a metric that can be realized by a cactus graph. We show that, just as with tree metrics, a cactus metric has a unique optimal realization. In addition, we describe an algorithm that can recognize whether or not a metric is a cactus metric and, if so, compute its optimal realization in O(n³) time, where n is the number of points in the space. ...

Polynomial-Time Algorithms for Phylogenetic Inference Problems Involving Duplication and Reticulation

Journal article (2020) - Leo Van Iersel, Remie Janssen, Mark Jones, Yukihiro Murakami, Norbert Zeh

A common problem in phylogenetics is to try to infer a species phylogeny from gene trees. We consider different variants of this problem. The first variant, called Unrestricted Minimal Episodes Inference, aims at inferring a species tree based on a model with speciation and duplication where duplications are clustered in duplication episodes. The goal is to minimize the number of such episodes. The second variant, Parental Hybridization, aims at inferring a species network based on a model with speciation and reticulation. The goal is to minimize the number of reticulation events. It is a variant of the well-studied Hybridization Number problem with a more generous view on which gene trees are consistent with a given species network. We show that these seemingly different problems are in fact closely related and can, surprisingly, both be solved in polynomial time, using a structure we call 'beaded trees'. However, we also show that methods based on these problems have to be used with care because the optimal species phylogenies always have a restricted form. To mitigate this problem, we introduce a new variant of Unrestricted Minimal Episodes Inference that minimizes the duplication episode depth. We prove that this new variant of the problem can also be solved in polynomial time. ...

Reconstructibility of unrooted level-k phylogenetic networks from distances

Journal article (2020) - Leo van Iersel, Vincent Moulton, Yukihiro Murakami

A phylogenetic network is a graph-theoretical tool that is used by biologists to represent the evolutionary history of a collection of species. One potential way of constructing such networks is via a distance-based approach, where one is asked to find a phylogenetic network that in some way represents a given distance matrix, which gives information on the evolutionary distances between present-day taxa. Here, we consider the following question. For which k are unrooted level-k networks uniquely determined by their distance matrices? We consider this question for shortest distances as well as for the case that the multisets of all distances is given. We prove that level-1 networks and level-2 networks are reconstructible from their shortest distances and multisets of distances, respectively. Furthermore we show that, in general, networks of level higher than 1 are not reconstructible from shortest distances and that networks of level higher than 2 are not reconstructible from their multisets of distances. ...

Combining Networks Using Cherry Picking Sequences

Conference paper (2020) - Remie Janssen, Mark Jones, Yukihiro Murakami

Phylogenetic networks are important for the study of evolution. The number of methods to find such networks is increasing, but most such methods can only reconstruct small networks. To find bigger networks, one can attempt to combine small networks. In this paper, we study the Network Hybridization problem, a problem of combining networks into another network with low complexity. We characterize this complexity via a restricted problem, Tree-child Network Hybridization, and we present an FPT algorithm to efficiently solve this restricted problem. ...

Linear Time Algorithm for Tree-Child Network Containment

Conference paper (2020) - Remie Janssen, Yukihiro Murakami

Phylogenetic networks are used to represent evolutionary scenarios in biology and linguistics. To find the most probable scenario, it may be necessary to compare candidate networks, to distinguish different networks, and to see when one network is embedded in another. Here, we consider the Network Containment problem, which asks whether a given network is contained in another network. We give a linear-time algorithm to this problem for the class of tree-child networks using the recently introduced tree-child sequences by Linz and Semple. We implement this algorithm in Python and show that the linear-time theoretical bound on the input size is achievable in practice. ...