Y. Murakami
Please Note
22 records found
1
Phylogenetic networks are useful in representing the evolutionary history of taxa. In certain scenarios, one requires a way to compare different networks. In practice, this can be rather difficult, except within specific classes of networks. In this paper, we derive metrics for the class of orchard networks and the class of strongly reticulation-visible networks, from variants of so-called μ-representations, which are vector representations of networks. For both network classes, we impose degree constraints on the vertices, by considering semi-binary networks.
The maximum parsimony distance dMP(T1,T2) and the bounded-state maximum parsimony distance dMPt(T1,T2) measure the difference between two phylogenetic trees T1,T2 in terms of the maximum difference between their parsimony scores for any character (with t a bound on the number of states in the character, in the case of dMPt(T1,T2)). While computing dMP(T1,T2) was previously shown to be fixed-parameter tractable with a linear kernel, no such result was known for dMPt(T1,T2). In this paper, we prove that computing dMPt(T1,T2) is fixed-parameter tractable for all t. Specifically, we prove that this problem has a kernel of size O(klgk), where k=dMPt(T1,T2). As the primary analysis tool, we introduce the concept of leg-disjoint incompatible quartets, which may be of independent interest.
This paper studies the relationship between undirected (unrooted) and directed (rooted) phylogenetic networks. We describe a polynomial-time algorithm for deciding whether an undirected nonbinary phylogenetic network, given the locations of the root and reticulation vertices, can be oriented as a directed nonbinary phylogenetic network. Moreover, we characterize when this is possible and show that, in such instances, the resulting directed nonbinary phylogenetic network is unique. In addition, without being given the location of the root and the reticulation vertices, we describe an algorithm for deciding whether an undirected binary phylogenetic network N can be oriented as a directed binary phylogenetic network of a certain class. The algorithm is fixed-parameter tractable (FPT) when the parameter is the level of N and is applicable to classes of directed phylogenetic networks that satisfy certain conditions. As an example, we show that the well-studied class of binary tree-child networks satisfies these conditions.
Graph burning is a discrete time process which can be used to model the spread of social contagion. One is initially given a graph of unburned vertices. At each round (time step), one vertex is burned; unburned vertices with at least one burned neighbour from the previous round also becomes burned. The burning number of a graph is the fewest number of rounds required to burn the graph. It has been conjectured that for a graph on n vertices, the burning number is at most ⌈n⌉. We show that the graph burning conjecture is true for trees without degree-2 vertices.
Phylogenetic networks are used to represent the evolutionary history of species. Recently, the new class of orchard networks was introduced, which were later shown to be interpretable as trees with additional horizontal arcs. This makes the network class ideal for capturing evolutionary histories that involve horizontal gene transfers. Here, we study the minimum number of additional leaves needed to make a network orchard. We demonstrate that computing this proximity measure for a given network is NP-hard and describe a tight upper bound. We also give an equivalent measure based on vertex labellings to construct a mixed integer linear programming formulation. Our experimental results, which include both real-world and synthetic data, illustrate the efficiency of our implementation.
Graph invariants are a useful tool in graph theory. Not only do they encode useful information about the graphs to which they are associated, but complete invariants can be used to distinguish between non-isomorphic graphs. Polynomial invariants for graphs such as the well-known Tutte polynomial have been studied for several years, and recently there has been interest to also define such invariants for phylogenetic networks, a special type of graph that arises in the area of evolutionary biology. Recently Liu gave a complete invariant for (phylogenetic) trees. However, the polynomial invariants defined thus far for phylogenetic networks that are not trees require vertex labels and either contain a large number of variables, or they have exponentially many terms in the number of reticulations. This can make it difficult to compute these polynomials and to use them to analyse unlabelled networks. In this paper, we shall show how to circumvent some of these difficulties for rooted cactuses and cactuses. As well as being important in other areas such as operations research, rooted cactuses contain some common classes of phylogenetic networks such phylogenetic trees and level-1 networks. More specifically, we define a polynomial F that is a complete invariant for the class of rooted cactuses without vertices of indegree 1 and outdegree 1 that has 5 variables, and a polynomial Q that is a complete invariant for the class of rooted cactuses that has 6 variables whose degree can be bounded linearly in terms of the size of the rooted cactus. We also explain how to extend the Q polynomial to define a complete invariant for leaf-labelled rooted cactuses as well as (unrooted) cactuses.
Phylogenetic networks are used in biology to represent evolutionary histories. The class of orchard phylogenetic networks was recently introduced for their computational benefits, without any biological justification. Here, we show that orchard networks can be interpreted as trees with additional horizontal arcs. Therefore, they are closely related to tree-based networks, where the difference is that in tree-based networks the additional arcs do not need to be horizontal. Then, we use this new characterization to show that the space of orchard networks on n leaves with k reticulations is connected under the rNNI rearrangement move with diameter O(kn+ nlog (n)).
We present the first fixed-parameter algorithm for constructing a tree-child phylogenetic network that displays an arbitrary number of binary input trees and has the minimum number of reticulations among all such networks. The algorithm uses the recently introduced framework of cherry picking sequences and runs in O((8 k) kpoly (n, t)) time, where n is the number of leaves of every tree, t is the number of trees, and k is the reticulation number of the constructed network. Moreover, we provide an efficient parallel implementation of the algorithm and show that it can deal with up to 100 input trees on a standard desktop computer, thereby providing a major improvement over previous phylogenetic network construction methods.
Phylogenetic networks are used to represent evolutionary relationships between species in biology. Such networks are often categorized into classes by their topological features, which stem from both biological and computational motivations. We study two network classes in this paper: tree-based networks and orchard networks. Tree-based networks are those that can be obtained by inserting edges between the edges of an underlying tree. Orchard networks are a recently introduced generalization of the class of tree-child networks. Structural characterizations have already been discovered for tree-based networks; this is not the case for orchard networks. In this paper, we introduce cherry covers—a unifying characterization of both network classes—in which we decompose the edges of the networks into so-called cherry shapes and reticulated cherry shapes. We show that cherry covers can be used to characterize the class of tree-based networks as well as the class of orchard networks. Moreover, we also generalize these results to non-binary networks.
Phylogenetic networks are used to represent evolutionary scenarios in biology and linguistics. To find the most probable scenario, it may be necessary to compare candidate networks. In particular, one needs to distinguish different networks and determine whether one network is contained in another. In this paper, we introduce cherry-picking networks, a class of networks that can be reduced by a so-called cherry-picking sequence. We then show how to compare such networks using their sequences. We characterize reconstructible cherry-picking networks, which are the networks that are uniquely determined by the sequences that reduce them, making them distinguishable. Furthermore, we show that a cherry-picking network is contained in another cherry picking network if a sequence for the latter network reduces the former network, provided both networks can be reconstructed from their sequences in a similar way (i.e., they are in the same reconstructible class). Lastly, we show that the converse of the above statement holds for tree-child networks, thereby showing that NETWORK CONTAINMENT, the problem of checking whether a network is contained in another, can be solved by computing cherry picking sequences in linear time for tree-child networks.
Phylogenetic networks can represent evolutionary events that cannot be described by phylogenetic trees. These networks are able to incorporate reticulate evolutionary events such as hybridization, introgression, and lateral gene transfer. Recently, network-based Markov models of DNA sequence evolution have been introduced along with model-based methods for reconstructing phylogenetic networks. For these methods to be consistent, the network parameter needs to be identifiable from data generated under the model. Here, we show that the semi-directed network parameter of a triangle-free, level-1 network model with any fixed number of reticulation vertices is generically identifiable under the Jukes–Cantor, Kimura 2-parameter, or Kimura 3-parameter constraints.
The problem of realizing finite metric spaces in terms of weighted graphs has many applications. For example, the mathematical and computational properties of metrics that can be realized by trees have been well-studied and such research has laid the foundation of the reconstruction of phylogenetic trees from evolutionary distances. However, as trees may be too restrictive to accurately represent real-world data or phenomena, it is important to understand the relationship between more general graphs and distances. In this paper, we introduce a new type of metric called a cactus metric, that is, a metric that can be realized by a cactus graph. We show that, just as with tree metrics, a cactus metric has a unique optimal realization. In addition, we describe an algorithm that can recognize whether or not a metric is a cactus metric and, if so, compute its optimal realization in O(n3) time, where n is the number of points in the space.
A common problem in phylogenetics is to try to infer a species phylogeny from gene trees. We consider different variants of this problem. The first variant, called Unrestricted Minimal Episodes Inference, aims at inferring a species tree based on a model with speciation and duplication where duplications are clustered in duplication episodes. The goal is to minimize the number of such episodes. The second variant, Parental Hybridization, aims at inferring a species network based on a model with speciation and reticulation. The goal is to minimize the number of reticulation events. It is a variant of the well-studied Hybridization Number problem with a more generous view on which gene trees are consistent with a given species network. We show that these seemingly different problems are in fact closely related and can, surprisingly, both be solved in polynomial time, using a structure we call 'beaded trees'. However, we also show that methods based on these problems have to be used with care because the optimal species phylogenies always have a restricted form. To mitigate this problem, we introduce a new variant of Unrestricted Minimal Episodes Inference that minimizes the duplication episode depth. We prove that this new variant of the problem can also be solved in polynomial time.
A phylogenetic network is a graph-theoretical tool that is used by biologists to represent the evolutionary history of a collection of species. One potential way of constructing such networks is via a distance-based approach, where one is asked to find a phylogenetic network that in some way represents a given distance matrix, which gives information on the evolutionary distances between present-day taxa. Here, we consider the following question. For which k are unrooted level-k networks uniquely determined by their distance matrices? We consider this question for shortest distances as well as for the case that the multisets of all distances is given. We prove that level-1 networks and level-2 networks are reconstructible from their shortest distances and multisets of distances, respectively. Furthermore we show that, in general, networks of level higher than 1 are not reconstructible from shortest distances and that networks of level higher than 2 are not reconstructible from their multisets of distances.
Phylogenetic networks are important for the study of evolution. The number of methods to find such networks is increasing, but most such methods can only reconstruct small networks. To find bigger networks, one can attempt to combine small networks. In this paper, we study the Network Hybridization problem, a problem of combining networks into another network with low complexity. We characterize this complexity via a restricted problem, Tree-child Network Hybridization, and we present an FPT algorithm to efficiently solve this restricted problem.
Phylogenetic networks are used to represent evolutionary scenarios in biology and linguistics. To find the most probable scenario, it may be necessary to compare candidate networks, to distinguish different networks, and to see when one network is embedded in another. Here, we consider the Network Containment problem, which asks whether a given network is contained in another network. We give a linear-time algorithm to this problem for the class of tree-child networks using the recently introduced tree-child sequences by Linz and Semple. We implement this algorithm in Python and show that the linear-time theoretical bound on the input size is achievable in practice.