Tree-child Network Containment

More Info
expand_more

Abstract

Interest in phylogenetic trees for histories of species and DNA has spawned many problems, one of which is TreeContainment; a problem that asks whether a tree is contained within a network. The TreeContainment problem is proven to be NP-hard for general trees and networks, however it is solvable in polynomial time for networks that meet the tree-child restriction. An algorithm to solve TreeContainment for binary tree-child networks has been created previously with quadratic running time (van Iersel, Semple, Steel, 2010). Janssen and Murakami have recently created a new algorithm that solves a larger problem NetworkContainment, for semi-binary tree-child networks (Janssen, Murakami, 2019). This new algorithm uses tree-child sequences introduced by Linz and Semple, but there has not been an implementation of it until now. In this paper I show an implementation (using Python) of this algorithm, in which I have made a modification that increases its speed on networks with large indegrees. Furthermore I have proven in this paper that the output of this algorithm remains correct under this modification, and that the running time of the modified algorithm is now linear without requiring a constant maximum indegree at all.

Files