Computing the Rooted Triplet Distance for Phylogenetic Trees and Caterpillars
L. van Eeuwijk (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Yukihiro Murakami – Mentor (TU Delft - Discrete Mathematics and Optimization)
B.J. Meulenbroek – Graduation committee member (TU Delft - Mathematical Physics)
More Info
expand_more
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
In phylogenetics, it is important to figure out what method to obtain a phylogenetic tree, a tree showing how species evolve from one another, is more reliable. A phylogenetic tree is a mathematical tree, where every internal vertex has at least 2 children, and the leaves are labeled bijectively with a set $X$. One useful tool to help determine what method is more reliable is to find the rooted triplet distance between two trees. If the distance from a newly developed tree is far from what we know to be reliable trees, then this method is probably unreliable. For this distance, one should look at how many triplets the trees do not have in common. In this report, we will look at a specific algorithm to compute the rooted triplet distance between two special rooted phylogenetic trees, caterpillars, by checking how many triplets these two caterpillars have in common. In this report, we will map the leaves into $\mathbb{N}^3$ and $\mathbb{N}^d$, to determine how many triplets 3 or $d$ trees have in common. Especially for $\mathbb{N}^d$ it becomes complicated to decide for what regions of $\mathbb{N}^d$ the leaves need to be counted, and for which ones not. After finding the closed-form notation for this, we will also discuss how many terms there are in the sum of this equation. Lastly, we will talk about the time complexity of this algorithm. The idea is that this algorithm will work faster than the naive approach, which runs in $O(n^3)$ time. Due to complications in the third and higher dimensions, this faster running time may or may not be accomplished, depending on whether another algorithm can be found to resolve these complications, which will not be done in this report. A closed-form notation can be found to express the number of triplets two caterpillars do not have in common, and the number of terms this equation has will also be discussed in Chapter 5.