Algebraic Invariants for Inferring 4-Leaf Semi-Directed Phylogenetic Networks

Journal Article (2025)
Author(s)

Samuel Martin (European Bioinformatics Institute, Earlham Institute)

N.A.L. Holtgrefe (TU Delft - Discrete Mathematics and Optimization)

Vincent Moulton (University of East Anglia)

Richard M Leggett (Earlham Institute)

Research Group
Discrete Mathematics and Optimization
DOI related publication
https://doi.org/10.1093/sysbio/syaf071
Publication Year
2025
Language
English
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

A core goal of phylogenomics is to determine the evolutionary history of a set of species from biological sequence data. Phylogenetic networks can describe more complex evolutionary phenomena than phylogenetic trees, but they are more difficult to reconstruct accurately. Recently, there has been growing interest in developing methods to infer semi-directed phylogenetic networks. Because inferring such networks can be computationally intensive, one approach is to assemble them from smaller networks. It is therefore essential to have robust methods for inferring semi-directed phylogenetic networks on small numbers of taxa. In this paper, we investigate an algebraic method for inferring 4-leaf semi-directed phylogenetic networks from nucleotide sequence data by analyzing the distribution of leaf-pattern probabilities. On simulated data, we found that we can identify the undirected phylogenetic network with high accuracy for sequences of length at least 10 kbp. Identifying the semi-directed network is more challenging and requires sequences of length approaching 10 Mbp. Our approach can also identify treelike evolution and determine the underlying tree. Finally, we apply our method to a real data set from Xiphophorus species and use the results to build a phylogenetic network.
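
The abstract describes working from the distribution of leaf-pattern probabilities on four taxa. As a rough illustration of that input only, the sketch below estimates empirical leaf-pattern frequencies from four aligned nucleotide sequences. It does not evaluate any invariants and is not the authors' implementation. The function name `site_pattern_frequencies`, the choice to skip columns containing gaps or ambiguity codes, and the toy alignment are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGT"


def site_pattern_frequencies(seqs):
    """Empirical frequencies of the 4^4 = 256 leaf patterns.

    Hypothetical helper: `seqs` is a list of four aligned nucleotide
    strings, one per leaf. Columns containing anything other than
    A, C, G or T are skipped (an assumption, not taken from the paper).
    """
    if len(seqs) != 4:
        raise ValueError("expected exactly four aligned sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    # Each alignment column becomes a 4-letter pattern such as "ACCA".
    columns = ("".join(col).upper() for col in zip(*seqs))
    counts = Counter(c for c in columns if set(c) <= set(NUCLEOTIDES))

    total = sum(counts.values())
    if total == 0:
        raise ValueError("no unambiguous alignment columns")

    # Report every pattern, including those never observed.
    patterns = ("".join(p) for p in product(NUCLEOTIDES, repeat=4))
    return {p: counts[p] / total for p in patterns}


# Toy alignment (made up); the third column contains an N and is skipped.
toy = ["ACGTAC", "ACGTAA", "ACGAAC", "ACNTAC"]
freqs = site_pattern_frequencies(toy)
```

With sequences as short as this toy example, most of the 256 frequencies are zero. That sparsity gives some intuition for why the accuracy reported above depends so strongly on sequence length.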