Stemmatology is the study and reconstruction of textual genealogy and has several similarities to phylogenetics, the study of evolutionary histories of species. Current methods in computational stemmatology often borrow tools from phylogenetics, yet classical phylogenetic models
...
Stemmatology is the study and reconstruction of textual genealogy and has several similarities to phylogenetics, the study of evolutionary histories of species. Current methods in computational stemmatology often borrow tools from phylogenetics, yet classical phylogenetic models are not well suited to the structural and labelling requirements of manuscript traditions. In particular, phylogenetics typically assumes leaf-labelled trees or networks and lacks the means to accommodate internal labels—a feature which is crucial in stemmatology. Nonetheless, these models are frequently used as there are few formal alternatives.
This thesis addresses that gap by studying the encoding and reconstruction of internally labelled trees and networks from rooted triplets. Specifically, we consider three classes of graphs: multifurcating rooted trees, general rooted trees (allowing nodes with out-degree one), and level-1 networks. Each graph is assumed to have labels on a subset of its vertices, including all leaves and some internal nodes. We prove that, under limited assumptions, a complete set of rooted triplets uniquely determines the structure and labelling of each of these graph classes up to isomorphism. This generalises earlier results for leaf-labelled binary trees and extends triplet-based encoding to structures with internal labelling and limited reticulation.
Building on these encoding theorems, we develop polynomial-time algorithms to reconstruct each graph class from its full triplet set. For trees, our methods generalise previously proposed algorithms by allowing multifurcations, nodes with out-degree one, and labels at internal vertices. For level-1 networks, we design a reconstruction algorithm that correctly identifies cycle structures and label placement, adapting earlier techniques for dense triplet sets. All reconstruction algorithms are proven correct and theoretically efficient under the given assumptions. The algorithms have been tested on many instances and run quickly.
To compare internally labelled graphs, we introduce an extension of the classical triplet distance. This adapted metric counts differences in triplet sets between two graphs on the same label set. We evaluate the metric and reconstruction algorithms on synthetic and real-world data, demonstrating their ability to capture meaningful structural differences and to recover known graphs from complete or near-complete triplet information. These results show that rooted triplets form a robust foundation for reasoning about internally labelled structures in both tree-like and mildly reticulate settings. The theory and algorithms presented in this thesis provide new tools for computational phylogenetics and stemmatology, enabling the reconstruction and comparison of complex transmission histories from local relational constraints.
Link to the GitHub page containing all the code - https://github.com/TMALevert/LAOML