Understanding evolutionary history is a cornerstone of biological research, yet complex phenomena like hybridization and horizontal gene transfer require structures more versatile than simple trees, known as phylogenetic networks. Reconstructing these networks under the maximum p
...
Understanding evolutionary history is a cornerstone of biological research, yet complex phenomena like hybridization and horizontal gene transfer require structures more versatile than simple trees, known as phylogenetic networks. Reconstructing these networks under the maximum parsimony principle leads to the Hybridization Problem, a combinatorial optimization task that is NP-hard and poses significant scalability challenges for existing exact and heuristic methods.
This thesis investigates whether learning-guided search can provide a scalable approximation framework for the Hybridization Problem in temporal phylogenetic networks. The problem is reformulated as a Markov Decision Process (MDP) based on the theory of cherry-picking sequences (CPS), which establishes a fundamental equivalence between sequence weight and the network’s reticulation number. Inspired by the AlphaZero paradigm, the proposed framework integrates Monte Carlo Tree Search (MCTS) with deep learning, employing a modular architecture featuring a Value Network designed to provide direct state evaluations in place of computationally expensive heuristic rollouts.
To handle the graph-theoretical nature of the problem, a hierarchical feature engineering pipeline was developed to summarize local leaf-level and global structural properties into a compact representation for neural guidance. The framework was evaluated through a large-scale benchmark of 460 independent simulations, comparing the novel Value-Guided MCTS (V-MCTS) against standard MCTS, randomized heuristics, and exact solvers.
The results demonstrate that the Standard MCTS achieved the highest success rate at 91.74%, while the V-MCTS identified the optimum in 79.35% of cases. Notably, the V-MCTS exhibited more conservative failure behavior, with error margins strictly bounded compared to the "catastrophic" failures of unguided methods. This work contributes a novel, learning-enhanced approach to phylogenetic reconstruction, proving that reinforcement learning-based heuristics can achieve high-precision results within realistic computational budgets and offering a promising path toward handling larger, more complex evolutionary datasets.
The experimental results also demonstrate that formalizing the temporal hybridization problem as an MDP successfully captures the underlying topological logic required for phylogenetic reductions. Notably, the Unconstrained Random Search (URS) heuristic, which utilizes a simple iterative random loop, achieved a remarkable second-place ranking with a success rate of 89.13%. This finding suggests that simpler algorithms, which had not been previously explored for this specific task, can be sufficiently robust for moderate-scale instances. This highlights a potential for algorithmic minimalism, where unconstrained methods achieve results comparable to more structured approaches, providing a clear benchmark for when complex neural guidance becomes strictly necessary.