Efficiently Optimizing Hyperparameters for the Gumtree Hybrid Code Differencing Algorithm within HyperAST
A.F. Nitters (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Quentin Le Dilavrec – Mentor (TU Delft - Software Engineering)
C.E. Brandt – Mentor (TU Delft - Software Engineering)
Jesper Cockx – Graduation committee member (TU Delft - Programming Languages)
More Info
expand_more
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Code differencing allows understanding changes between different versions of software, especially when using Abstract Syntax Trees (ASTs) to structurally represent code. The Gumtree algorithm is the current state-of-the-art algorithm for AST differencing, however its main drawback is long runtimes, especially for large ASTs. The Simple and Hybrid variants aims to solve this issue with a heuristic-based recovery phase, but their evaluation has been limited to small ASTs. Benchmarks show that the Simple variant results in significant improvements in output quality and performance with very large ASTs, while the Hybrid variant has mixed results. Furthermore, increasing the max_size hyperparameter with the Hybrid variant has an unpredictable effect on output quality and a negative effect on performance, but min_priority can generally be set to 1. The Diff-Auto-Tuning optimization approach is deemed often unusable for large ASTs, though it can be used to find cases where the Hybrid variant outperforms the Simple variant on a single data point.