Accelerating AST-Based Code Differencing
Optimizing ChangeDistiller’s Bottom-Up Matching Strategy with HyperAST
L. Mangold (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Q.T. Le Dilavrec – Mentor (TU Delft - Software Engineering)
Carolin Brandt – Mentor (TU Delft - Software Engineering)
Jesper Cockx – Graduation committee member (TU Delft - Programming Languages)
More Info
expand_more
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Traditional AST-based code differencing tools like ChangeDistiller struggle to scale on large codebases. HyperAST is a framework that models versioned code as a Directed Acyclic Graph (DAG) of Abstract Syntax Trees (ASTs), with deduplication of unchanged nodes and precomputed metadata. This approach has demonstrated effectiveness in improving the performance of the GumTree algorithm. However, its applicability to algorithms with fundamentally different matching strategies, like ChangeDistiller's bottom-up approach, was unclear. We ported ChangeDistiller to Rust and adapted it to leverage HyperAST's optimizations. Experiments on 1,046 real-world code change pairs from the Defects4J dataset demonstrate a 99.13% reduction in total runtime (~4.5 hours to 2.3 minutes) and a median per-file reduction of 98.97% (3149.25 to 31.55 milliseconds), all without altering the core algorithm’s behavior. Our research demonstrates that HyperAST's techniques can be applied beyond GumTree, significantly improving ChangeDistiller's runtime performance.