Scalable Structural Code Diffs
Comparing Gumtree Greedy and Gumtree Simple adapted for scaling
R.J. van Seventer (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Q.T. Le Dilavrec – Mentor (TU Delft - Software Engineering)
Carolin Brandt – Mentor (TU Delft - Software Engineering)
Jesper Cockx – Graduation committee member (TU Delft - Programming Languages)
More Info
expand_more
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
As software evolves, understanding the differences between versions of code becomes more important. While text-based differencing is practical and widespread, it does not capture the structure of code. AST-based differencing solves this by using the structure of the code. Gumtree is a well known reference implementation of multiple structural diff heuristics. Gumtree Greedy is the original algorithm, while Gumtree Simple is a later version that was designed to scale better by making stronger assumptions.
In this paper, we compare ported versions of Gumtree Greedy, Gumtree Simple, and their lazified variants. They were implemented in the Rust-based HyperAST framework and tested on large-scale Java datasets. Our results show that Gumtree Simple uses significantly fewer CPU cycles compared to Gumtree Greedy. Due to suspected bugs in the implementation, we cannot yet conclusively measure the benefits of lazification. However, our implementation experience suggests that Gumtree Simple is easier to adapt and optimize for scalability.