As software evolves, understanding the differences between versions of code becomes increasingly important. Text-based differencing is practical and widespread, but it does not capture the structure of code. AST-based differencing addresses this by comparing the abstract syntax trees of the two versions rather than their lines of text. Gumtree is a well-known reference implementation of multiple structural diff heuristics. Gumtree Greedy is the original algorithm, while Gumtree Simple is a later variant designed to scale better by making stronger assumptions.
In this paper, we compare ported versions of Gumtree Greedy, Gumtree Simple, and their lazified variants. The ports were implemented in the Rust-based HyperAST framework and tested on large-scale Java datasets. Our results show that Gumtree Simple uses significantly fewer CPU cycles than Gumtree Greedy. Due to suspected bugs in the implementation, we cannot yet conclusively measure the benefits of lazification. However, our implementation experience suggests that Gumtree Simple is easier to adapt and optimize for scalability.