Scalable Structural Code Diffs

Comparing Gumtree Greedy and Gumtree Simple adapted for scaling

Bachelor Thesis (2025)
Author(s)

R.J. van Seventer (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Q.T. Le Dilavrec – Mentor (TU Delft - Software Engineering)

Carolin Brandt – Mentor (TU Delft - Software Engineering)

Jesper Cockx – Graduation committee member (TU Delft - Programming Languages)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
expand_more
Publication Year
2025
Language
English
Graduation Date
25-06-2025
Awarding Institution
Delft University of Technology
Project
['CSE3000 Research Project']
Programme
['Computer Science and Engineering']
Faculty
Electrical Engineering, Mathematics and Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

As software evolves, understanding the differences between versions of code becomes more important. While text-based differencing is practical and widespread, it does not capture the structure of code. AST-based differencing solves this by using the structure of the code. Gumtree is a well known reference implementation of multiple structural diff heuristics. Gumtree Greedy is the original algorithm, while Gumtree Simple is a later version that was designed to scale better by making stronger assumptions.

In this paper, we compare ported versions of Gumtree Greedy, Gumtree Simple, and their lazified variants. They were implemented in the Rust-based HyperAST framework and tested on large-scale Java datasets. Our results show that Gumtree Simple uses significantly fewer CPU cycles compared to Gumtree Greedy. Due to suspected bugs in the implementation, we cannot yet conclusively measure the benefits of lazification. However, our implementation experience suggests that Gumtree Simple is easier to adapt and optimize for scalability.

Files

License info not available