Mining Reproducible Dependency Updates Across Ecosystems

What changes are made to dependency update pull requests before they are accepted?

Bachelor Thesis (2026)

Author(s)

P. Khan (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

C.R. Paulsen – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

S. Proksch – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

J.A. Pouwelse – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Dataset, Reproducibility, Data mining, Pull requests, Dependency updates

To reference this document use

https://resolver.tudelft.nl/uuid:1293f2a1-192b-424e-b3f7-759803d52766

More Info

Publication Year

2026

Language

English

Graduation Date

29-06-2026

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Dependency management is a critical, difficult-to-automate task in software engineering. Research on automated dependency management requires reliable, reproducible datasets of dependency updates across ecosystems, but existing datasets fall short: they cover only specific update types (e.g. breaking updates) and log outcomes rather than the causal factors behind pull request (PR) acceptance and build outcomes. Closing this gap would let researchers determine not just whether a dependency update PR succeeded, but why, which is the information needed to build dependency management tools that developers can trust. As a first step, we construct a categorisation model that describes the code changes within accepted dependency update PRs at the commit level, developed using an established taxonomy-development methodology, and we build a regex-based tool that automates this categorisation with 86% accuracy on hand-labelled data. Our results show that simple, deterministic techniques can reliably support transparent, automated change categorisation, a building block for future causal datasets of dependency updates.
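
To make the idea of regex-based change categorisation concrete, the following is a minimal sketch in Python. It is not the thesis's tool: the category names, the patterns in RULES, and the helper functions categorise_path, categorise_commit and accuracy are all hypothetical and chosen only for illustration, since this page does not list the actual taxonomy or regexes. The sketch assumes a commit is represented simply as a list of changed file paths.

```python
import re
from collections import Counter

# Hypothetical categories and patterns, for illustration only.
# The thesis's actual taxonomy and regexes are not given on this page.
# Rules are tried in order; the first matching pattern wins.
RULES = [
    ("lockfile", re.compile(
        r"(^|/)(package-lock\.json|yarn\.lock|Cargo\.lock|go\.sum|poetry\.lock)$")),
    ("manifest", re.compile(
        r"(^|/)(package\.json|pom\.xml|build\.gradle(\.kts)?|requirements\.txt|Cargo\.toml|go\.mod)$")),
    ("test", re.compile(
        r"(^|/)(tests?|__tests__)/|(_test|\.test|\.spec)\.[A-Za-z]+$")),
    ("ci_config", re.compile(
        r"^\.github/workflows/|(^|/)\.gitlab-ci\.yml$")),
    ("docs", re.compile(r"\.(md|rst)$|(^|/)docs/")),
]


def categorise_path(path):
    """Return the label of the first rule matching a changed file path."""
    for label, pattern in RULES:
        if pattern.search(path):
            return label
    # Anything unmatched is treated as ordinary source code (an assumption).
    return "source"


def categorise_commit(paths):
    """Count how many changed files of one commit fall into each category."""
    return Counter(categorise_path(p) for p in paths)


def accuracy(predicted, expected):
    """Fraction of predicted labels that agree with hand-assigned labels."""
    if len(predicted) != len(expected):
        raise ValueError("label lists must have the same length")
    if not expected:
        return 0.0
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)
```

Because the rules are deterministic and ordered, every label can be traced back to the single pattern that produced it, which is one way to read the abstract's emphasis on transparency. An accuracy figure such as the reported 86% would come from comparing the tool's labels against hand-assigned ones, as the accuracy helper does here.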

Files

Research_paper-2.pdf

(pdf | 0.346 MB)