A Framework for Identifying Evolution Patterns of Open-Source Software Projects

Master Thesis (2024)
Author(s)

M. Bonfanti (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

S. Proksch – Mentor (TU Delft - Software Engineering)

A. van Deursen – Graduation committee member (TU Delft - Software Engineering)

J.G.H. Cockx – Graduation committee member (TU Delft - Programming Languages)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2024
Language
English
Graduation Date
04-07-2024
Awarding Institution
Delft University of Technology
Project
IN5000 Final Project
Programme
Computer Science | Software Technology
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Research on open-source software evolution has gained popularity over the last decade, focusing mainly on its theoretical determining factors. Other works have modeled growth patterns with time series techniques, but either on small samples of projects and metrics or on larger datasets that are not openly available. The limited reproducibility and scalability of these methodologies compound the general lack of research on time series methods applied to open-source software evolution. Time series approaches from other domains are therefore needed to handle the multivariate nature of larger and more varied samples of open-source projects and their metric time series. This thesis aims to provide a reproducible and scalable framework that supports researchers in studying open-source software evolution through pattern modeling, time series merging, multivariate time series clustering, and multivariate time series forecasting. An openly available dataset of 1328 projects is built using relevant metrics identified through a systematic literature review. The metric time series are segmented and clustered to obtain three generalized growth patterns: Steep, Shallow, and Plateau. The sequences of patterns and their correlations are used to create three project clusters, from which prediction models for all metrics are trained to perform multivariate time series forecasting. Experimental results give confidence in the reproducibility and scalability of the framework and show how pattern shifts can be linked to real events in projects' histories. The thesis provides an additional perspective on open-source software evolution and can serve as a starting point for further studies.
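
To make the idea of growth patterns concrete, the following is a minimal sketch, not the thesis's actual method. It splits one metric time series into fixed-length windows and labels each window by its least-squares slope. Fixed slope thresholds stand in for the clustering step described in the abstract. The function names, the window length, the thresholds, and the sample data are all hypothetical, and the sketch assumes a cumulative, non-decreasing metric.

```python
from statistics import mean


def slope(values):
    """Least-squares slope of values against their index 0..n-1."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def label_segments(series, window=4, steep=5.0, shallow=1.0):
    """Split series into consecutive windows and label each by its slope.

    window, steep and shallow are hypothetical parameters chosen for
    illustration; they are not values taken from the thesis.
    """
    labels = []
    # Trailing points that do not fill a whole window are ignored.
    for start in range(0, len(series) - window + 1, window):
        s = slope(series[start:start + window])
        if s >= steep:
            labels.append("Steep")
        elif s >= shallow:
            labels.append("Shallow")
        else:
            labels.append("Plateau")
    return labels


# Made-up cumulative commit counts for one project, one value per month.
commits = [0, 10, 20, 30, 32, 34, 36, 38, 40, 40, 40, 40]
print(label_segments(commits))  # ['Steep', 'Shallow', 'Plateau']
```

A sequence of labels like this, one per metric and project, is the kind of input that could then be compared across projects. A data-driven approach would learn the boundaries between patterns from the data rather than fix them by hand as done here.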

Files

License info not available