Analyzing Similar Build Configurations Across Different GitHub Projects
C.M. Manoli (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Sebastian Proksch – Mentor (TU Delft - Software Engineering)
S. Huang – Mentor (TU Delft - Software Technology)
Julia Olkhovskaya – Graduation committee member (TU Delft - Interactive Intelligence)
More Info
An additional online appendix contains the extracted datasets, the source code of the project, additional generated images, and other supporting material.
https://doi.org/10.5281/zenodo.10577178
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
GitHub is the home of hundreds of millions of Open Source Software (OSS) repositories, where users collaborate on projects and find inspiration for new ideas. Some of these projects use build configurations to make building, testing, and deploying the software more time-efficient and less error-prone. However, setting up correct configurations usually requires considerable time and expertise. This paper analyzes current practices for setting up build configurations, such as Maven files and GitHub Actions workflows, and clusters some of these practices based on the scope of the project. We thereby provide information for discovering similar projects based on their build configurations and discuss the feasibility of build configuration analysis. In summary, we provide a comprehensive analysis of project similarity based on Maven build configurations and workflow files, showing the importance of build configurations for identifying similar projects and laying the groundwork for future work on build configuration analysis.