Dead Links and Lost Code: Investigating the State of Source Code Repositories in Maven Central Repository Packages

An Empirical Study

Abstract

Maven Central serves as the de facto repository for distributing free and open-source Java libraries and components. Evaluating its current state and overall robustness is essential for the community to make well-informed decisions about its future, decisions that would benefit developers at large. This study empirically evaluates developer practices surrounding version control and package reproducibility on Maven Central by investigating (i) the reliability of repository links, (ii) preferences regarding repository hosting services, (iii) the use of tags/releases, and (iv) the reproducibility of packages. Our study revealed that 20.85% of packages had unreliable repository links, attributable to inconsistencies in field usage and missing data, highlighting lax submission guidelines. GitHub emerged as the dominant host, with a market share exceeding 90% in most years, though regional alternatives such as Gitee are gaining traction. Tags/releases were used by 74.35% of packages; however, naming convention discrepancies between Maven Central and source code repositories were identified, hindering version tracing and reproducibility. Strikingly, only 3.06% of packages were configured to attempt reproducibility, and an even smaller subset was found to be fully reproducible.