Uncovering the Secrets of the Maven Repository

Analysis of Library Sizes in Maven Central


Abstract

This research examines how artifact sizes vary in Maven Central, the repository that hosts a large collection of Java artifacts. The analysis sheds light on coding habits and dependency management practices within Maven Central and highlights the importance of managing artifact sizes effectively. It also offers practical insights for library maintainers and for users who download libraries. For example, it lets us estimate the average amount of disk space needed to download 100 libraries.
The analysis selects a single version of each artifact in Maven Central and extracts metadata from the corresponding files.
The results show that the average artifact size is 1447 KB, although this average is pulled upward by a few exceptionally large artifacts. Approximately 86% of the artifacts are smaller than 400 KB, which indicates that most artifacts are relatively lightweight.
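As a rough illustration of the download estimate mentioned above, the reported average can be scaled to 100 libraries. This is a back-of-the-envelope sketch that assumes the 100 libraries are drawn at random from Maven Central, so that the expected total size equals 100 times the mean size \(\bar{s}\) (the symbols \(\bar{s}\) and \(S_{100}\) are introduced here only for illustration):

```latex
\mathbb{E}[S_{100}] = 100 \times \bar{s}
                    = 100 \times 1447\ \text{KB}
                    = 144{,}700\ \text{KB}
                    \approx 145\ \text{MB}
\quad (\approx 141\ \text{MiB if } 1\ \text{KB} = 1024\ \text{bytes})
```

Because the mean is inflated by a few very large artifacts, a typical selection will usually need less than this. For instance, 100 artifacts that each fall below the 400 KB mark would together occupy at most \(100 \times 400\ \text{KB} = 40{,}000\ \text{KB}\), or about 40 MB.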
Most of the large artifacts identified in the analysis fall into two categories: extensive projects that contain a substantial number of files, and machine learning or big data projects that bundle massive data files.