Uncovering the Secrets of the Maven Repository

Analysis of Library Sizes in Maven Central

Bachelor Thesis (2023)

Author(s)

N.H.C. Tomassen (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Mehdi Keshani – Mentor (TU Delft - Software Engineering)

S. Proksch – Graduation committee member (TU Delft - Software Engineering)

Soham Chakraborty – Coach (TU Delft - Programming Languages)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Data Analysis, Maven Central, Dependency Management

To reference this document, use:

https://resolver.tudelft.nl/uuid:19f060da-211d-41fb-91c0-69a5b9e8f706

Publication Year

2023

Language

English

Graduation Date

08-08-2023

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This research explores the size variations of artifacts in Maven Central, a repository containing a large collection of Java artifacts. The analysis sheds light on coding habits and dependency management practices within the Maven Central ecosystem and emphasizes the importance of managing artifact sizes effectively. It also provides insights for library maintainers and for clients who download libraries; for example, it makes it possible to estimate the average amount of disk space required to download 100 libraries.
The analysis selects a single version of each artifact in Maven Central and extracts metadata from the corresponding files.
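A minimal sketch of the version-selection step, in Python. The rule for choosing the single version is not stated here, so this example assumes the most recently released version is kept; the Release record, its released timestamp, and the functions select_one_version and jar_url are hypothetical. The URL follows Maven Central's standard repository layout (dots in the groupId become path separators) and assumes jar packaging, which not every artifact uses.

```python
from dataclasses import dataclass
from datetime import datetime

# Base URL of Maven Central's public repository.
CENTRAL = "https://repo1.maven.org/maven2"


@dataclass(frozen=True)
class Release:
    group_id: str
    artifact_id: str
    version: str
    released: datetime  # hypothetical field: release timestamp taken from an index


def select_one_version(releases):
    """Keep one release per (groupId, artifactId); assumption: the most recent one."""
    chosen = {}
    for r in releases:
        key = (r.group_id, r.artifact_id)
        if key not in chosen or r.released > chosen[key].released:
            chosen[key] = r
    return chosen


def jar_url(r):
    """Build the primary JAR URL following the Maven repository layout."""
    group_path = r.group_id.replace(".", "/")
    return f"{CENTRAL}/{group_path}/{r.artifact_id}/{r.version}/{r.artifact_id}-{r.version}.jar"


releases = [
    Release("org.example", "demo", "1.0", datetime(2020, 1, 1)),
    Release("org.example", "demo", "1.1", datetime(2021, 1, 1)),
    Release("com.acme", "util", "2.0", datetime(2022, 5, 1)),
]
for release in select_one_version(releases).values():
    print(jar_url(release))
```

The file behind each URL could then be fetched (or inspected with an HTTP HEAD request) to obtain its size and other metadata.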
The results show that the average artifact size is 1447 KB, although this average is heavily influenced by a few exceptionally large artifacts. Approximately 86% of the artifacts are smaller than 400 KB, indicating that the majority of artifacts are relatively lightweight.
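The gap between the mean and the typical artifact can be illustrated with a short Python sketch. The sizes below are invented for illustration and are not data from the thesis; only the 1447 KB mean and the 400 KB threshold come from the results. The helper names size_summary and estimated_download_kb are hypothetical.

```python
import statistics


def size_summary(sizes_kb, threshold_kb=400):
    """Summarise artifact sizes (KB): mean, median and share below a threshold."""
    below = sum(1 for s in sizes_kb if s < threshold_kb)
    return {
        "mean": statistics.mean(sizes_kb),
        "median": statistics.median(sizes_kb),
        "share_below": below / len(sizes_kb),
    }


def estimated_download_kb(n_libraries, mean_kb=1447):
    """Expected total size of n randomly chosen libraries, using the mean size."""
    return n_libraries * mean_kb


# Hypothetical sizes (KB), for illustration only: nine small artifacts, one huge one.
sample = [20, 35, 50, 80, 120, 150, 200, 250, 300, 50_000]
print(size_summary(sample))  # mean 5120.5, median 135.0, share_below 0.9
print(estimated_download_kb(100))  # 144700 KB, about 145 MB with 1 MB = 1000 KB
```

A single outlier pushes the mean far above the median even though 90% of the sample is under the threshold. Multiplying the mean by 100 gives the expected total for 100 randomly chosen libraries, but because the size distribution is right-skewed, a typical selection of 100 libraries is likely to need less.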
The large artifacts identified in the analysis fall predominantly into two categories: extensive projects containing a substantial number of files, and machine learning or big data projects that bundle massive data files.
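One way to tell these two categories apart is to inspect a JAR's entries: a very large number of entries suggests an extensive project, while a single very large non-class entry suggests bundled data. The following Python sketch is a hypothetical heuristic, not the classification method used in the thesis; the function names and the thresholds many_files and big_entry_bytes are arbitrary assumptions.

```python
import io
import zipfile


def classify_large_jar(jar_bytes, many_files=1000, big_entry_bytes=10 * 1024 * 1024):
    """Label a JAR by what dominates it (hypothetical heuristic and thresholds)."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        # Ignore directory entries; file_size is the uncompressed size.
        entries = [info for info in jar.infolist() if not info.is_dir()]
    largest = max(entries, key=lambda info: info.file_size, default=None)
    if (largest is not None
            and largest.file_size >= big_entry_bytes
            and not largest.filename.endswith(".class")):
        return "large data files"
    if len(entries) >= many_files:
        return "many files"
    return "other"


def make_jar(files):
    """Build an in-memory JAR (a ZIP archive) from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        for name, data in files.items():
            jar.writestr(name, data)
    return buf.getvalue()


# Tiny example with a lowered threshold so it runs instantly.
data_jar = make_jar({"model/weights.bin": b"\0" * 2048, "Main.class": b"\xca\xfe"})
print(classify_large_jar(data_jar, big_entry_bytes=1024))
```

Since a JAR is a ZIP archive, the standard zipfile module is enough to list entries and their uncompressed sizes without extracting anything to disk.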

Files

Tomassen_Niels_thesis.pdf (PDF, 0.581 MB)