Analyzing the effect of introducing time as a component in Python dependency graphs

Bachelor Thesis (2022)
Author(s)

A. Purcaru (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Diomidis Spinellis – Mentor (TU Delft - Software Engineering)

Georgios Gousios – Mentor (TU Delft - Software Technology)

Avishek Anand – Graduation committee member (TU Delft - Web Information Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2022 Andrei Purcaru
Publication Year
2022
Language
English
Graduation Date
23-06-2022
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Using open-source packages is common practice among developers: it shortens development time and improves maintainability. Adding a dependency to a project, however, carries inherent risks, such as introducing vulnerabilities. A few existing solutions help visualize all of a project's dependencies, but none allows selecting a moment in time at which to analyze the generated structure. This research paper formalizes a time-based dependency graph that can be generalized to any ecosystem and then showcases its usefulness by analyzing the Python ecosystem over time. The results indicate that the Python ecosystem does have a subset of heavily used packages, such as numpy and requests, but overall it is well balanced, meaning that it can withstand the removal of one of its most used packages. The data structure also provides satisfactory results, reaching an 89.6% accuracy when compared to the Python resolver. The findings of this study can be used to improve existing dependency networks.
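
To make the idea of a time-based dependency graph more concrete, the following is a minimal Python sketch. It is not the data structure defined in the thesis. It assumes that each release carries a publication date, that versions can be compared as integer tuples, and that a dependency is expressed only as a minimum version. The names Release, TimedDependencyGraph, latest_at, and resolve, as well as the packages "app" and "parser" and their dates, are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Release:
    package: str
    version: tuple        # e.g. (1, 0); a simplified stand-in for real version parsing
    released: date        # publication date of this release
    requires: tuple = ()  # (dependency name, minimum version) pairs; a simplification


class TimedDependencyGraph:
    """Hypothetical sketch: dependencies resolved as of a chosen date."""

    def __init__(self):
        self._releases = {}  # package name -> list of Release

    def add_release(self, release):
        self._releases.setdefault(release.package, []).append(release)

    def latest_at(self, package, when, min_version=()):
        """Highest version published on or before `when` that is >= min_version."""
        candidates = [
            r for r in self._releases.get(package, [])
            if r.released <= when and r.version >= min_version
        ]
        return max(candidates, key=lambda r: r.version, default=None)

    def resolve(self, package, when):
        """Map each transitive dependency to the version selected at `when`."""
        root = self.latest_at(package, when)
        if root is None:
            return {}
        resolved = {}
        stack = [root]
        while stack:
            release = stack.pop()
            if release.package in resolved:
                continue  # keep the first version selected for a package
            resolved[release.package] = release.version
            for name, min_version in release.requires:
                dependency = self.latest_at(name, when, min_version)
                if dependency is not None:
                    stack.append(dependency)
        return resolved


# Hypothetical data: "app" depends on "parser" >= 1.0.
graph = TimedDependencyGraph()
graph.add_release(Release("parser", (1, 0), date(2020, 1, 1)))
graph.add_release(Release("parser", (2, 0), date(2021, 6, 1)))
graph.add_release(Release("app", (1, 0), date(2020, 3, 1), requires=(("parser", (1, 0)),)))

snapshot_2020 = graph.resolve("app", date(2020, 12, 31))
snapshot_2022 = graph.resolve("app", date(2022, 1, 1))
```

In this sketch the same package resolves to different dependency versions depending on the date chosen: "parser" 1.0 at the end of 2020 and "parser" 2.0 at the start of 2022. A real resolver would also have to handle upper bounds, conflicting constraints, and full version specifiers, which this example deliberately omits.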

Files

AndreiPurcaruFinalPaper.pdf
(pdf | 1.12 Mb)
License info not available