Analyzing the effect of introducing time as a component in Python dependency graphs

Abstract

The use of open-source packages is common practice among developers: it decreases development time and improves maintainability. However, adding a dependency to a project comes with inherent risks, such as introducing vulnerabilities. A few solutions that help visualize all of the dependencies of a project already exist, but none provides the capability of selecting a moment in time at which to analyze the generated structure. This research paper formalizes a time-based dependency graph that can be generalized to any ecosystem and then showcases its usefulness by analyzing the Python ecosystem over time. The results indicate that the Python ecosystem does have a subset of packages that are the most used, such as numpy and requests, but that overall it is well balanced, meaning that it is able to withstand the removal of one of its most used packages. The data structure also provides satisfactory results, achieving an 89.6% accuracy when compared with the Python resolver. The findings of this study can be used to improve existing dependency networks.
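To make the idea of selecting a moment in time concrete, the following is a minimal sketch, not the paper's formalization. It assumes each release carries an upload timestamp and a list of required package names, and that a dependency resolves to the highest version published at or before the chosen moment. Version constraints are ignored for brevity. The names `Release`, `latest_release_at`, `dependency_graph_at`, and the packages `app`, `liba`, and `libb` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Release:
    package: str
    version: tuple          # e.g. (1, 0); compared lexicographically
    released: datetime      # assumed upload timestamp of this release
    requires: tuple = ()    # names of packages this release depends on


def latest_release_at(releases, package, moment):
    """Return the highest version of `package` published at or before `moment`."""
    candidates = [
        r for r in releases
        if r.package == package and r.released <= moment
    ]
    return max(candidates, key=lambda r: r.version, default=None)


def dependency_graph_at(releases, root, moment):
    """Map each reachable package name to the release selected at `moment`.

    A value of None means the package had no release yet at that moment.
    Edges are implied by the `requires` field of each selected release.
    """
    graph = {}
    stack = [root]
    while stack:
        name = stack.pop()
        if name in graph:
            continue  # already resolved; also guards against cycles
        release = latest_release_at(releases, name, moment)
        graph[name] = release
        if release is not None:
            stack.extend(release.requires)
    return graph


# Hypothetical release history, for illustration only.
RELEASES = [
    Release("app", (1, 0), datetime(2020, 1, 1), ("liba",)),
    Release("app", (2, 0), datetime(2021, 1, 1), ("liba", "libb")),
    Release("liba", (1, 0), datetime(2019, 6, 1)),
    Release("libb", (1, 0), datetime(2020, 9, 1)),
]

early = dependency_graph_at(RELEASES, "app", datetime(2020, 6, 1))
late = dependency_graph_at(RELEASES, "app", datetime(2021, 6, 1))
print(sorted(early))  # packages reachable in mid-2020
print(sorted(late))   # packages reachable in mid-2021
```

Under these assumptions, the same root package yields different graphs depending on the moment chosen: the earlier snapshot does not include `libb`, because the release of `app` that requires it had not yet been published.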