On the topology of package dependency networks: A comparison of programming language ecosystems
On the Topology of Package Dependency Networks: A Comparison of Programming Language Ecosystems
Alexandre Decan, Tom Mens, Maëlick Claes
Software Engineering Lab
29 November 2016 – Int’l Workshop on Software Ecosystem Architectures (WEA)
Research Team
Previous Work
• A. Decan, T. Mens, M. Claes, P. Grosjean
  – IWSECO-WEA 2015: "On the Development and Distribution of R Packages: An Empirical Analysis of the R Ecosystem"
  – SANER 2016: "When GitHub Meets CRAN: An Analysis of Inter-Repository Package Dependency Problems"
• A. Serebrenik, T. Mens
  – WEA 2015: "Challenges in Software Ecosystems Research"
    • Generalizability
    • Comparing different ecosystems
Software Packaging Ecosystems
• Ecosystem: “a collection of software projects which are developed and evolve together in the same environment” [Lungu]
• Software distributed as packages
  – Dependency relationships between packages
  – Package versioning
Software Packaging Ecosystems for programming languages
• Many programming-language-specific package managers
  – npm: JavaScript
  – PyPI: Python
  – RubyGems: Ruby
  – CRAN: R
Software Packaging Ecosystems for programming languages
IEEE Spectrum ranking of most popular programming languages
(http://spectrum.ieee.org/image/Mjc5MjI0Ng.png)
“The real standard library people want is more like what you find in Python or Ruby, and it’s more batteries included, feature complete, and that is not in JavaScript. That’s in the NPM world or the larger world.”
Ecosystem comparison
                        CRAN         PyPI         npm
Snapshot date           2016-04-26   2016-02-17   2016-06-28
Packages                9k           56k          317k
Dependencies            21k          53k          728k
New packages in 2015    1.6k         17k          113k
Updates in 2015         8k           131k         711k
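As a reading aid, the rounded figures in the table can be turned into an average number of dependencies per package. This is only a sketch: the object layout and the helper name `depsPerPackage` are illustrative, it assumes the "Dependencies" row counts dependency relationships between packages, and the inputs are the rounded values shown above, so the ratios are approximate.

```javascript
// Rounded figures from the comparison table (in thousands).
const ecosystems = {
  CRAN: { packages: 9, dependencies: 21 },
  PyPI: { packages: 56, dependencies: 53 },
  npm: { packages: 317, dependencies: 728 },
};

// Average number of dependencies per package (illustrative helper).
function depsPerPackage({ packages, dependencies }) {
  return dependencies / packages;
}

for (const [name, stats] of Object.entries(ecosystems)) {
  console.log(name, depsPerPackage(stats).toFixed(2));
}
```

With these rounded inputs, CRAN and npm come out at a little over two dependencies per package, and PyPI at just under one.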
Data extraction
• CRAN: https://github.com/ecos-umons/extractoR
• npm: https://registry.npmjs.org
• PyPI: Missing dependencies information
  => https://kgullikson88.github.io/blog/pypi-analysis.html
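For npm, dependency edges can be read from the per-package metadata that the registry serves as JSON. The sketch below assumes such a document has a `dist-tags.latest` entry and a `versions` map whose entries carry a `dependencies` object, which is how npm registry metadata is commonly laid out. The slides do not say whether the latest version or every version was analysed; the helper name `latestDependencies` and the sample document are hypothetical.

```javascript
// Extract the direct dependencies declared by the latest version of a package,
// given its registry metadata document (assumed layout, see lead-in).
function latestDependencies(doc) {
  const latest = (doc["dist-tags"] || {}).latest;
  const manifest = (doc.versions || {})[latest] || {};
  // "dependencies" maps a package name to a version constraint.
  return Object.keys(manifest.dependencies || {}).sort();
}

// Hand-written sample document (hypothetical package names and versions).
const sampleDoc = {
  name: "example-pkg",
  "dist-tags": { latest: "1.1.0" },
  versions: {
    "1.0.0": { dependencies: { "dep-a": "^1.0.0" } },
    "1.1.0": { dependencies: { "dep-a": "^1.2.0", "dep-b": "~0.3.1" } },
  },
};

console.log(latestDependencies(sampleDoc));
```

Working on a document that has already been fetched keeps the sketch free of network calls; each returned name becomes one edge from the package to its dependency.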
Terminology
• b is a dependency of a
• a is a reverse dependency of b
• c is a transitive dependency of a
• a is a transitive reverse dependency of c
• {a, b, c, d, e, f} is a (weakly connected) component
• g is an isolated package
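The terms above can be made concrete on a small graph. The figure from the slide is not part of this transcript, so the edges below are an assumption chosen only to be consistent with the bullets; the helper names `reverseGraph` and `reachable` are illustrative. The sketch also counts direct dependencies among the transitive ones, which is one possible convention.

```javascript
// Hypothetical example graph: package -> direct dependencies.
// The edges are an assumption chosen to be consistent with the bullets above.
const graph = new Map([
  ["a", ["b"]],
  ["b", ["c"]],
  ["c", []],
  ["d", ["c"]],
  ["e", ["d"]],
  ["f", ["b"]],
  ["g", []],
]);

// Invert every edge: package -> direct reverse dependencies.
function reverseGraph(deps) {
  const rev = new Map();
  for (const pkg of deps.keys()) rev.set(pkg, []);
  for (const [pkg, targets] of deps) {
    for (const t of targets) {
      if (!rev.has(t)) rev.set(t, []);
      rev.get(t).push(pkg);
    }
  }
  return rev;
}

// Every package reachable from `start` by following edges, excluding `start`.
function reachable(edges, start) {
  const seen = new Set();
  const stack = [...(edges.get(start) || [])];
  while (stack.length > 0) {
    const pkg = stack.pop();
    if (pkg === start || seen.has(pkg)) continue;
    seen.add(pkg);
    stack.push(...(edges.get(pkg) || []));
  }
  return [...seen].sort();
}

const rev = reverseGraph(graph);
console.log(rev.get("b"));          // direct reverse dependencies of b
console.log(reachable(graph, "a")); // transitive dependencies of a
console.log(reachable(rev, "c"));   // transitive reverse dependencies of c
```

Reverse dependencies reuse the same traversal on the inverted graph, so one `reachable` helper covers both directions.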
Dependency usage in programming language ecosystems
PyPI has proportionally more isolated packages (due to Python's extensive standard library?)
“The real standard library people want is more like what you find in Python or Ruby, and it’s more batteries included, feature complete, and that is not in JavaScript. That’s in the NPM world or the larger world.”
Topology of programming language ecosystems
The majority of non-isolated packages are part of a single huge component
Largest component:
• 76.5% (CRAN), 35.6% (PyPI), 63.8% (npm) of all packages
• 91% (CRAN), 88% (PyPI), 92% (npm) of all non-isolated packages
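The two percentages per ecosystem differ only in their denominator: all packages versus packages that have at least one dependency or reverse dependency. A minimal sketch of how such shares could be computed follows, using union-find to group packages into weakly connected components. The sample graph and the function names are hypothetical, and this is not necessarily how the authors computed their figures.

```javascript
// Group packages into weakly connected components with union-find.
// `deps` is a Map: package -> direct dependencies; edge direction is ignored.
function weakComponents(deps) {
  const parent = new Map();
  const find = (x) => {
    if (!parent.has(x)) parent.set(x, x);
    while (parent.get(x) !== x) {
      parent.set(x, parent.get(parent.get(x))); // path halving
      x = parent.get(x);
    }
    return x;
  };
  for (const [pkg, targets] of deps) {
    find(pkg); // register packages that have no edges at all
    for (const t of targets) parent.set(find(pkg), find(t));
  }
  const groups = new Map();
  for (const pkg of parent.keys()) {
    const root = find(pkg);
    if (!groups.has(root)) groups.set(root, []);
    groups.get(root).push(pkg);
  }
  return [...groups.values()];
}

// Share of the largest component among all and among non-isolated packages.
function largestComponentShares(deps) {
  const sizes = weakComponents(deps).map((c) => c.length);
  const total = sizes.reduce((n, s) => n + s, 0);
  const isolated = sizes.filter((s) => s === 1).length;
  const largest = Math.max(...sizes);
  return { ofAll: largest / total, ofNonIsolated: largest / (total - isolated) };
}

// Hypothetical sample: one component of four, one of two, two isolated packages.
const sampleDeps = new Map([
  ["a", ["b"]], ["b", ["c"]], ["c", []], ["d", ["c"]],
  ["e", ["f"]], ["f", []], ["g", []], ["h", []],
]);
console.log(largestComponentShares(sampleDeps));
```

A large gap between the two shares, as reported for PyPI, indicates many isolated packages rather than a fragmented dependency graph.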
Differences in dependencies between programming language ecosystems
npm packages have a much higher ratio of transitive dependencies
Differences in reverse dependencies between programming language ecosystems
There are proportionally more very popular npm packages (i.e. packages with a higher number of transitive reverse dependencies)
Number of packages required by more than 2% of the ecosystem
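One way to obtain such a count is to tally, for every package, how many other packages need it directly or transitively, and then keep those above the threshold. The sketch below assumes that "required" includes transitive requirements; the function name `widelyRequired` and the sample graph are hypothetical. It runs one traversal per package, which is fine for illustration but would need a more careful approach at the scale of npm.

```javascript
// Return the packages that more than `threshold` (a fraction, e.g. 0.02)
// of all packages depend on, directly or transitively.
// `deps` is a Map: package -> direct dependencies.
function widelyRequired(deps, threshold) {
  const requiredBy = new Map();
  for (const start of deps.keys()) {
    // Depth-first traversal of everything `start` needs.
    const seen = new Set([start]);
    const stack = [...deps.get(start)];
    while (stack.length > 0) {
      const pkg = stack.pop();
      if (seen.has(pkg)) continue;
      seen.add(pkg);
      requiredBy.set(pkg, (requiredBy.get(pkg) || 0) + 1);
      stack.push(...(deps.get(pkg) || []));
    }
  }
  const total = deps.size;
  return [...requiredBy]
    .filter(([, count]) => count / total > threshold)
    .map(([pkg]) => pkg)
    .sort();
}

// Hypothetical five-package ecosystem.
const tinyDeps = new Map([
  ["a", ["b"]], ["b", ["c"]], ["c", []], ["d", ["c"]], ["e", []],
]);
console.log(widelyRequired(tinyDeps, 0.02)); // packages required by more than 2%
console.log(widelyRequired(tinyDeps, 0.5));  // packages required by more than half
```

In the sample, c is needed by a, b and d, while b is needed only by a, so raising the threshold from 2% to 50% leaves only c.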
Possible explanation: micro-packages in npm
“In a lot of JavaScript environments, space is at a premium. [...] Several larger libraries [...] have actually intentionally split themselves into sub-modules because people usually only ever load them to use a single merge function.”
Example: isarray
150 direct and 77K transitive reverse dependencies in August 2016
    var toString = {}.toString;
    // Use the native Array.isArray when available, else fall back to a tag check.
    module.exports = Array.isArray || function (arr) {
      return toString.call(arr) == '[object Array]';
    };
Known problems: leftpad
    function leftpad (str, len, ch) {
      str = String(str);
      var i = -1;
      if (!ch && ch !== 0) ch = ' ';
      len = len - str.length;
      while (++i < len) {
        str = ch + str;
      }
      return str;
    }
Its developer removed all his packages from npm:
“This impacted many thousands of projects. [...] We began observing hundreds of failures per minute, as dependent projects – and their dependents, and their dependents... – all failed when requesting the now-unpublished package.”
http://blog.npmjs.org/post/141577284765/kik-left-pad-and-npm
Known problems: leftpad
npm managers un-unpublished leftpad, but ...
“a number of dependency chains [...] explicitly requested 0.0.3.”
http://blog.npmjs.org/post/141577284765/kik-left-pad-and-npm
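The quoted problem can be illustrated with a toy resolver. Assume a registry in which only a newer version of the package is available: a dependent that pins exactly 0.0.3 still cannot be resolved until that specific version is restored. The registry state, the version 1.0.0 and the function name `resolveExact` are hypothetical, and real npm constraints also allow ranges, which this sketch deliberately does not model.

```javascript
// Toy registry: package name -> published versions (hypothetical state).
const registry = new Map([["left-pad", ["1.0.0"]]]);

// Resolve a request that pins one exact version.
// Returns "name@version", or null when that version is not published.
function resolveExact(reg, name, version) {
  const published = reg.get(name) || [];
  return published.includes(version) ? `${name}@${version}` : null;
}

// A dependent pinned to 0.0.3 fails even though a newer version exists.
console.log(resolveExact(registry, "left-pad", "0.0.3"));

// Restoring the unpublished version makes the same request succeed.
registry.get("left-pad").push("0.0.3");
console.log(resolveExact(registry, "left-pad", "0.0.3"));
```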
Conclusion
• Simple metrics can be used to compare the topology of different package-based software ecosystems
• Similarities in the dependency graph structure
  – Most non-isolated packages are part of a large weakly connected component
• Differences that can be explained by the specificities of each ecosystem
  – Python’s extensive standard library
  – CRAN’s particular versioning policy
  – npm's abundance of micro-packages
Future work
• See our SANER 2017 article “An empirical comparison of dependency issues in OSS packaging ecosystems”
  – Include RubyGems
  – Study the evolution over time
  – Frequency of package updates
  – Resilience of packages to failures in dependencies
  – Impact of solutions that rely on dependency constraints and semantic versioning
• Beyond SANER 2017: study the interplay between social and technical aspects
Thanks for your attention!
Questions?