Know your cloning
Is your dataset biased in selection?
(* The repositories were downloaded from a GitHub mirror dated November 2016, so the newest projects may be missing from our database.)
For four popular languages, we downloaded all non-forked repositories on GitHub.
The inner circle still shows the proportion of downloaded projects so that it can be judged against the outer circle, which represents the number of downloaded files for our four languages.
Intentional: similar functionality required in multiple contexts; school assignments.
Unintentional: boilerplate for Android applications.
Autogenerated: Apache Axis (web service framework); Android; Java.
Intentional: slight adaptation of algorithms for different interfaces.
Unintentional: different library versions.
Autogenerated: Qt Meta-Object Compiler.
Intentional: unit tests; database schemas.
Intentional: tests that differ only in name.
Unintentional: different library versions (jQuery).
Autogenerated: angular.js with Yeoman; express.js; grunt.js.
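
One rough way to check your own dataset for the kind of file-level cloning listed above is to group files by a hash of their contents. The sketch below is only an illustration under stated assumptions, not the method used for these results: it finds exact byte-for-byte copies only (near-clones such as "different library versions" would be missed), and the function name `duplicate_share` and the in-memory `corpus` mapping are hypothetical.

```python
import hashlib
from collections import defaultdict


def duplicate_share(files):
    """Group exact duplicates in a {path: bytes} mapping.

    Returns (clone_groups, share), where share is the fraction of
    files that are redundant copies of another file in the mapping.
    """
    by_hash = defaultdict(list)
    for path, content in files.items():
        # Identical contents produce identical digests.
        digest = hashlib.sha256(content).hexdigest()
        by_hash[digest].append(path)

    # Keep only hashes shared by more than one file.
    clone_groups = [sorted(paths) for paths in by_hash.values() if len(paths) > 1]

    # Every file beyond the first in a group counts as a redundant copy.
    redundant = sum(len(group) - 1 for group in clone_groups)
    share = redundant / len(files) if files else 0.0
    return clone_groups, share


# Hypothetical toy corpus: two projects ship the same utility file.
corpus = {
    "a/util.py": b"def f():\n    return 1\n",
    "b/util.py": b"def f():\n    return 1\n",
    "c/main.py": b"print('hi')\n",
}
groups, share = duplicate_share(corpus)
```

A high share of redundant files suggests that statistics computed over the dataset may be dominated by a few heavily copied files rather than by independent code.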