Title: Mining File Histories: Should we consider branches?
Author: Kovalenko, V.V. (TU Delft Software Engineering); Palomba, F. (University of Zürich); Bacchelli, A. (University of Zürich)
Date: 2018
Abstract: Modern distributed version control systems, such as Git, offer support for branching: the possibility to develop parts of software outside the master trunk. Consideration of the repository structure in Mining Software Repository (MSR) studies requires a thorough approach to mining, but there is no well-documented, widespread methodology regarding the handling of merge commits and branches. Moreover, there is still a lack of knowledge of the extent to which considering branches during MSR studies impacts the results of the studies. In this study, we set out to evaluate the importance of proper handling of branches when calculating file modification histories. We analyze over 1,400 Git repositories of four open source ecosystems and compute modification histories for over two million files, using two different algorithms. One algorithm only follows the first parent of each commit when traversing the repository, the other returns the full modification history of a file across all branches. We show that the two algorithms consistently deliver different results, but the scale of the difference varies across projects and ecosystems. Further, we evaluate the importance of accurate mining of file histories by comparing the performance of common techniques that rely on file modification history (reviewer recommendation, change recommendation, and defect prediction) for two algorithms of file history retrieval. We find that considering full file histories leads to an increase in the techniques’ performance that is rather modest.
Subject: Version Control Systems; Branches; Mining Software Repositories
To reference this document use: http://resolver.tudelft.nl/uuid:9ce35781-144f-4b20-9803-494584e0da29
DOI: https://doi.org/10.1145/3238147.3238169
Publisher: Association for Computing Machinery (ACM), New York, NY
ISBN: 978-1-4503-5937-5
Source: ASE 2018: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering
Event: ASE 2018, 2018-07-03 → 2018-07-07, Montpellier, France
Part of collection: Institutional Repository
Document type: conference paper
Rights: © 2018 V.V. Kovalenko, F. Palomba, A. Bacchelli
Files: git2neo.pdf (PDF, 915.21 KB)
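
The abstract contrasts two ways of retrieving a file's modification history: following only the first parent of each commit, and collecting changes across all branches. The difference can be illustrated with a minimal Python sketch on a toy commit graph. This is not the authors' implementation; the `COMMITS` data, the commit ids, and both function names are hypothetical and chosen only for illustration, and each commit is assumed to carry a precomputed set of touched files.

```python
from collections import deque

# Hypothetical toy repository: commit id -> (parents, files touched).
# The first entry in `parents` is the mainline (first) parent, as in Git.
COMMITS = {
    "c1": ((), {"a.txt"}),
    "c2": (("c1",), {"b.txt"}),            # mainline commit
    "f1": (("c1",), {"a.txt"}),            # feature-branch commit
    "f2": (("f1",), {"a.txt", "b.txt"}),   # feature-branch commit
    "m1": (("c2", "f2"), set()),           # merge of the feature branch
    "c3": (("m1",), {"a.txt"}),
}


def first_parent_history(head, path, commits=COMMITS):
    """Commits touching `path`, following only the first parent of each commit."""
    history = []
    current = head
    while current is not None:
        parents, files = commits[current]
        if path in files:
            history.append(current)
        current = parents[0] if parents else None
    return history


def full_history(head, path, commits=COMMITS):
    """Commits touching `path` among all ancestors of `head`, across all branches."""
    history, seen, queue = [], {head}, deque([head])
    while queue:
        current = queue.popleft()
        parents, files = commits[current]
        if path in files:
            history.append(current)
        for parent in parents:
            if parent not in seen:  # visit each commit once, even after merges
                seen.add(parent)
                queue.append(parent)
    return history


if __name__ == "__main__":
    print(first_parent_history("c3", "a.txt"))
    print(sorted(full_history("c3", "a.txt")))
```

In this toy graph the first-parent walk passes through the merge commit `m1` without entering the feature branch, so the changes to `a.txt` made in `f1` and `f2` are missing from its result, while the full traversal reports them.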