A Mixed Methods Approach to Mining Code Review Data

Examples and a Study of Multicommit Reviews and Pull Requests

Abstract

Software code review has been considered an important quality assurance mechanism for the last 35 years. The techniques for conducting modern code reviews have evolved along with the software industry, and have become progressively incremental and lightweight. We have studied code review in a number of contemporary settings, including Apache, Linux, KDE, Microsoft, Android, and GitHub. Code review is an inherently social activity, so we have used both quantitative and qualitative methods to understand the underlying parameters (or measures) of the process, as well as the rich interactions and motivations for doing code review. In this chapter, we describe how we have used a mixed methods approach to triangulate our findings on code review. We also describe how we use quantitative data to help us sample the most interesting cases from our data to be analyzed qualitatively. To illustrate code review research, we provide new results that contrast single-commit and multicommit reviews. We find that while multicommit reviews take longer and have more lines churned than single-commit reviews, the same number of people are involved in both types of review. To enrich and triangulate our findings, we qualitatively analyze the characteristics of multicommit reviews, and find that there are two types: reviews of branches and revisions of single commits. We also examine the reasons why commits on GitHub pull requests are rejected.