Exploring Code Coverage in Open-Source Development

Abstract

Software development is increasingly done, at least in part, online on open-source platforms such as GitHub, and the tools developers typically use have moved online with it. One such category is code coverage tools, which track and report coverage data generated during continuous integration (CI) tests. As adoption of these tools has grown, so has the amount of available coverage data. In this thesis we explore a large database of coverage data from Codecov, a popular coverage tool. What sets our work apart from existing research is that it spans a large number of projects that vary in size, language, and domain. Furthermore, we conduct a survey that was disseminated among a wide variety of open-source developers, rather than at a single company or in an enterprise setting. Our research consists of three parts. First, we assess whether there is a relationship between the time to merge a pull request (PR) and its coverage levels. We find that such a relationship does exist in certain projects. Second, we look at the impact of PR comments that mention coverage on the odds of that coverage improving. Using the odds ratio test, we conclude that the odds of coverage improving are greater when coverage is mentioned than when it is not. Third, we survey developers about their reasons for ignoring a failing status check related to code coverage. Reasons they give include the complexity of testing, the triviality of the proposed changes, and the pull request being too important to wait for proper testing. Furthermore, respondents who identify as code contributors are twice as likely as those who identify as code maintainers to consider fixing coverage a waste of their time, while code maintainers are more concerned with not scaring away new contributors with strict coverage guidelines.
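
To make the odds ratio comparison in the second part concrete, the following is a minimal sketch of how an odds ratio and an approximate 95% confidence interval can be computed from a 2x2 table of PRs. The function names and the counts in the usage note are hypothetical and chosen purely for illustration; they are not taken from the Codecov data or from the thesis, and the Wald interval on the log scale is one common choice rather than necessarily the procedure used in the thesis.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a: PRs where coverage was mentioned and coverage improved
    b: PRs where coverage was mentioned and coverage did not improve
    c: PRs where coverage was not mentioned and coverage improved
    d: PRs where coverage was not mentioned and coverage did not improve
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    # Odds of improvement when mentioned (a / b) divided by
    # odds of improvement when not mentioned (c / d).
    return (a * d) / (b * c)


def wald_confidence_interval(a, b, c, d, z=1.96):
    """Approximate confidence interval computed on the log-odds scale.

    z=1.96 corresponds to a two-sided 95% interval under a normal
    approximation (an assumption of this sketch).
    """
    log_or = math.log(odds_ratio(a, b, c, d))
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or - z * standard_error),
            math.exp(log_or + z * standard_error))
```

With illustrative counts such as `odds_ratio(30, 20, 40, 80)`, the odds of improvement are 30/20 = 1.5 when coverage is mentioned and 40/80 = 0.5 when it is not, giving an odds ratio of 3. An odds ratio above 1 whose confidence interval excludes 1 is the kind of result that would support the conclusion stated above.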