A. Rastogi | TU Delft Repository

Pull Request Decisions Explained

An Empirical Overview

Journal article (2023) - Xunhui Zhang, Yue Yu, Georgios Gousios, Ayushi Rastogi

Context: The pull-based development model is widely used in open source projects, leading to the emergence of trends in distributed software development. One aspect that has garnered significant attention concerning pull request decisions is the identification of explanatory factors. Objective: This study builds on a decade of research on pull request decisions and provides further insights. We empirically investigate how factors influence pull request decisions and the scenarios that change the influence of such factors. Method: We identify factors influencing pull request decisions on GitHub through a systematic literature review and infer them by mining archival data. We collect a total of 3,347,937 pull requests with 95 features from 11,230 diverse projects on GitHub. Using these data, we explore the relations among the factors and build mixed effects logistic regression models to empirically explain pull request decisions. Results: Our study shows that a small number of factors explain pull request decisions, the most important being whether the integrator is the same as or different from the submitter. We also note that the influence of factors on pull request decisions changes with context; e.g., the area hotness of a pull request is important only in the early stage of project development and becomes unimportant for pull request decisions as projects mature. ...

On the Shoulders of Giants: A New Dataset for Pull-based Development Research

Conference paper (2020) - Xunhui Zhang, Ayushi Rastogi, Yue Yu

Pull-based development is a widely adopted paradigm for collaboration in distributed software development, attracting attention from both academia and industry. To better study the pull-based development model, this paper presents a new dataset containing 96 features collected from 11,230 projects and 3,347,937 pull requests. We describe the creation process and explain the features in detail. To the best of our knowledge, our dataset is the largest and most comprehensive one toward a complete picture of pull-based development research. ...

Questions for Data Scientists in Software Engineering: A Replication

Conference paper (2020) - Hennie Huijgens, Ayushi Rastogi, Ernst Mulders, Georgios Gousios, Arie van Deursen

In 2014, a Microsoft study investigated the sort of questions that data science applied to software engineering should answer. This resulted in 145 questions that developers considered relevant for data scientists to answer, thus providing a research agenda to the community. Five years later, no further studies have investigated whether the questions from the software engineers at Microsoft hold for other software companies, including software-intensive companies with a different primary focus (to which we refer as software-defined enterprises). Furthermore, it is not evident that the problems identified five years ago are still applicable, given the technological advances in software engineering. This paper presents a study at ING, a software-defined enterprise in banking in which over 15,000 IT staff provide in-house software solutions. This paper presents a comprehensive guide of questions for data scientists selected from the previous study at Microsoft along with our current work at ING. We replicated the original Microsoft study at ING, looking for questions that impact both software companies and software-defined enterprises and continue to impact software engineering. We also add new questions that emerged from differences in the context of the two companies and the five-year gap in between. Our results show that software engineering questions for data scientists in the software-defined enterprise are largely similar to those in the software company, albeit with exceptions. We hope that the software engineering research community builds on the new list of questions to create a useful body of knowledge. ...

Releasing Fast and Slow

An Exploratory Case Study at ING

Conference paper (2019) - E. Kula, Ayushi Rastogi, Hennie Huijgens, Arie van Deursen, Georgios Gousios

The appeal of delivering new features faster has led many software projects to adopt rapid releases. However, it is not well understood what the effects of this practice are. This paper presents an exploratory case study of rapid releases at ING, a large banking company that develops software solutions in-house, to characterize rapid releases. Since 2011, ING has shifted to a rapid release model. This switch has resulted in a mixed environment of 611 teams releasing relatively fast and slow. We followed a mixed-methods approach in which we conducted a survey with 461 participants and corroborated their perceptions with 2 years of code quality data and 1 year of release delay data. Our research shows that: rapid releases are more commonly delayed than their non-rapid counterparts, but their delays are shorter; rapid releases can be beneficial in terms of reviewing and user-perceived quality; rapidly released software tends to have a higher code churn, a higher test coverage and a lower average complexity; challenges in rapid releases are related to managing dependencies and certain code aspects, e.g., design debt. ...

The Delta Maintainability Model: Measuring Maintainability of Fine-Grained Code Changes

Conference paper (2019) - Marco di Biase, Ayushi Rastogi, Magiel Bruntink, Arie van Deursen

Existing maintainability models are used to identify technical debt of software systems. Targeting entire codebases, such models lack the ability to determine shortcomings of smaller, fine-grained changes. This paper proposes a new maintainability model – the Delta Maintainability Model (DMM) – to measure fine-grained code changes, such as commits, by adapting and extending the SIG Maintainability Model. DMM categorizes changed lines of code into low and high risk, and then uses the proportion of low-risk change to calculate a delta score. The goal of the DMM is twofold: first, producing meaningful and actionable scores; second, comparing and ranking the maintainability of fine-grained modifications.
We report on an initial study of the model, with the goal of understanding if the adapted measurements from the SIG Maintainability Model suit the fine-grained scope of the DMM. In a manual inspection process for 100 commits, 67 cases matched the expert judgment. Furthermore, we report an exploratory empirical study on a data set of DMM scores on 3,017 issue-fixing commits of four open source and four closed source systems. Results show that the scores of DMM can be used to compare and rank commits, providing developers with a means to do root cause analysis on activities that impacted maintainability and, thus, address technical debt at a finer granularity. ...
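The delta score described in this abstract can be illustrated with a minimal sketch. It assumes each changed line has already been classified as low or high risk; the abstract does not state the classification rules, which are adapted from the SIG Maintainability Model, so they are not reproduced here. The names `CommitChange`, `delta_score`, and `rank_commits` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CommitChange:
    """Hypothetical container: counts of changed lines per risk category."""
    commit_id: str
    low_risk_lines: int
    high_risk_lines: int


def delta_score(change: CommitChange) -> Optional[float]:
    """Proportion of low-risk changed lines, in [0, 1].

    Returns None when the commit changes no classified lines,
    since the proportion is undefined in that case (an assumption
    of this sketch, not something the abstract specifies).
    """
    total = change.low_risk_lines + change.high_risk_lines
    if total == 0:
        return None
    return change.low_risk_lines / total


def rank_commits(changes: List[CommitChange]) -> List[CommitChange]:
    """Order commits from highest to lowest delta score, skipping unscored ones."""
    scored = [c for c in changes if delta_score(c) is not None]
    return sorted(scored, key=delta_score, reverse=True)


if __name__ == "__main__":
    commits = [
        CommitChange("a1", low_risk_lines=30, high_risk_lines=10),
        CommitChange("b2", low_risk_lines=5, high_risk_lines=15),
        CommitChange("c3", low_risk_lines=0, high_risk_lines=0),
    ]
    for c in rank_commits(commits):
        print(c.commit_id, delta_score(c))
```

Because the score is a proportion, commits of different sizes land on the same scale, which is what makes comparing and ranking them possible.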

Relationship between Geographical Location and Evaluation of Developer Contributions in GitHub

Conference paper (2018) - Ayushi Rastogi, Nachiappan Nagappan, Georgios Gousios, André van der Hoek

Background: Open source software projects show gender bias, suggesting that other demographic characteristics of developers, like geographical location, can negatively influence the evaluation of contributions too. Aim: This study contributes to this emerging body of knowledge in software development by presenting a quantitative analysis of the relationship between the geographical location of developers and the evaluation of their contributions on GitHub. Method: We present an analysis of 70,000+ pull requests selected from the 17 most actively participating countries to model the relationship between the geographical location of developers and the pull request acceptance decision. Results and Conclusion: We observed structural differences in pull request acceptance rates across the 17 countries. Countries with no apparent similarities, such as Switzerland and Japan, had some of the highest pull request acceptance rates, while countries like China and Germany had some of the lowest. Notably, higher acceptance rates were observed for all but one country when pull requests were evaluated by developers from the same country. ...