A. Bacchelli
Please Note
47 records found
1
Energy Patterns for Web
An Exploratory Study
As the energy footprint generated by software is increasing at an alarming rate, understanding how to develop energy-efficient applications has become a necessity. Previous work has introduced catalogs of coding practices, also known as energy patterns. These patterns are yet limited to Mobile or third-party libraries. In this study, we focus on the Web domain-a main source of energy consumption. First we investigated whether and how Mobile energy patterns can be ported to this domain and found that 20 patterns could be ported. Then, we interviewed six expert web developers from different companies to challenge the ported patterns. Most developers expressed concerns for antipatterns, specifically with functional antipatterns, and were able to formulate guidelines to locate these patterns in the source code. Finally, to quantify the effect of Web energy patterns on energy consumption, we set up an automated pipeline to evaluate two ported patterns: 'Dynamic Retry Delay' (DRD) and 'Open Only When Necessary' (OOWN). With this, we found no evidence that the DRD pattern consumes less energy than its antipattern, while the opposite is true for OOWN. Data and Material: https://doi.org/10.5281/zenodo.8404487.
On the performance of method-level bug prediction
A negative result
Bug prediction is aimed at identifying software artifacts that are more likely to be defective in the future. Most approaches defined so far target the prediction of bugs at class/file level. Nevertheless, past research has provided evidence that this granularity is too coarse-grained for its use in practice. As a consequence, researchers have started proposing defect prediction models targeting a finer granularity (particularly method-level granularity), providing promising evidence that it is possible to operate at this level. Particularly, models mixing product and process metrics provided the best results. We present a study in which we first replicate previous research on method-level bug-prediction, by using different systems and timespans. Afterwards, based on the limitations of existing research, we (1) re-evaluate method-level bug prediction models more realistically and (2) analyze whether alternative features based on textual aspects, code smells, and developer-related factors can be exploited to improve method-level bug prediction abilities. Key results of our study include that (1) the performance of the previously proposed models, tested using the same strategy but on different systems/timespans, is confirmed; but, (2) when evaluated with a more practical strategy, all the models show a dramatic drop in performance, with results close to that of a random classifier. Finally, we find that (3) the contribution of alternative features within such models is limited and unable to improve the prediction capabilities significantly. As a consequence, our replication and negative results indicate that method-level bug prediction is still an open challenge.
Primers or Reminders?
The Effects of Existing Review Comments on Code Review
In contemporary code review, the comments put by reviewers on a specific code change are immediately visible to the other reviewers involved. Could this visibility prime new reviewers' attention (due to the human's proneness to availability bias), thus biasing the code review outcome In this study, we investigate this topic by conducting a controlled experiment with 85 developers who perform a code review and a psychological experiment. With the psychological experiment, we find that 70% of participants are prone to availability bias. However, when it comes to the code review, our experiment results show that participants are primed only when the existing code review comment is about a type of bug that is not normally considered; when this comment is visible, participants are more likely to find another occurrence of this type of bug. Moreover, this priming effect does not influence reviewers' likelihood of detecting other types of bugs. Our findings suggest that the current code review practice is effective because existing review comments about bugs in code changes are not negative primers, rather positive reminders for bugs that would otherwise be overlooked during code review. Data and materials: https://doi.org/10.5281/zenodo.
Aims: (1) Quantitatively measure the effects of change decomposition on the outcome of code review (in terms of number of found defects, wrongly reported issues, suggested improvements, time, and understanding); (2) Qualitatively analyze how subjects approach the review and navigate the code, building knowledge and addressing existing issues, in large vs. decomposed changes.
Method: Controlled experiment using the pull-based development model involving 28 software developers among professionals and graduate students.
Results: Change decomposition leads to fewer wrongly reported issues, influences how subjects approach and conduct the review activity (by increasing context- seeking), yet impacts neither understanding the change rationale nor the number of found defects.
Conclusions: Change decomposition reduces the noise for subsequent data analyses but also significantly supports the tasks of the developers in charge of reviewing the changes. As such, commits belonging to different concepts should be separated, adopting this as a best practice in software engineering. ...
Aims: (1) Quantitatively measure the effects of change decomposition on the outcome of code review (in terms of number of found defects, wrongly reported issues, suggested improvements, time, and understanding); (2) Qualitatively analyze how subjects approach the review and navigate the code, building knowledge and addressing existing issues, in large vs. decomposed changes.
Method: Controlled experiment using the pull-based development model involving 28 software developers among professionals and graduate students.
Results: Change decomposition leads to fewer wrongly reported issues, influences how subjects approach and conduct the review activity (by increasing context- seeking), yet impacts neither understanding the change rationale nor the number of found defects.
Conclusions: Change decomposition reduces the noise for subsequent data analyses but also significantly supports the tasks of the developers in charge of reviewing the changes. As such, commits belonging to different concepts should be separated, adopting this as a best practice in software engineering.
PathMiner
A library for mining of path-based representations of code
One recent, significant advance in modeling source code for machine learning algorithms has been the introduction of path-based representation - an approach consisting in representing a snippet of code as a collection of paths from its syntax tree. Such representation efficiently captures the structure of code, which, in turn, carries its semantics and other information. Building the path-based representation involves parsing the code and extracting the paths from its syntax tree; these steps build up to a substantial technical job. With no common reusable toolkit existing for this task, the burden of mining diverts the focus of researchers from the essential work and hinders newcomers in the field of machine learning on code. In this paper, we present PathMiner - an open-source library for mining path-based representations of code. PathMiner is fast, flexible, well-tested, and easily extensible to support input code in any common programming language. Preprint [https://doi.org/10.5281/zenodo.2595271]; released tool [https://doi.org/10.5281/zenodo.2595257].
Healthcare Android apps
A tale of the customers’ perspective
In doing so, we define a manual process that enables the creation of an extended taxonomy of healthcare users’ requests. The results of our study show that users of healthcare apps are more likely to request new features and support for other hardware than users of different types of apps. Moreover, they tend to be less critical of the defects of the application and better support developers when debugging. ...
In doing so, we define a manual process that enables the creation of an extended taxonomy of healthcare users’ requests. The results of our study show that users of healthcare apps are more likely to request new features and support for other hardware than users of different types of apps. Moreover, they tend to be less critical of the defects of the application and better support developers when debugging.
To react, or not to react
Patterns of reaction to API deprecation
Application Programming Interfaces (API) provide reusable functionality to aid developers in the development process. The features provided by these APIs might change over time as the API evolves. To allow API consumers to peacefully transition from older obsolete features to new features, API producers make use of the deprecation mechanism that allows them to indicate to the consumer that a feature should no longer be used. The Java language designers noticed that no one was taking these deprecation warnings seriously and continued using outdated features. Due to this, they decided to change the implementation of this feature in Java 9. We question as to what extent this issue exists and whether the Java language designers have a case. We start by identifying the various ways in which an API consumer can react to deprecation. Following this we benchmark the frequency of the reaction patterns by creating a dataset consisting of data mined from 50 API consumers totalling 297,254 GitHub based projects and 1,322,612,567 type-checked method invocations. We see that predominantly consumers do not react to deprecation and we try to explain this behavior by surveying API consumers and by analyzing if the API’s deprecation policy has an impact on the consumers’ decision to react.
Defect prediction models focus on identifying defect-prone code elements, for example to allow practitioners to allocate testing resources on specific subsystems and to provide assistance during code reviews. While the research community has been highly active in proposing metrics and methods to predict defects on long-term periods (i.e.,at release time), a recent trend is represented by the so-called short-term defect prediction (i.e.,at commit-level). Indeed, this strategy represents an effective alternative in terms of effort required to inspect files likely affected by defects. Nevertheless, the granularity considered by such models might be still too coarse. Indeed, existing commit-level models highlight an entire commit as defective even in cases where only specific files actually contain defects. In this paper, we first investigate to what extent commits are partially defective; then, we propose a novel fine-grained just-in-time defect prediction model to predict the specific files, contained in a commit, that are defective. Finally, we evaluate our model in terms of (i) performance and (ii) the extent to which it decreases the effort required to diagnose a defect. Our study highlights that: (1) defective commits are frequently composed of a mixture of defective and non-defective files, (2) our fine-grained model can accurately predict defective files with an AUC-ROC up to 82% and (3) our model would allow practitioners to save inspection efforts with respect to standard just-in-time techniques.
Test-Driven Code Review
An Empirical Study
In this paper, we aim at empirically understanding whether this practice has an effect on code review effectiveness and how developers' perceive TDR. We conduct (i) a controlled experiment with 93 developers that perform more than 150 reviews, and (ii) 9 semi-structured interviews and a survey with 103 respondents to gather information on how TDR is perceived.
Key results from the experiment show that developers adopting TDR find the same proportion of defects in production code, but more in test code, at the expenses of fewer maintainability issues in production code. Furthermore, we found that most developers prefer to review production code as they deem it more critical and tests should follow from it. Moreover, general poor test code quality and no tool support hinder the adoption of TDR.
Public preprint: https://doi.org/10.5281/zenodo.2551217, data and materials: https://doi.org/10.5281/zenodo.2553139. ...
In this paper, we aim at empirically understanding whether this practice has an effect on code review effectiveness and how developers' perceive TDR. We conduct (i) a controlled experiment with 93 developers that perform more than 150 reviews, and (ii) 9 semi-structured interviews and a survey with 103 respondents to gather information on how TDR is perceived.
Key results from the experiment show that developers adopting TDR find the same proportion of defects in production code, but more in test code, at the expenses of fewer maintainability issues in production code. Furthermore, we found that most developers prefer to review production code as they deem it more critical and tests should follow from it. Moreover, general poor test code quality and no tool support hinder the adoption of TDR.
Public preprint: https://doi.org/10.5281/zenodo.2551217, data and materials: https://doi.org/10.5281/zenodo.2553139.
Selecting reviewers for code changes is a critical step for an efficient code review process. Recent studies propose automated reviewer recommendation algorithms to support developers in this task. However, the evaluation of recommendation algorithms, when done apart from their target systems and users (i.e., code review tools and change authors), leaves out important aspects: perception of recommendations, influence of recommendations on human choices, and their effect on user experience. This study is the first to evaluate a reviewer recommender in vivo. We compare historical reviewers and recommendations for over 21,000 code reviews performed with a deployed recommender in a company environment and set out to measure the influence of recommendations on users' choices, along with other performance metrics. Having found no evidence of influence, we turn to the users of the recommender. Through interviews and a survey we find that, though perceived as relevant, reviewer recommendations rarely provide additional value for the respondents. We confirm this finding with a larger study at another company. The confirmation of this finding brings up a case for more user-centric approaches to designing and evaluating the recommenders. Finally, we investigate information needs of developers during reviewer selection and discuss promising directions for the next generation of reviewer recommendation tools. Preprint: https://doi.org/10.5281/zenodo.1404814.
Why are features deprecated?
An investigation into the motivation behind deprecation
to conduct a proper code review, to better understand this process and how research and tool support can make developers become more effective and efficient reviewers.
Previous work has provided evidence that a successful code review process is one in which reviewers and authors actively participate and collaborate. In these cases, the threads of discussions that are saved by code review tools are a precious source of information that can be later exploited for research and practice. In
this paper, we focus on this source of information as a way to gather reliable data on the aforementioned reviewers’ needs. We manually analyze 900 code review comments from three large open-source projects and organize them in categories by means of a card sort. Our results highlight the presence of seven
high-level information needs, such as knowing the uses of methods and variables declared/modified in the code under review. Based on these results we suggest ways in which future code review tools can better support collaboration and the reviewing task. Preprint [https://doi.org/10.5281/zenodo.1405894]. Data and
Materials [https://doi.org/10.5281/zenodo.1405902]. ...
to conduct a proper code review, to better understand this process and how research and tool support can make developers become more effective and efficient reviewers.
Previous work has provided evidence that a successful code review process is one in which reviewers and authors actively participate and collaborate. In these cases, the threads of discussions that are saved by code review tools are a precious source of information that can be later exploited for research and practice. In
this paper, we focus on this source of information as a way to gather reliable data on the aforementioned reviewers’ needs. We manually analyze 900 code review comments from three large open-source projects and organize them in categories by means of a card sort. Our results highlight the presence of seven
high-level information needs, such as knowing the uses of methods and variables declared/modified in the code under review. Based on these results we suggest ways in which future code review tools can better support collaboration and the reviewing task. Preprint [https://doi.org/10.5281/zenodo.1405894]. Data and
Materials [https://doi.org/10.5281/zenodo.1405902].
Mock objects for testing java systems
Why and how developers use them, and how they evolve
Mining File Histories
Should we consider branches?
With our study, we investigate how often the documentation is inconsistent in a sample of 239 methods from five Python open- source software projects. Our results highlight that more than 20% of the comments are either partially defined or entirely missing and that almost 1% of the methods in the analyzed projects contain type inconsistencies. Based on these results, we create a tool, PyID, to early detect type mismatches in Python documentation and we evaluate its performance with our oracle. ...
With our study, we investigate how often the documentation is inconsistent in a sample of 239 methods from five Python open- source software projects. Our results highlight that more than 20% of the comments are either partially defined or entirely missing and that almost 1% of the methods in the analyzed projects contain type inconsistencies. Based on these results, we create a tool, PyID, to early detect type mismatches in Python documentation and we evaluate its performance with our oracle.
Code review for newcomers
Is it different?
Onboarding is a critical stage in the tenure of software developers with a project, because meaningful contribution requires familiarity with the codebase. Some software teams employ practices, such as mentoring, to help new developers get accustomed faster. Code review, i.e., the manual inspection of code changes, is an opportunity for sharing knowledge and helping with onboarding. In this study, we investigate whether and how contributions from developers with low experience in a project do receive a different treatment during code review. We compare reviewers' experience, metrics of reviewers' attention, and change merge rate between changes from newcomers and from more experienced authors in 60 active open source projects. We find that the only phenomenon that is consistent across the vast majority of projects is a lower merge rate for newcomers' changes.
...