C.E. Brandt
Not One to Rule Them All
Mining Meaningful Code Review Orders From GitHub
Developers use tools such as GitHub pull requests to review code, discuss proposed changes, and request modifications. While changed files are commonly presented in alphabetical order, this does not necessarily coincide with the reviewer's preferred navigation sequence. This study investigates the different navigation orders developers follow while commenting on changes submitted in pull requests. We mined code review comments from 23,241 pull requests in 100 popular Java and Python repositories on GitHub to analyze the order in which the reviewers commented on the submitted changes. Our analysis shows that for 44.6% of pull requests, the reviewers comment in a non-alphabetical order. Among these pull requests, we identified traces of alternative meaningful orders: 20.6% (2,134) followed a largest-diff first order, 17.6% (1,827) were commented in the order of the files' similarity to the pull request's title and description, and 29% (1,188) of pull requests containing changes to both production and test files adhered to a test-first order. We also observed that the proportion of reviewed files to total submitted files was significantly higher in non-alphabetically ordered reviews, which also received slightly fewer approvals from reviewers, on average. Our findings highlight the need for additional support during code reviews, particularly for larger pull requests, where reviewers are more likely to adopt complex strategies rather than following a single predefined order.
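The orders named in the abstract above can be made concrete with a small check. This is a minimal sketch, not the study's method. It assumes that the "comment order" of a pull request is the sequence of files in the order each file first received a review comment. It also assumes that an order "matches" only if it holds strictly over the whole sequence. The study reports *traces* of orders, so its criteria are likely looser. `ChangedFile`, `is_test_file`, and the path heuristic for test files are hypothetical. The similarity-to-title order is omitted because it requires a text-similarity model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ChangedFile:
    path: str
    diff_lines: int  # assumed diff-size measure: added + deleted lines


def is_test_file(path: str) -> bool:
    # Hypothetical heuristic: any path containing "test" counts as a test file.
    return "test" in path.lower()


def is_alphabetical(files: List[ChangedFile]) -> bool:
    # Alphabetical: the comment sequence equals the sorted file paths.
    paths = [f.path for f in files]
    return paths == sorted(paths)


def is_largest_diff_first(files: List[ChangedFile]) -> bool:
    # Largest-diff first: diff sizes never increase along the sequence.
    sizes = [f.diff_lines for f in files]
    return all(a >= b for a, b in zip(sizes, sizes[1:]))


def is_test_first(files: List[ChangedFile]) -> Optional[bool]:
    flags = [is_test_file(f.path) for f in files]
    if all(flags) or not any(flags):
        return None  # undefined unless both test and production files changed
    # Test-first: no test file appears after the first production file.
    first_production = flags.index(False)
    return not any(flags[first_production:])


def matching_orders(files: List[ChangedFile]) -> Dict[str, bool]:
    result = {
        "alphabetical": is_alphabetical(files),
        "largest_diff_first": is_largest_diff_first(files),
    }
    test_first = is_test_first(files)
    if test_first is not None:
        result["test_first"] = test_first
    return result


# Files listed in the order the reviewer first commented on them.
reviewed = [
    ChangedFile("tests/test_parser.py", 12),
    ChangedFile("src/parser.py", 80),
    ChangedFile("src/util.py", 5),
]
print(matching_orders(reviewed))
# {'alphabetical': False, 'largest_diff_first': False, 'test_first': True}
```

A single pull request can match several orders at once, or none. This is one reason why a single predefined presentation order may not fit every review.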
Towards Refined Code Coverage
A New Predictive Problem in Software Testing
The qualitative factor in software testing
A systematic mapping study of qualitative methods
Shaken, Not Stirred
How Developers Like Their Amplified Tests
Test amplification makes systematic changes to existing, manually written tests to provide tests complementary to an automated test suite. We consider developer-centric test amplification, where the developer explores, judges and edits the amplified tests before adding them to their maintained test suite. However, it is as yet unclear which selection and editing steps developers take before including an amplified test in the test suite. In this paper we conduct an open source contribution study, amplifying tests of open source Java projects from GitHub. We report which deficiencies we observe in the amplified tests while manually filtering and editing them to open 39 pull requests with amplified tests. We present a detailed analysis of the maintainers' feedback regarding proposed changes, requested information, and expressed judgment. Our observations provide a basis for practitioners to make an informed decision on whether to adopt developer-centric test amplification. As several of the edits we observe are based on the developer's understanding of the amplified test, we conjecture that developer-centric test amplification should invest in helping developers understand the amplified tests.
Mind the Gap
What Working With Developers on Fuzz Tests Taught Us About Coverage Gaps
Can fuzzers generate partial tests that developers find useful enough to complete into functional tests (e.g., by adding assertions)? To address this question, we develop a prototype within the Mozilla ecosystem and open 13 bug reports proposing partial generated tests for currently uncovered code. We find that the majority of the reactions focus on whether the targeted coverage gap is actually worth testing. To investigate further which coverage gaps developers find relevant to close, we design an automated filter to exclude irrelevant coverage gaps before generating tests. From conversations with 13 developers about whether the remaining coverage gaps are worth closing when a partially generated test is available, we learn that the filtering indeed removes clearly non-test-worthy gaps. The developers propose a variety of additional strategies for addressing coverage gaps, as well as ways to make fuzz tests and reports more useful for developers.
Using GitHub Copilot for Test Generation in Python
An Empirical Study
Writing unit tests is a crucial task in software development, but it is also recognized as time-consuming and tedious. As such, numerous test generation approaches have been proposed and investigated. However, most of these tools produce tests that are difficult to understand. Recently, Large Language Models (LLMs) have shown promising results in generating source code and supporting software engineering tasks. We therefore investigate the usability of tests generated by GitHub Copilot, a proprietary closed-source code generation tool that uses an LLM. We evaluate GitHub Copilot's test generation abilities both with and without an existing test suite, and we study the impact of different code commenting strategies on the generated tests. Our investigation evaluates the usability of 290 tests generated by GitHub Copilot for 53 sampled tests from open source projects. Our findings show that within an existing test suite, approximately 45.28% of the tests generated by Copilot are passing, while 54.72% are failing, broken, or empty. Furthermore, if we generate tests using Copilot without an existing test suite in place, 92.45% of the tests are failing, broken, or empty. Additionally, we study how test method comments influence the usability of the generated tests.
Developer testing, the practice of software engineers programmatically checking that their own components behave as they expect, has become the norm in today's software projects. With the constantly growing size and complexity of software projects and with the rise of automated test generation tools, understanding a test case is becoming more and more important compared to writing test cases from scratch. This holds especially in the area of developer-centric test amplification, where a tool automatically generates new test cases to improve a developer-maintained test suite. To investigate how visualization can help developers understand and judge test cases, we present the TESTIMPACTGRAPH, a visualization of the call tree and coverage impact of a JUnit test case proposed for amplification. It lets developers drill down into the behavior of a test case and gives them a clear view of how the proposed test case contributes to the coverage of the overall test suite. In a think-aloud study we investigate which information developers seek from the TESTIMPACTGRAPH and how its features can support them in accessing this information, and we report observations regarding the coverage impact of test cases. We infer ten actionable recommendations on how developer tests can be visualized to help developers understand their behavior and impact.
Developer-centric test amplification
The interplay between automatic generation and human exploration
Automatically generating test cases for software has been an active research topic for many years. While current tools can generate powerful regression or crash-reproducing test cases, these are often kept separate from the maintained test suite. In this paper, we leverage the developer's familiarity with test cases amplified from existing, manually written developer tests. Starting from issues reported by developers in previous studies, we investigate which aspects are important in designing a developer-centric test amplification approach, one that provides test cases that developers adopt into their test suite. We conduct 16 semi-structured interviews with software developers, supported by our prototypical designs of a developer-centric test amplification approach and a corresponding test exploration tool. We extend the test amplification tool DSpot to generate test cases that are easier to understand. Our IntelliJ plugin TestCube empowers developers to explore amplified test cases from their familiar environment. From our interviews, we gather 52 observations that we summarize into 23 result categories, and we give two key recommendations on how future tool designers can make their tools better suited for developer-centric test amplification.
Up until now, this process preserves statements that were relevant for the original test case but are no longer needed for the behavior of the new test case. These unnecessary statements impair the readability of the tests in question. As part of the effort to make amplified test cases more readable, we investigate dynamic slicing, taint analysis and static analysis as approaches to remove redundant statements. We design and evaluate a static analysis approach that we implemented as part of the test amplification tool DSpot. Our empirical evaluation on 274 amplified test cases shows that the implemented approach works well: although rudimentary, it removes a significant portion of the redundant statements in the amplified test cases. While removing the statements themselves is fast, verifying through mutation testing that the tests still work as intended remains resource-intensive.