C.E. Brandt

16 records found

Mining Meaningful Code Review Orders From GitHub

Conference paper (2025) - A. Bouraffa, C.E. Brandt, A.E. Zaidman, W. Maalej
Developers use tools such as GitHub pull requests to review code, discuss proposed changes, and request modifications. While changed files are commonly presented in alphabetical order, this does not necessarily coincide with the reviewer's preferred navigation sequence. This study investigates the different navigation orders developers follow while commenting on changes submitted in pull requests. We mined code review comments from 23,241 pull requests in 100 popular Java and Python repositories on GitHub to analyze the order in which the reviewers commented on the submitted changes. Our analysis shows that for 44.6% of pull requests, the reviewers comment in a non-alphabetical order. Among these pull requests, we identified traces of alternative meaningful orders: 20.6% (2,134) followed a largest-diff first order, 17.6% (1,827) were commented in the order of the files' similarity to the pull request's title and description, and 29% (1,188) of pull requests containing changes to both production and test files adhered to a test-first order. We also observed that the proportion of reviewed files to total submitted files was significantly higher in non-alphabetically ordered reviews, which also received slightly fewer approvals from reviewers, on average. Our findings highlight the need for additional support during code reviews, particularly for larger pull requests, where reviewers are more likely to adopt complex strategies rather than following a single predefined order. ...

A New Predictive Problem in Software Testing

Conference paper (2025) - C.E. Brandt, Aurora Ramírez
To measure and improve the strength of test suites, software projects and their developers commonly use code coverage and aim for a threshold of around 80%. But what is the 80% of the source code that should be covered? To prepare for the development of new, more refined code coverage criteria, we introduce a novel predictive problem in software testing: whether a code line is, or should be, covered by the test suite. In this short paper, we propose the collection of coverage information, source code metrics, and abstract syntax tree data and explore whether they are relevant to predict whether a code line is exercised by the test suite or not. We present a preliminary experiment using four machine learning (ML) algorithms and an open source Java project. We observe that ML classifiers can achieve high accuracy (up to 90%) on this novel predictive problem. We also apply an explainable method to better understand the characteristics of code lines that make them more “appealing” to be covered. Our work opens a research line worth investigating further, where the focus of the prediction is the code to be tested. Our innovative approach contrasts with most predictive problems in software testing, which aim to predict the test case failure probability. ...

A systematic mapping study of qualitative methods

Software testing research has provided metrics on efficiency, error rates, and insights into the effectiveness of testing methodologies and tools. However, these tell only a part of the story. The qualitative dimension, which studies experiences, perceptions, and decision-making processes, is crucial but less prevalent in the literature. This study aims to systematically map qualitative research in software testing to consolidate and categorize the methodologies used in qualitative testing research, highlight their importance, and identify patterns, gaps, and future directions. We conducted a systematic mapping study, identifying and analyzing 102 primary studies from 2003 to 2023. We categorized the studies according to research strategies, data collection, and data analysis methods. We identified case studies and grounded theory as the most prevalent research strategies. Researchers primarily used semi-structured interviews and thematic analysis to understand how practitioners work and gather stakeholder perspectives. The subject areas most covered by qualitative studies included software testing processes and risks, and test automation. Areas such as test oracles and machine learning were underrepresented. We also assessed the quality of reporting and the methodological rigor, emphasizing the challenges and limitations identified during the process. Through this study, we provide a comprehensive overview of qualitative research practices in software testing, revealing trends, gaps, and methodological insights. ...
Doctoral thesis (2024) - C.E. Brandt
Developer testing has become an established practice in large software projects. The developers working on the functionality of a project also write short, automated scripts that check the behavior of their code. While the benefits of developer testing are widely accepted, writing tests is still seen as tedious and time-consuming. Researchers are working towards alleviating developer effort by automatically generating tests. One approach to do this is test amplification, which modifies existing, manually written tests to create new tests that improve the strength of the existing test suite. When trying to fully automatically generate tests, test generation tools face the relevance problem and the oracle problem: Which behavior of the system is worth testing and what is the expected output to check for? The developer already needs to have an understanding of these two aspects to write the code under test. We propose to leverage this knowledge of the developer to improve the test amplification process. Conjecturing that a consciously designed interaction is the key to an effective collaboration, we propose a developer-centric approach to test amplification that uses a dedicated test exploration tool to communicate and collaborate with the developer. ...

How Developers Like Their Amplified Tests

Journal article (2024) - Carolin Brandt, Ali Khatami, Mairieli Wessel, Andy Zaidman
Test amplification makes systematic changes to existing, manually written tests to provide tests complementary to an automated test suite. We consider developer-centric test amplification, where the developer explores, judges and edits the amplified tests before adding them to their maintained test suite. However, it is as yet unclear which kind of selection and editing steps developers take before including an amplified test into the test suite. In this paper we conduct an open source contribution study, amplifying tests of open source Java projects from GitHub. We report which deficiencies we observe in the amplified tests while manually filtering and editing them to open 39 pull requests with amplified tests. We present a detailed analysis of the maintainer's feedback regarding proposed changes, requested information, and expressed judgment. Our observations provide a basis for practitioners to take an informed decision on whether to adopt developer-centric test amplification. As several of the edits we observe are based on the developer's understanding of the amplified test, we conjecture that developer-centric test amplification should invest in supporting the developer to understand the amplified tests. ...

What Working With Developers on Fuzz Tests Taught Us About Coverage Gaps

Conference paper (2024) - Carolin Brandt, Marco Castelluccio, Christian Holler, Jason Kratzer, Andy Zaidman, Alberto Bacchelli
Can fuzzers generate partial tests that developers find useful enough to complete into functional tests (e.g., by adding assertions)? To address this question, we develop a prototype within the Mozilla ecosystem and open 13 bug reports proposing partial generated tests for currently uncovered code. We found that the majority of the reactions focus on whether the targeted coverage gap is actually worth testing. To investigate further which coverage gaps developers find relevant to close, we design an automated filter to exclude irrelevant coverage gaps before generating tests. From conversations with 13 developers about whether the remaining coverage gaps are worth closing when a partially generated test is available, we learn that the filtering indeed removes clearly non-test-worthy gaps. The developers propose a variety of additional strategies to address the coverage gaps and how to make fuzz tests and reports more useful for developers. ...
Conference paper (2024) - Khalid El Haji, Carolin Brandt, Andy Zaidman
Writing unit tests is a crucial task in software development, but it is also recognized as a time-consuming and tedious task. As such, numerous test generation approaches have been proposed and investigated. However, most of these test generation tools produce tests that are typically difficult to understand. Recently, Large Language Models (LLMs) have shown promising results in generating source code and supporting software engineering tasks. As such, we investigate the usability of tests generated by GitHub Copilot, a proprietary closed-source code generation tool that uses an LLM. We evaluate GitHub Copilot's test generation abilities both within and without an existing test suite, and we study the impact of different code commenting strategies on test generations. Our investigation evaluates the usability of 290 tests generated by GitHub Copilot for 53 sampled tests from open source projects. Our findings highlight that within an existing test suite, approximately 45.28% of the tests generated by Copilot are passing tests; 54.72% of generated tests are failing, broken, or empty tests. Furthermore, if we generate tests using Copilot without an existing test suite in place, we observe that 92.45% of the tests are failing, broken, or empty tests. Additionally, we study how test method comments influence the usability of test generations. ...
Conference paper (2023) - C.E. Brandt, D. Wang, A.E. Zaidman
Test amplification generates new tests by mutating existing, developer-written tests and keeping those tests that improve the coverage of the test suite. Current amplification tools focus on starting from a specific test and propose coverage improvements all over a software project, requiring considerable effort from the software engineer to understand and evaluate the different tests when deciding whether to include a test in the maintained test suite. In this paper, we propose a novel approach that lets the developer take charge and guide the test amplification process towards a specific branch they would like to test in a control flow graph visualization. We evaluate whether simple modifications to the automatic process that incorporate the guidance make the test amplification more effective at covering targeted branches. In a user study and semi-structured interviews we compare our user-guided test amplification approach to the state-of-the-art open test amplification approach. While our participants prefer the guided approach, we uncover several trade-offs that influence which approach is the better choice, largely depending on the use case of the developer. ...
Conference paper (2022) - C.E. Brandt, A.E. Zaidman
Developer testing, the practice of software engineers programmatically checking that their own components behave as they expect, has become the norm in today's software projects. With the constantly growing size and complexity of software projects and with the rise of automated test generation tools, understanding a test case is becoming more and more important compared to writing test cases from scratch. This holds especially in the area of developer-centric test amplification, where a tool automatically generates new test cases to improve a developer-maintained test suite. To investigate how visualization can help developers understand and judge test cases, we present the TESTIMPACTGRAPH, a visualization of the call tree and coverage impact of a JUnit test case proposed for amplification. It empowers the developer to drill down into the behavior of a test case, as well as providing them a clear view on how the proposed test case contributes to the coverage of the overall test suite. In a think-aloud study we investigate which information developers seek from the TESTIMPACTGRAPH, how its features can support them in accessing this information, and observations regarding the coverage impact of test cases. We infer ten actionable recommendations on how developer tests can be visualized to help developers understand their behavior and impact. ...
Conference paper (2022) - Casper Boone, Carolin Brandt, Andy Zaidman
The most common reason for Continuous Integration (CI) builds to break is failing tests. When a build breaks, a developer often has to scroll through hundreds to thousands of log lines to find which test is failing and why. Finding the issue is a tedious process that relies on a developer's experience and increases the cost of software testing. We investigate how presenting different kinds of contextual information about CI builds in the Integrated Development Environment (IDE) impacts the time developers take to fix a broken build. Our IntelliJ plugin TESTAXIS surfaces additional information such as a unique view of the code under test that was changed leading up to the build failure. We conduct a user experiment and show that TESTAXIS helps developers fix failing tests 13.4% to 48.6% faster. The participants found the features of TESTAXIS useful and would incorporate it in their development workflow to save time. With TESTAXIS we take an important step towards removing the need to manually inspect build logs and bringing CI build results to the IDE, ultimately saving developers time. ...
Abstract (2022) - C.E. Brandt
State-of-the-art test generation strategies employ advanced analyses of the code under test and powerful optimization algorithms to generate automatic test cases for software systems. As these techniques require a large amount of computational power, they are often limited to generating tests after the code under test is already written. However, today’s broad education about the importance of software testing lets developers strive to create test cases directly with new code they are contributing. To support these developers, we want to develop an incremental just-in-time test generation tool that works in close proximity to the development of the code under test. Whenever the developer creates a new class or functionality, the tool automatically proposes a matching test case. When the developer finishes implementing a new condition, the tool automatically recommends an additional test case that tests the code which was just added. The generated test cases are closely based on the existing test cases in the project with small, incremental changes to test the new lines of code. To realize such a just-in-time test generation tool we have to tackle many challenges: Detecting the completion of a test-worthy condition, generating a fitting test case in a short time on the developer’s machine, or effectively communicating the value of the new test case to the developer. With the participants of the SMILESENG Summer School we want to discuss our new idea, brainstorm on the challenges that this research opens up and identify possible approaches to tackle them. Index Terms: Software Testing, Automatic ...

The interplay between automatic generation and human exploration

Journal article (2022) - Carolin Brandt, Andy Zaidman
Automatically generating test cases for software has been an active research topic for many years. While current tools can generate powerful regression or crash-reproducing test cases, these are often kept separately from the maintained test suite. In this paper, we leverage the developer’s familiarity with test cases amplified from existing, manually written developer tests. Starting from issues reported by developers in previous studies, we investigate what aspects are important to design a developer-centric test amplification approach that provides test cases that are taken over by developers into their test suite. We conduct 16 semi-structured interviews with software developers supported by our prototypical designs of a developer-centric test amplification approach and a corresponding test exploration tool. We extend the test amplification tool DSpot, generating test cases that are easier to understand. Our IntelliJ plugin TestCube empowers developers to explore amplified test cases from their familiar environment. From our interviews, we gather 52 observations that we summarize into 23 result categories and give two key recommendations on how future tool designers can make their tools better suited for developer-centric test amplification. ...
Conference paper (2021) - Nienke Nijkamp, C.E. Brandt, A.E. Zaidman
Test amplification generates new test cases that improve the coverage of an existing test suite. To convince developers to integrate these new test cases into their test suite, it is crucial to convey the behavior and the improvement in coverage that the amplified test case provides. In this paper, we present NATIC, an approach to generate names for amplified test cases based on the methods they additionally cover, compared to the existing test suite. In a survey among 16 participants with a background in Computer Science, we show that the test names generated by NATIC are valued similarly to names written by experts. According to the participants, the names generated by NATIC outperform expert-written names with respect to informing about coverage improvement, but lack in conveying a test's behavior. Finally, we discuss how a restriction to two mentioned methods per name would improve the understandability of the test names generated by NATIC. ...
Conference paper (2021) - W. Oosterbroek, C.E. Brandt, A.E. Zaidman
Test amplification generates new tests by modifying existing, manually written tests. Up until now, this process preserves statements that were relevant for the original test case but are no longer needed for the behavior of the new test case. These unnecessary statements impact the readability of the tests in question. As a part of the effort to make amplified test cases more readable, we investigate dynamic slicing, taint analysis and static analysis as approaches to remove redundant statements. We design and evaluate a static analysis approach that we implemented as part of the test amplification tool DSpot. Our empirical evaluation on 274 amplified test cases shows that the implemented approach works well: while being rudimentary, it is able to remove a significant portion of the redundant statements in the amplified test cases. While the removal of the statements themselves is fast, verifying that the tests still work as intended through mutation testing is still resource-intensive. ...
Build logs are textual by-products that a software build process creates, often as part of its Continuous Integration (CI) pipeline. Build logs are a paramount source of information for developers when debugging into and understanding a build failure. Recently, attempts to partly automate this time-consuming, purely manual activity have come up, such as rule- or information-retrieval-based techniques. We believe that having a common data set to compare different build log analysis techniques will advance the research area. It will ultimately increase our understanding of CI build failures. In this paper, we present logchunks, a collection of 797 annotated Travis CI build logs from 80 GitHub repositories in 29 programming languages. For each build log, logchunks contains a manually labeled log part (chunk) describing why the build failed. We externally validated the data set with the developers who caused the original build failure. The width and depth of the logchunks data set are intended to make it the default benchmark for automated build log analysis techniques. ...