A.E. Zaidman

120 records found

Conference paper (2026) - R. Arntzenius, X. Liu, A. Zaidman
Continuous integration is essential for software quality, yet the energy footprint associated with its frequent execution has largely remained invisible. We provide the first comprehensive baseline of CI energy use through a large-scale study of 204 open-source Java projects with repeated measurements under Maven and Gradle. Our results show that energy use is highly skewed: while most projects consume energy modestly, a small number of “CI-intensive” systems can reach annual CI energy footprints of hundreds of kilowatt-hours, which is comparable to a quarter of an average EU household's electricity use. We further show that immediate, practical savings are possible: simply enabling dependency caching cuts energy by 30% on average in some Maven projects and by over 90% in some Gradle cases. These findings matter not only for individual developers, but also for large organizations that run thousands of builds. In those settings, even small inefficiencies can add up to very large energy costs. By exposing where energy is consumed and how to reduce it, our study establishes an actionable foundation for greener CI pipelines. ...

A socio-technical grounded theory

Journal article (2026) - Mark Swillus, Rashina Hoda, Andy Zaidman
Software testing is crucial for ensuring software quality, yet developers’ engagement with it varies widely. Identifying the technical, organizational and social factors that lead to differences in engagement is required to remove barriers and utilize enablers for testing. While much research emphasizes the usefulness of software testing approaches and technical solutions, less is known about why developers do (not) test. This study investigates the first-hand experience of developers with software testing. The study illuminates how developers’ opinions about testing and their testing behavior change. Through analysis of personal evolutions of practice, we explore when and why testing is used. Employing socio-technical grounded theory (STGT), we construct a theory by systematically analyzing data from 19 in-depth, semi-structured interviews with software developers. Allowing interviewees to reflect on how and why they approach software testing, we explore perspectives that are rooted in their contextual experiences. We develop eleven categories of circumstances that act as conditions for the application and adaptation of testing practices and introduce three concepts that we then use to present a theory of emerging testing strategies (ETS) that explains why developers do (not) use testing practices. This study reveals a new perspective on the connection between testing artifacts and collective reflection of practitioners, and it embraces testing as an experience in which human and social aspects are entangled with organizational and technical circumstances. ...

Mining Meaningful Code Review Orders From GitHub

Conference paper (2025) - A. Bouraffa, C.E. Brandt, A.E. Zaidman, W. Maalej
Developers use tools such as GitHub pull requests to review code, discuss proposed changes, and request modifications. While changed files are commonly presented in alphabetical order, this does not necessarily coincide with the reviewer's preferred navigation sequence. This study investigates the different navigation orders developers follow while commenting on changes submitted in pull requests. We mined code review comments from 23,241 pull requests in 100 popular Java and Python repositories on GitHub to analyze the order in which the reviewers commented on the submitted changes. Our analysis shows that for 44.6% of pull requests, the reviewers comment in a non-alphabetical order. Among these pull requests, we identified traces of alternative meaningful orders: 20.6% (2,134) followed a largest-diff first order, 17.6% (1,827) were commented in the order of the files' similarity to the pull request's title and description, and 29% (1,188) of pull requests containing changes to both production and test files adhered to a test-first order. We also observed that the proportion of reviewed files to total submitted files was significantly higher in non-alphabetically ordered reviews, which also received slightly fewer approvals from reviewers, on average. Our findings highlight the need for additional support during code reviews, particularly for larger pull requests, where reviewers are more likely to adopt complex strategies rather than following a single predefined order. ...

An industrial case study in language engineering with Spoofax

Journal article (2025) - Olav Bunte, Jasper Denkers, Louis C.M. van Gool, Jurgen J. Vinju, Eelco Visser, Tim A.C. Willemse, Andy Zaidman
Domain-specific languages (DSLs) promise to improve the software engineering process, e.g., by reducing software development and maintenance effort and by improving communication, and are therefore seeing increased use in industry. To support the creation and deployment of DSLs, language workbenches have been developed. However, little is published about the actual added value of a language workbench in an industrial setting, compared to not using a language workbench. In this paper, we evaluate the productivity of using the Spoofax language workbench by comparing two implementations of an industrial DSL, one in Spoofax and one in Python, that already existed before the evaluation. The subject is the Open Interaction Language (OIL): a complex DSL for implementing control software with requirements imposed by its industrial context at Canon Production Printing. Our findings indicate that it is more productive to implement OIL using Spoofax compared to using Python, especially if editor services are desired. Although Spoofax was sufficient to implement OIL, we find that Spoofax should especially improve on practical aspects to increase its adoptability in industry. ...

A systematic mapping study of qualitative methods

Software testing research has provided metrics on efficiency, error rates, and insights into the effectiveness of testing methodologies and tools. However, these tell only a part of the story. The qualitative dimension, which studies experiences, perceptions, and decision-making processes, is crucial, but less prevalent in the literature. This study aims to systematically map qualitative research in software testing to consolidate and categorize the methodologies used in qualitative testing research, highlight their importance, and identify patterns, gaps, and future directions. We conducted a systematic mapping study, identifying and analyzing 102 primary studies from 2003 to 2023. We categorized the studies according to research strategies, data collection, and data analysis methods. We identified case studies and grounded theory as the most prevalent research strategies. Researchers primarily used semi-structured interviews and thematic analysis to understand how practitioners work and gather stakeholder perspectives. The subject areas most covered by qualitative studies included software testing processes and risks, and test automation. Areas such as test oracles and machine learning were underrepresented. We also assessed the quality of reporting and the methodological rigor, emphasizing the challenges and limitations identified during the process. Through this study, we provide a comprehensive overview of qualitative research practices in software testing, revealing trends, gaps, and methodological insights. ...
Conference paper (2024) - Andy Zaidman
As we have come to rely on software systems in our daily lives, we have a clear expectation about the reliability of these systems. To ensure this reliability, automated software quality assurance processes have become an important part of software development. However, given the climate crisis that we are witnessing, it is important to ask ourselves what the impact of all these automated quality assurance processes is in terms of electricity consumption. This study explores the electricity consumption and potential environmental impact of continuous integration and software testing in 10 open source software projects. ...

How Developers Like Their Amplified Tests

Journal article (2024) - Carolin Brandt, Ali Khatami, Mairieli Wessel, Andy Zaidman
Test amplification makes systematic changes to existing, manually written tests to provide tests complementary to an automated test suite. We consider developer-centric test amplification, where the developer explores, judges and edits the amplified tests before adding them to their maintained test suite. However, it is as yet unclear which kind of selection and editing steps developers take before including an amplified test into the test suite. In this paper we conduct an open source contribution study, amplifying tests of open source Java projects from GitHub. We report which deficiencies we observe in the amplified tests while manually filtering and editing them to open 39 pull requests with amplified tests. We present a detailed analysis of the maintainers' feedback regarding proposed changes, requested information, and expressed judgment. Our observations provide a basis for practitioners to make an informed decision on whether to adopt developer-centric test amplification. As several of the edits we observe are based on the developer's understanding of the amplified test, we conjecture that developer-centric test amplification should invest in supporting the developer to understand the amplified tests. ...
Conference paper (2024) - Khalid El Haji, Carolin Brandt, Andy Zaidman
Writing unit tests is a crucial task in software development, but it is also recognized as a time-consuming and tedious task. As such, numerous test generation approaches have been proposed and investigated. However, most of these test generation tools produce tests that are typically difficult to understand. Recently, Large Language Models (LLMs) have shown promising results in generating source code and supporting software engineering tasks. As such, we investigate the usability of tests generated by GitHub Copilot, a proprietary closed-source code generation tool that uses an LLM. We evaluate GitHub Copilot's test generation abilities both within and without an existing test suite, and we study the impact of different code commenting strategies on test generations. Our investigation evaluates the usability of 290 tests generated by GitHub Copilot for 53 sampled tests from open source projects. Our findings highlight that within an existing test suite, approximately 45.28% of the tests generated by Copilot are passing tests; 54.72% of generated tests are failing, broken, or empty tests. Furthermore, if we generate tests using Copilot without an existing test suite in place, we observe that 92.45% of the tests are failing, broken, or empty tests. Additionally, we study how test method comments influence the usability of test generations. ...
Journal article (2024) - Sebastian Uchitel, Marsha Chechik, Massimiliano Di Penta, Bram Adams, Nazareno Aguirre, Gabriele Bavota, Domenico Bianculli, Kelly Blincoe, Andy Zaidman, More Authors...
Journal article (2024) - Ali Khatami, Andy Zaidman
To ensure the quality of software systems, software engineers can make use of a variety of quality assurance approaches, for example, software testing, modern code review, automated static analysis, and build automation. Each of these quality assurance practices has been studied in depth in isolation, but there is a clear knowledge gap when it comes to our understanding of how these approaches are being used in conjunction, or not. In our study, we broadly investigate whether and how these quality assurance approaches are being used in conjunction in the development of 1454 popular open source software projects on GitHub. Our study indicates that typically projects do not follow all quality assurance practices together with high intensity. In fact, we only observe weak correlation among some quality assurance practices. In general, our study provides a deeper understanding of how existing quality assurance approaches are currently being used in Java-based open source software development. Besides, we specifically zoom in on the more mature projects in our dataset, and generally we observe that more mature projects are more intense in their application of the quality assurance practices, with more focus on their ASAT usage, and code reviewing, but no strong change in their CI usage. ...

An Investigation into Why Software Engineers (Occasionally) Ignore Coverage Checks

Conference paper (2024) - Alexander Sterk, Mairieli Wessel, Eli Hooten, Andy Zaidman
Many modern code coverage tools track and report code coverage data generated from running tests during continuous integration. They report code coverage data through a variety of channels, including email, Slack, Mattermost, or through the web interface of social coding platforms such as GitHub. In fact, this ensemble of tools can be configured in such a way that the software engineer gets a failing status check when code coverage drops below a certain threshold. In this study, we broadly investigate the opinions on and experience with code coverage tools through a survey among 279 software engineers whose projects use the Codecov coverage tool and bot. In particular, we are investigating why software engineers would ignore a failing status check caused by a drop in code coverage. We observe that >80% of software engineers, at least sometimes, ignore these failing status checks, and we get insights into the main reasons why software engineers ignore these checks. ...
Conference paper (2024) - Neda Džiugaite, Baris Ardic, Andy Zaidman
Software testing is a necessary aspect of software development. With high expectations placed on software testers and a shortage of qualified professionals, Massive Open Online Courses (MOOCs) have emerged as a potential solution to improve software testing education. MOOCs provide accessible education, bridging the gap between formal education and industry expectations. We investigate key aspects of software testing MOOCs and compare the concepts they cover with university curricula and industry expectations. The findings show that a MOOC on average covers more concepts than a single university course. Additionally, MOOCs align well with what the industry expects from software testing practitioners. ...
Journal article (2024) - Imara van Dinten, Pouria Derakhshanfar, Annibale Panichella, Andy Zaidman
Cyber-Physical Systems (CPSs) have gained traction in recent years. A major non-functional quality of CPS is performance since it affects both usability and security. This critical quality attribute depends on the specialized hardware, simulation engines, and environmental factors that characterize the system under analysis. While a large body of research exists on performance issues in general, studies focusing on performance-related issues for CPSs are scarce. The goal of this paper is to build a taxonomy of performance issues in CPSs. To this aim, we present two empirical studies aimed at categorizing common performance issues (Study I) and helping developers detect them (Study II). In the first study, we examined commit messages and code changes in the history of 14 GitHub-hosted open-source CPS projects to identify commits that report and fix self-admitted performance issues. We manually analyzed 2699 commits, labeled them, and grouped the reported performance issues into antipatterns. We detected instances of three previously reported Software Performance Antipatterns (SPAs) for CPSs. Importantly, we also identified new SPAs for CPSs not described earlier in the literature. Furthermore, most performance issues identified in this study fall into two new antipattern categories: Hard Coded Fine Tuning (399 of 646) and Magical Waiting Number (150 of 646). In the second study, we introduce static analysis techniques for automatically detecting these two new antipatterns; we implemented them in a tool called AP-Spotter. We analyzed 9 open-source CPS projects not utilized to build the SPAs taxonomy to benchmark AP-Spotter. Our results show that AP-Spotter achieves 62.04% precision in detecting the antipatterns ...

What Working With Developers on Fuzz Tests Taught Us About Coverage Gaps

Conference paper (2024) - Carolin Brandt, Marco Castelluccio, Christian Holler, Jason Kratzer, Andy Zaidman, Alberto Bacchelli
Can fuzzers generate partial tests that developers find useful enough to complete into functional tests (e.g., by adding assertions)? To address this question, we develop a prototype within the Mozilla ecosystem and open 13 bug reports proposing partial generated tests for currently uncovered code. We found that the majority of the reactions focus on whether the targeted coverage gap is actually worth testing. To investigate further which coverage gaps developers find relevant to close, we design an automated filter to exclude irrelevant coverage gaps before generating tests. From conversations with 13 developers about whether the remaining coverage gaps are worth closing when a partially generated test is available, we learn that the filtering indeed removes clearly non-test-worthy gaps. The developers propose a variety of additional strategies to address the coverage gaps and how to make fuzz tests and reports more useful for developers. ...
Conference paper (2023) - Baris Ardic, Andy Zaidman
Software testing is generally acknowledged to be an important weapon in the arsenal of software engineers to produce correct and reliable software systems. However, despite the importance of the topic, little is known about where software engineers get their testing knowledge and skills from. Is this through (higher) education, training programmes in the industry, or rather is it self-taught? In this paper, we investigate the curricula of 100 highly ranked universities and survey 51 software engineers to shed light on the state-of-the-practice in software testing education, in terms of both academic education and education of software engineers in the industry. ...
Conference paper (2023) - A. Deljouyi, A.E. Zaidman
Automatic unit test generators such as EvoSuite are able to automatically generate unit test suites with high coverage. This removes the burden of writing unit tests from developers, but the generated tests are often difficult for them to understand. In this paper, we introduce the MicroTestCarver approach that generates unit tests starting from manual or scripted end-to-end (E2E) tests. Using carved information from these E2E tests, we generate unit tests that have meaningful test scenarios and contain actual test data. When we apply our MicroTestCarver approach, we observe that 85% of the generated tests are executable. Through a user study involving 20 participants, we get indications that tests generated with MicroTestCarver are relatively easy to understand. ...
Journal article (2023) - Jasper Denkers, Marvin Brunner, Louis van Gool, Jurgen J. Vinju, Andy Zaidman, Eelco Visser
Flexible printing systems are highly complex systems that consist of printers, that print individual sheets of paper, and finishing equipment, that processes sheets after printing, for example, assembling a book. Integrating finishing equipment with printers involves the development of control software that configures the devices, taking hardware constraints into account. This control software is highly complex to realize due to (1) the intertwined nature of printing and finishing, (2) the large variety of print products and production options for a given product, and (3) the large range of finishers produced by different vendors. We have developed a domain-specific language called CSX that offers an interface to constraint solving specific to the printing domain. We use it to model printing and finishing devices and to automatically derive constraint solver-based environments for automatic configuration. We evaluate CSX on its coverage of the printing domain in an industrial context, and we report on lessons learned on using a constraint-based DSL in an industrial context. ...
Conference paper (2023) - Imara van Dinten, A.E. Zaidman, A. Panichella
Test case prioritization techniques have emerged as effective strategies to optimize this process and mitigate the regression testing costs. Commonly, black-box heuristics guide optimal test ordering, leveraging information retrieval (e.g., cosine distance) to measure the test case distance and sort them accordingly. However, a challenge arises when dealing with tests of varying granularity levels, as they may employ distinct vocabularies (e.g., name identifiers). In this paper, we propose to measure the distance between test cases based on the shortest path between their identifiers within the WordNet lexical database. This additional heuristic is combined with the traditional cosine distance to prioritize test cases in a multi-objective fashion. Our preliminary study conducted with two different Java projects shows that test cases prioritized with WordNet achieve larger fault detection capability (APFD_C) compared to the traditional cosine distance used in the literature. ...
Conference paper (2023) - C.E. Brandt, D. Wang, A.E. Zaidman
Test amplification generates new tests by mutating existing, developer-written tests and keeping those tests that improve the coverage of the test suite. Current amplification tools focus on starting from a specific test and propose coverage improvements all over a software project, requiring considerable effort from the software engineer to understand and evaluate the different tests when deciding whether to include a test in the maintained test suite. In this paper, we propose a novel approach that lets the developer take charge and guide the test amplification process towards a specific branch they would like to test in a control flow graph visualization. We evaluate whether simple modifications to the automatic process that incorporate the guidance make the test amplification more effective at covering targeted branches. In a user study and semi-structured interviews we compare our user-guided test amplification approach to the state-of-the-art open test amplification approach. While our participants prefer the guided approach, we uncover several trade-offs that influence which approach is the better choice, largely depending on the use case of the developer. ...
Journal article (2023) - Mairieli Wessel, Andy Zaidman, Marco Aurélio Gerosa, Igor Steinmacher
Projects on GitHub rely on the automation provided by software development bots. Nevertheless, the presence of bots can be annoying and disruptive to the community. Backed by multiple studies with practitioners, this article provides guidelines for developing and maintaining software bots. ...