A.E. Zaidman
120 records found
On the emergence of testing strategies
A socio-technical grounded theory
Not One to Rule Them All
Mining Meaningful Code Review Orders From GitHub
Developers use tools such as GitHub pull requests to review code, discuss proposed changes, and request modifications. While changed files are commonly presented in alphabetical order, this does not necessarily coincide with the reviewer's preferred navigation sequence. This study investigates the different navigation orders developers follow while commenting on changes submitted in pull requests. We mined code review comments from 23,241 pull requests in 100 popular Java and Python repositories on GitHub to analyze the order in which the reviewers commented on the submitted changes. Our analysis shows that for 44.6% of pull requests, the reviewers comment in a non-alphabetical order. Among these pull requests, we identified traces of alternative meaningful orders: 20.6% (2,134) followed a largest-diff first order, 17.6% (1,827) were commented in the order of the files' similarity to the pull request's title and description, and 29% (1,188) of pull requests containing changes to both production and test files adhered to a test-first order. We also observed that the proportion of reviewed files to total submitted files was significantly higher in non-alphabetically ordered reviews, which also received slightly fewer approvals from reviewers, on average. Our findings highlight the need for additional support during code reviews, particularly for larger pull requests, where reviewers are more likely to adopt complex strategies than to follow a single predefined order.
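The orders named above can be checked mechanically once the first comment per file is known. The sketch below is illustrative only and is not the study's mining pipeline: it assumes comments arrive as hypothetical (timestamp, file_path) pairs, assumes a hypothetical file-naming heuristic (`looks_like_test`) for spotting test files, and omits the title/description similarity order, which would need a text-similarity measure.

```python
from typing import Callable, Dict, List, Tuple

def first_comment_order(comments: List[Tuple[int, str]]) -> List[str]:
    """Files in the order they first received a review comment.

    `comments` holds (timestamp, file_path) pairs; integer timestamps are an
    assumption of this sketch."""
    order: List[str] = []
    for _, path in sorted(comments):
        if path not in order:
            order.append(path)
    return order

def is_alphabetical(order: List[str]) -> bool:
    return order == sorted(order)

def is_largest_diff_first(order: List[str], diff_size: Dict[str, int]) -> bool:
    # Diff sizes (changed lines) must be non-increasing along the commenting order.
    sizes = [diff_size[path] for path in order]
    return all(a >= b for a, b in zip(sizes, sizes[1:]))

def is_test_first(order: List[str], is_test: Callable[[str], bool]) -> bool:
    # True if no test file is commented on after a production file.
    seen_production = False
    for path in order:
        if not is_test(path):
            seen_production = True
        elif seen_production:
            return False
    return True

def looks_like_test(path: str) -> bool:
    # Hypothetical naming heuristic; real projects use many conventions.
    name = path.rsplit("/", 1)[-1]
    return name.startswith("test_") or name.endswith(("_test.py", "Test.java"))

def classify(comments: List[Tuple[int, str]], diff_size: Dict[str, int]) -> List[str]:
    """Return every candidate order the commenting sequence is consistent with.

    Note: a single-file review trivially matches the alphabetical and
    largest-diff-first orders, so such reviews should be filtered out first."""
    order = first_comment_order(comments)
    labels = []
    if is_alphabetical(order):
        labels.append("alphabetical")
    if is_largest_diff_first(order, diff_size):
        labels.append("largest-diff-first")
    # Test-first is only meaningful when both test and production files changed.
    kinds = {looks_like_test(p) for p in order}
    if kinds == {True, False} and is_test_first(order, looks_like_test):
        labels.append("test-first")
    return labels
```

Because a review can be consistent with several orders at once, the sketch returns a list of labels rather than a single category.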
OIL
An industrial case study in language engineering with Spoofax
The qualitative factor in software testing
A systematic mapping study of qualitative methods
As we have come to rely on software systems in our daily lives, we expect these systems to be reliable. To ensure this reliability, automated software quality assurance processes have become an important part of software development. However, given the climate crisis that we are witnessing, it is important to ask what impact all these automated quality assurance processes have in terms of electricity consumption. This study explores the electricity consumption and potential environmental impact of continuous integration and software testing in 10 open source software projects.
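A common back-of-the-envelope way to reason about this impact is energy = average power × runtime, and emissions = energy × grid carbon intensity. The sketch below only illustrates that arithmetic; the constant average power draw and the grid intensity are assumptions supplied by the caller, not the measurement method or figures of the study.

```python
from typing import Iterable

JOULES_PER_KWH = 3_600_000  # 1 kWh = 3.6 MJ

def ci_energy_kwh(job_durations_s: Iterable[float], avg_power_w: float) -> float:
    """Energy = average power (W) x total runtime (s), converted from J to kWh.

    Assumes a constant average power draw per CI runner, which is a
    simplification; real power draw varies with load."""
    return avg_power_w * sum(job_durations_s) / JOULES_PER_KWH

def emissions_kg_co2e(energy_kwh: float, grid_g_per_kwh: float) -> float:
    """Emissions = energy x grid carbon intensity (gCO2e/kWh), returned in kg."""
    return energy_kwh * grid_g_per_kwh / 1000

if __name__ == "__main__":
    # Illustrative inputs only, not results from the study:
    # two 30-minute CI jobs on a runner assumed to draw 100 W on average.
    energy = ci_energy_kwh([1800, 1800], avg_power_w=100)
    print(f"{energy:.2f} kWh, {emissions_kg_co2e(energy, 400):.3f} kg CO2e")
```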
Shaken, Not Stirred
How Developers Like Their Amplified Tests
Test amplification makes systematic changes to existing, manually written tests to provide tests that complement an automated test suite. We consider developer-centric test amplification, where the developer explores, judges, and edits the amplified tests before adding them to their maintained test suite. However, it is as yet unclear which selection and editing steps developers take before including an amplified test in the test suite. In this paper, we conduct an open source contribution study, amplifying tests of open source Java projects from GitHub. We report which deficiencies we observe in the amplified tests while manually filtering and editing them to open 39 pull requests with amplified tests. We present a detailed analysis of the maintainers' feedback regarding proposed changes, requested information, and expressed judgment. Our observations provide a basis for practitioners to make an informed decision on whether to adopt developer-centric test amplification. As several of the edits we observe are based on the developer's understanding of the amplified test, we conjecture that developer-centric test amplification should invest in helping developers understand the amplified tests.
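To make "systematic changes to existing tests" concrete, the sketch below mutates an input literal from an existing test and records the observed output as a new regression assertion, in the spirit of input and assertion amplification in tools such as DSpot. It is a simplified illustration in Python, not the Java tooling used in the study; `amplify_literal` and `amplify_test` are hypothetical names, and the resulting assertions are exactly the kind of candidates a developer would still need to filter and edit.

```python
from typing import Any, Callable, List

def amplify_literal(value: int) -> List[int]:
    """Hypothetical input-amplification operator for integer literals."""
    return [value + 1, value - 1, 0, -value, value * 2]

def amplify_test(func: Callable[[int], Any], original_input: int) -> List[str]:
    """Generate candidate assertions by running `func` on mutated inputs.

    The observed output becomes the expected value, so the new tests only
    capture current behaviour; a developer still has to judge them."""
    amplified = []
    for new_input in amplify_literal(original_input):
        if new_input == original_input:
            continue  # identical to the original test, nothing new
        try:
            observed = func(new_input)
        except Exception:
            continue  # simplification: skip inputs that raise
        amplified.append(f"assert {func.__name__}({new_input!r}) == {observed!r}")
    return list(dict.fromkeys(amplified))  # drop duplicates, keep order

if __name__ == "__main__":
    def absolute(x: int) -> int:
        return abs(x)

    # Starting from an existing test such as `assert absolute(3) == 3`:
    for line in amplify_test(absolute, 3):
        print(line)
```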
Using GitHub Copilot for Test Generation in Python
An Empirical Study
Writing unit tests is a crucial task in software development, but it is also recognized as time-consuming and tedious. As such, numerous test generation approaches have been proposed and investigated. However, most of these test generation tools produce tests that are typically difficult to understand. Recently, Large Language Models (LLMs) have shown promising results in generating source code and supporting software engineering tasks. We therefore investigate the usability of tests generated by GitHub Copilot, a proprietary, closed-source code generation tool that uses an LLM. We evaluate GitHub Copilot's test generation abilities both with and without an existing test suite, and we study the impact of different code commenting strategies on test generation. Our investigation evaluates the usability of 290 tests generated by GitHub Copilot for 53 sampled tests from open source projects. Our findings show that, within an existing test suite, approximately 45.28% of the tests generated by Copilot are passing tests, while 54.72% are failing, broken, or empty tests. Furthermore, when we generate tests using Copilot without an existing test suite in place, we observe that 92.45% of the tests are failing, broken, or empty. Additionally, we study how test method comments influence the usability of the generated tests.
Scoping Software Engineering for AI
The TSE Perspective
To ensure the quality of software systems, software engineers can make use of a variety of quality assurance approaches, for example, software testing, modern code review, automated static analysis, and build automation. Each of these quality assurance practices has been studied in depth in isolation, but there is a clear knowledge gap in our understanding of whether and how these approaches are used in conjunction. In our study, we broadly investigate whether and how these quality assurance approaches are used in conjunction in the development of 1,454 popular open source software projects on GitHub. Our study indicates that projects typically do not follow all quality assurance practices together with high intensity. In fact, we observe only weak correlations among some quality assurance practices. In general, our study provides a deeper understanding of how existing quality assurance approaches are currently being used in Java-based open source software development. In addition, we zoom in on the more mature projects in our dataset and observe that, in general, they apply the quality assurance practices more intensely, with more focus on automated static analysis tool (ASAT) usage and code review, but with no strong change in their CI usage.
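A "weak correlation" between practices means that knowing how intensely a project applies one practice says little about how intensely it applies another. The sketch below computes Spearman's rank correlation (Pearson correlation on tie-averaged ranks) between two per-project intensity metrics. The choice of Spearman and the example metrics are assumptions for illustration; the abstract does not state which correlation measure or metrics were used.

```python
from typing import List, Sequence

def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho; raises ZeroDivisionError if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

if __name__ == "__main__":
    # Hypothetical per-project metrics, one entry per project.
    test_commit_share = [0.10, 0.35, 0.20, 0.05, 0.50]
    reviewed_pr_share = [0.40, 0.30, 0.90, 0.20, 0.60]
    print(round(spearman(test_commit_share, reviewed_pr_share), 2))
```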
Running a Red Light
An Investigation into Why Software Engineers (Occasionally) Ignore Coverage Checks
Many modern code coverage tools track and report code coverage data generated from running tests during continuous integration. They report code coverage data through a variety of channels, including email, Slack, Mattermost, or the web interface of social coding platforms such as GitHub. Moreover, this ensemble of tools can be configured so that the software engineer gets a failing status check when code coverage drops below a certain threshold. In this study, we broadly investigate software engineers' opinions on and experience with code coverage tools through a survey among 279 software engineers whose projects use the Codecov coverage tool and bot. In particular, we investigate why software engineers would ignore a failing status check caused by a drop in code coverage. We observe that more than 80% of software engineers ignore these failing status checks at least sometimes, and we gain insight into the main reasons why they do so.
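The status-check behaviour described above boils down to a simple rule. The sketch below is a hypothetical rule that loosely resembles the `target` and `threshold` options of Codecov's status checks; it does not reproduce Codecov's exact semantics. The check fails if coverage is below an absolute target or, when a base commit is known, if coverage dropped by more than an allowed number of percentage points.

```python
from typing import Optional

def coverage_pct(covered_lines: int, total_lines: int) -> float:
    """Line coverage in percent."""
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    return 100.0 * covered_lines / total_lines

def coverage_status(head_pct: float, target_pct: float,
                    base_pct: Optional[float] = None,
                    max_drop_pct: float = 0.0) -> str:
    """Hypothetical status-check rule returning "success" or "failure"."""
    if head_pct < target_pct:
        return "failure"  # below the absolute target
    if base_pct is not None and base_pct - head_pct > max_drop_pct:
        return "failure"  # dropped too much relative to the base commit
    return "success"

if __name__ == "__main__":
    head = coverage_pct(800, 1000)  # 80.0% on the pull request
    print(coverage_status(head, target_pct=75.0, base_pct=81.5, max_drop_pct=1.0))
```

Under such a rule, a pull request can fail the check even when its own coverage is acceptable in absolute terms, which is one situation in which engineers may be tempted to ignore the red status.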
Software testing is a necessary aspect of software development. With high expectations placed on software testers and a shortage of qualified professionals, Massive Open Online Courses (MOOCs) have emerged as a potential means of improving software testing education. MOOCs provide accessible education, bridging the gap between formal education and industry expectations. We investigate key aspects of software testing MOOCs and compare the concepts they cover with university curricula and industry expectations. The findings show that a MOOC, on average, covers more concepts than a single university course. Additionally, MOOCs align well with what the industry expects from software testing practitioners.
Mind the Gap
What Working With Developers on Fuzz Tests Taught Us About Coverage Gaps
Can fuzzers generate partial tests that developers find useful enough to complete into functional tests (e.g., by adding assertions)? To address this question, we develop a prototype within the Mozilla ecosystem and open 13 bug reports proposing partial generated tests for currently uncovered code. We found that most of the reactions focus on whether the targeted coverage gap is actually worth testing. To further investigate which coverage gaps developers consider relevant to close, we design an automated filter that excludes irrelevant coverage gaps before generating tests. From conversations with 13 developers about whether the remaining coverage gaps are worth closing when a partially generated test is available, we learn that the filter indeed removes clearly non-test-worthy gaps. The developers propose a variety of additional strategies for addressing coverage gaps and for making fuzz tests and reports more useful for developers.
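The filter-then-generate workflow can be pictured as follows. The exclusion criteria below (third-party and generated directories, debug-only functions, very small gaps) are hypothetical stand-ins chosen for illustration; the abstract does not list the filter's actual rules. The generated test is deliberately partial: it exercises the uncovered code with a fuzzer-found input and leaves the assertions to the developer.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CoverageGap:
    path: str
    function: str
    uncovered_lines: int

# Hypothetical exclusion rules, for illustration only.
EXCLUDED_DIRS = ("third_party/", "testing/", "generated/")
EXCLUDED_NAME_HINTS = ("debug", "dump_")

def is_test_worthy(gap: CoverageGap, min_lines: int = 3) -> bool:
    if gap.path.startswith(EXCLUDED_DIRS):
        return False  # code the project does not own or hand-write
    if any(hint in gap.function.lower() for hint in EXCLUDED_NAME_HINTS):
        return False  # diagnostics-only code
    return gap.uncovered_lines >= min_lines  # ignore trivially small gaps

def filter_gaps(gaps: List[CoverageGap], min_lines: int = 3) -> List[CoverageGap]:
    """Keep test-worthy gaps, largest first."""
    kept = [g for g in gaps if is_test_worthy(g, min_lines)]
    return sorted(kept, key=lambda g: g.uncovered_lines, reverse=True)

def partial_test(gap: CoverageGap, fuzz_input: bytes) -> str:
    """Render a partial test: it reaches the gap but asserts nothing yet."""
    return (f"def test_{gap.function}():\n"
            f"    result = {gap.function}({fuzz_input!r})\n"
            f"    # TODO(developer): add assertions about `result`\n")

if __name__ == "__main__":
    gaps = [CoverageGap("dom/parser.cpp", "parse_attr", 12),
            CoverageGap("third_party/zlib/inflate.c", "inflate", 40),
            CoverageGap("dom/node.cpp", "debug_print", 8)]
    for gap in filter_gaps(gaps):
        print(partial_test(gap, b"<a href>"))
```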
Software testing is generally acknowledged to be an important weapon in the arsenal of software engineers for producing correct and reliable software systems. However, despite the importance of the topic, little is known about where software engineers acquire their testing knowledge and skills. Is it through (higher) education, through training programmes in industry, or is it self-taught? In this paper, we investigate the curricula of 100 highly ranked universities and survey 51 software engineers to shed light on the state of the practice in software testing education, in terms of both academic education and the education of software engineers in industry.
Taming complexity of industrial printing systems using a constraint-based DSL
An industrial experience report
Flexible printing systems are highly complex systems that consist of printers, which print individual sheets of paper, and finishing equipment, which processes sheets after printing, for example, by assembling them into a book. Integrating finishing equipment with printers involves developing control software that configures the devices while taking hardware constraints into account. This control software is difficult to realize due to (1) the intertwined nature of printing and finishing, (2) the large variety of print products and production options for a given product, and (3) the large range of finishers produced by different vendors. We have developed a domain-specific language called CSX that offers an interface to constraint solving specific to the printing domain. We use it to model printing and finishing devices and to automatically derive constraint solver-based environments for automatic configuration. We evaluate CSX on its coverage of the printing domain in an industrial context, and we report on lessons learned from using a constraint-based DSL in industry.
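To illustrate what "configuration under hardware constraints" means, the sketch below enumerates printer/finisher pairings for a print job and keeps only those that satisfy every constraint. The device models are invented for the example, CSX's actual syntax and semantics are not shown, and brute-force enumeration stands in for the constraint solver that CSX-derived environments would use.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical device models, for illustration only.
PRINTERS: Dict[str, dict] = {
    "printer_a": {"sizes": {"A4", "A3"}, "max_gsm": 200},
    "printer_b": {"sizes": {"A4"}, "max_gsm": 300},
}
FINISHERS: Dict[str, dict] = {
    "stapler": {"sizes": {"A4", "A3"}, "min_sheets": 2, "max_sheets": 50},
    "binder": {"sizes": {"A4"}, "min_sheets": 20, "max_sheets": 400},
}

def valid_configurations(job: dict) -> List[Tuple[str, str]]:
    """Enumerate (printer, finisher) pairs that satisfy all constraints.

    `job` has keys "size" (paper size), "gsm" (paper weight in g/m^2),
    and "sheets" (sheets per finished product)."""
    solutions = []
    for p_name, f_name in product(sorted(PRINTERS), sorted(FINISHERS)):
        p, f = PRINTERS[p_name], FINISHERS[f_name]
        constraints = [
            job["size"] in p["sizes"],          # printer supports the paper size
            job["size"] in f["sizes"],          # finisher accepts the paper size
            job["gsm"] <= p["max_gsm"],         # printer handles the paper weight
            f["min_sheets"] <= job["sheets"] <= f["max_sheets"],  # finisher capacity
        ]
        if all(constraints):
            solutions.append((p_name, f_name))
    return solutions

if __name__ == "__main__":
    # A heavy-paper book: only printer_b can print it, only the binder can finish it.
    print(valid_configurations({"size": "A4", "gsm": 250, "sheets": 100}))
```

A real solver scales far better than enumeration once devices expose many interacting options, which is the motivation for deriving solver-based environments rather than hand-coding such checks.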
Test case prioritization techniques have emerged as effective strategies to optimize regression testing and mitigate its costs. Commonly, black-box heuristics guide test ordering, leveraging information retrieval (e.g., cosine distance) to measure the distance between test cases and sort them accordingly. However, a challenge arises when dealing with tests of varying granularity levels, as they may employ distinct vocabularies (e.g., in their identifier names). In this paper, we propose to measure the distance between test cases based on the shortest path between their identifiers within the WordNet lexical database. This additional heuristic is combined with the traditional cosine distance to prioritize test cases in a multi-objective fashion. Our preliminary study, conducted with two different Java projects, shows that test cases prioritized with WordNet achieve higher fault detection capability (APFD_C) than those prioritized with the traditional cosine distance used in the literature.
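The cosine-distance baseline mentioned above can be sketched as follows. Test identifiers are split into words, each test becomes a bag of words, and a greedy strategy repeatedly picks the test that is most distant from those already scheduled. The greedy max-min strategy is an assumed stand-in for the paper's multi-objective search, and the WordNet shortest-path heuristic is not implemented here (it would need a lexical database, e.g., NLTK's WordNet interface). The sketch evaluates orderings with plain APFD, APFD = 1 - (TF_1 + ... + TF_m)/(n·m) + 1/(2n), where TF_i is the position of the first test revealing fault i; the paper reports the cost-cognizant variant APFD_C, which this code does not compute.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Set

def tokens(identifier_text: str) -> List[str]:
    """Split camelCase / snake_case identifiers into lowercase words."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", identifier_text)
    return [p.lower() for p in parts]

def cosine_distance(a: str, b: str) -> float:
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    if na == 0 or nb == 0:
        return 1.0  # no shared vocabulary is possible
    return 1.0 - dot / (na * nb)

def prioritize(tests: Dict[str, str]) -> List[str]:
    """Greedy max-min ordering; `tests` maps test name -> identifier text."""
    names = sorted(tests)
    if not names:
        return []
    def total_distance(n: str) -> float:
        return sum(cosine_distance(tests[n], tests[m]) for m in names if m != n)
    order = [max(names, key=total_distance)]  # start with the most dissimilar test
    remaining = [n for n in names if n != order[0]]
    while remaining:
        nxt = max(remaining, key=lambda n: min(
            cosine_distance(tests[n], tests[s]) for s in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order

def apfd(order: List[str], detects: Dict[str, Set[str]]) -> float:
    """Plain APFD; assumes `order` contains every test listed in `detects`."""
    n = len(order)
    faults = set().union(*detects.values())
    m = len(faults)  # must be > 0
    first_pos: Dict[str, int] = {}
    for pos, test in enumerate(order, start=1):
        for fault in detects.get(test, ()):
            first_pos.setdefault(fault, pos)
    return 1 - sum(first_pos[f] for f in faults) / (n * m) + 1 / (2 * n)
```

Example identifiers such as `parseIntValue` and `parseIntNumber` share most of their words, so the greedy ordering separates them in favour of a test with a different vocabulary.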
Many projects on GitHub rely on the automation provided by software development bots. Nevertheless, the presence of bots can be annoying and disruptive to the community. Backed by multiple studies with practitioners, this article provides guidelines for developing and maintaining software bots.