G.J.B. Vegelien
In software development that relies on continuous integration and continuous delivery (CI/CD), flaky tests pose a significant obstacle. Because they pass or fail unpredictably on the same code, they slow development, waste computing and developer time, and erode developers' trust in the test suite. This research investigates the root causes of, and mitigation strategies for, flaky tests in a large-scale, database-driven industrial setting: Exact.
The increasing reliance on databases in modern software systems, including Exact’s own platform, calls for a deeper understanding of the unique challenges posed by database-dependent tests. By analyzing flaky test behavior across repeated test runs on the same code, we identified key contributors to flakiness: resource contention, test order dependencies, ‘dirty tests’ that leave the system in an inconsistent state, platform-specific issues, and combinations thereof.
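The rerun-based analysis described above can be illustrated with a minimal sketch. It assumes per-run results are available as (test name, passed) pairs collected from repeated runs of the suite on one unchanged commit; the record format, the function `find_flaky_tests`, and the test names are hypothetical, not Exact's tooling. A test counts as flaky when it both passed and failed on the same code, and its failure rate is reported so the flakiest tests can be prioritized.

```python
from collections import defaultdict
from typing import Iterable

# Hypothetical record format: (test name, passed?) for one test in one rerun
# of the whole suite on the same, unchanged commit.
Result = tuple[str, bool]


def find_flaky_tests(results: Iterable[Result]) -> dict[str, float]:
    """Return {test name: failure rate} for tests that both passed and failed."""
    outcomes: dict[str, list[bool]] = defaultdict(list)
    for name, passed in results:
        outcomes[name].append(passed)

    flaky: dict[str, float] = {}
    for name, runs in outcomes.items():
        failures = runs.count(False)
        # Always-passing and always-failing tests are deterministic, not flaky.
        if 0 < failures < len(runs):
            flaky[name] = failures / len(runs)
    return flaky


runs = [
    ("test_invoice_total", True), ("test_invoice_total", True),
    ("test_ledger_sync", True), ("test_ledger_sync", False),
    ("test_report_export", False), ("test_report_export", False),
]
print(find_flaky_tests(runs))  # {'test_ledger_sync': 0.5}
```

Here `test_report_export` fails on every run, so it is treated as a genuine failure rather than a flaky test.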
Based on the root causes of flakiness at Exact, we developed and evaluated three mitigation strategies, with supporting tools: minimizing redundant database background tasks, explicitly disposing of test data, and disabling dirty database tests. These measures substantially reduced flakiness, raising Exact's release rate from 60% to 96% and increasing the chance that its CI/CD pipeline passes on unchanged code from 27% to 95%.
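Explicit disposal of test data can be sketched with Python's `unittest` and an in-memory SQLite database standing in for a shared test database. This is an illustration under stated assumptions, not Exact's implementation: the `customers` table, `DisposingTestCase`, and `create_customer` are hypothetical. Each helper that creates data immediately registers a matching delete with `addCleanup`, which `unittest` runs after the test even when an assertion fails, so no test leaves rows behind for the next one.

```python
import sqlite3
import unittest

# One connection shared by all tests, standing in for a shared test database.
# The schema and table name are hypothetical.
DB = sqlite3.connect(":memory:")
DB.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")


class DisposingTestCase(unittest.TestCase):
    """Base class that deletes every row a test creates, even if the test fails."""

    def create_customer(self, name: str) -> int:
        cur = DB.execute("INSERT INTO customers (name) VALUES (?)", (name,))
        DB.commit()
        row_id = cur.lastrowid
        # Register disposal right away so it runs even after a failed assertion.
        self.addCleanup(self._delete_customer, row_id)
        return row_id

    @staticmethod
    def _delete_customer(row_id: int) -> None:
        DB.execute("DELETE FROM customers WHERE id = ?", (row_id,))
        DB.commit()


class CustomerTests(DisposingTestCase):
    def test_create(self):
        self.create_customer("Acme")
        count = DB.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
        self.assertEqual(count, 1)

    def test_create_again(self):
        # Without disposal, the UNIQUE constraint would make this test
        # fail whenever it runs after test_create (an order dependency).
        self.create_customer("Acme")
        count = DB.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
        self.assertEqual(count, 1)
```

Run it with `python -m unittest` on the module. Removing the `addCleanup` call makes `test_create_again` error with an `IntegrityError` when it runs after `test_create`, reproducing the kind of order dependency that dirty tests cause.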
This research also highlights the importance of collecting and analyzing rich, granular test data to identify patterns and root causes of flakiness; giving developers actionable information from this analysis motivates them to address flakiness proactively. Finally, mitigating cascading failures effectively requires understanding how different tests interact, for example how a dirty test affects seemingly unrelated tests, on its own or in combination with other factors.
Developer-Friendly Test Cases
Detection and Removal of Unnecessary Casts
Test-cube is a tool that focuses on developer-friendly test amplification, a technique that improves a test suite by generating new tests based on manually written ones. Currently, these generated tests contain many redundant casts. Our study aimed to improve the readability of the generated test cases by removing superfluous casts. To this end, we developed multiple cast deleters of two types: simple and fine-grained. A simple cast deleter removes casts based on limited knowledge and can therefore make errors. A fine-grained cast deleter removes a cast only when extensive contextual information shows that it is redundant. We compared the accuracy of the two types by collecting statistics from running them on real-world amplified test cases, and, based on manual code inspection, we discussed the kinds of casts for which each performed well or could be improved. Amplifying all tests of four public repositories produced 281 tests containing casts, with 3,085 casts in total. Of these casts, 97.18% were redundant, and our fine-grained deleter detected and deleted 98.87% of the redundant ones. We found the fine-grained deleter to be the better option, not because the simple cast deleter was slightly less accurate (97.18% rather than 98.80%), but because its accuracy was extremely inconsistent. A second benefit of the fine-grained deleter was that it caused no tests to fail, whereas the simple cast deleter caused 18.15% of tests to fail. This paper provides insights and techniques for reducing the large amount of redundant casting in test amplification. We hope it will make test amplification more developer-friendly and increase its practicality.
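The difference between the two deleter types can be illustrated with a deliberately simplified sketch. It is not Test-cube's implementation: the single-inheritance type table `SUPERTYPE`, the `Cast` record, and both functions are hypothetical, and the fine-grained deleter assumes the static type of each cast expression is already known (in practice it would have to come from a compiler or type solver). The simple deleter textually strips every `(Name)` prefix with no type information; the fine-grained deleter removes a cast only when the expression's static type is already the target type or a subtype of it.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical, simplified type table: each type maps to its single direct supertype.
SUPERTYPE: dict[str, Optional[str]] = {
    "ArrayList": "List",
    "List": "Collection",
    "Collection": "Object",
    "String": "Object",
    "Object": None,
}


def is_subtype(sub: str, sup: str) -> bool:
    """Walk the supertype chain from `sub`; True if `sup` is reached."""
    t: Optional[str] = sub
    while t is not None:
        if t == sup:
            return True
        t = SUPERTYPE.get(t)
    return False


# A '(Name)' with a capitalized name, plus any following whitespace.
# Purely textual: it also matches non-cast parentheses, e.g. the '(X)' in 'foo(X)'.
CAST_PATTERN = re.compile(r"\(([A-Z]\w*)\)\s*")


def simple_delete(line: str) -> str:
    """Simple deleter: remove every cast-like prefix without any type information."""
    return CAST_PATTERN.sub("", line)


@dataclass
class Cast:
    target: str     # type named in the cast, e.g. "List" in "(List) x"
    expr: str       # source text of the expression being cast
    expr_type: str  # static type of `expr`, assumed to be supplied by a type solver


def fine_grained_delete(cast: Cast) -> str:
    """Fine-grained deleter: drop the cast only if `expr` already has the target type or a subtype."""
    if is_subtype(cast.expr_type, cast.target):
        return cast.expr
    return f"({cast.target}) {cast.expr}"


print(simple_delete("List l = (List) new ArrayList();"))  # List l = new ArrayList();
print(simple_delete("ArrayList a = (ArrayList) obj;"))    # ArrayList a = obj;
print(fine_grained_delete(Cast("List", "new ArrayList()", "ArrayList")))  # new ArrayList()
print(fine_grained_delete(Cast("ArrayList", "obj", "Object")))            # (ArrayList) obj
```

With `obj` of static type `Object`, the simple deleter turns a necessary downcast into `ArrayList a = obj;`, which no longer compiles, while the fine-grained deleter keeps it. Even an upcast is not always removable in real Java, because it can change which method overload is selected, so a practical fine-grained deleter needs more context than this subtype check.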