P. Derakhshanfar
Please Note
19 records found
1
Testing with simulation environments helps to identify critical failing scenarios for self-driving cars (SDCs). Simulation-based tests are safer than in-field operational tests and allow detecting software defects before deployment. However, these tests are very expensive and are too many to be run frequently within limited time constraints.In this article, we investigate test case prioritization techniques to increase the ability to detect SDC regression faults with virtual tests earlier. Our approach, called SDC-Prioritizer, prioritizes virtual tests for SDCs according to static features of the roads we designed to be used within the driving scenarios. These features can be collected without running the tests, which means that they do not require past execution results. We introduce two evolutionary approaches to prioritize the test cases using diversity metrics (black-box heuristics) computed on these static features. These two approaches, called SO-SDC-Prioritizer and MO-SDC-Prioritizer, use single-objective and multi-objective genetic algorithms (GA), respectively, to find trade-offs between executing the less expensive tests and the most diverse test cases earlier.Our empirical study conducted in the SDC domain shows that MO-SDC-Prioritizer significantly (P- value <=0.1e-10) improves the ability to detect safety-critical failures at the same level of execution time compared to baselines: random and greedy-based test case orderings. Besides, our study indicates that multi-objective meta-heuristics outperform single-objective approaches when prioritizing simulation-based tests for SDCs.MO-SDC-Prioritizer prioritizes test cases with a large improvement in fault detection while its overhead (up to 0.45% of the test execution cost) is negligible.
Search-based approaches have been used in the literature to automate the process of creating unit test cases. However, related work has shown that generated tests with high code coverage could be ineffective, i.e., they may not detect all faults or kill all injected mutants. In this paper, we propose Cling, an integration-level test case generation approach that exploits how a pair of classes, the caller and the callee, interact with each other through method calls. In particular, Cling generates integration-level test cases that maximize the Coupled Branches Criterion (CBC). Coupled branches are pairs of branches containing a branch of the caller and a branch of the callee such that an integration test that exercises the former also exercises the latter. CBC is a novel integration-level coverage criterion, measuring the degree to which a test suite exercises the interactions between a caller and its callee classes. We implemented Cling and evaluated the approach on 140 pairs of classes from five different open-source Java projects. Our results show that (1) Cling generates test suites with high CBC coverage, thanks to the definition of the test suite generation as a many-objectives problem where each couple of branches is an independent objective; (2) such generated suites trigger different class interactions and can kill on average 7.7% (with a maximum of 50%) of mutants that are not detected by tests generated randomly or at the unit level; (3) Cling can detect integration faults coming from wrong assumptions about the usage of the callee class (25 for our subject systems) that remain undetected when using automatically generated random and unit-level test suites.
Search-based techniques have been widely used for white-box test generation. Many of these approaches rely on the approach level and branch distance heuristics to guide the search process and generate test cases with high line and branch coverage. Despite the positive results achieved by these two heuristics, they only use the information related to the coverage of explicit branches (e.g., indicated by conditional and loop statements), but ignore potential implicit branchings within basic blocks of code. If such implicit branching happens at runtime (e.g., if an exception is thrown in a branchless-method), the existing fitness functions cannot guide the search process. To address this issue, we introduce a new secondary objective, called Basic Block Coverage (BBC), which takes into account the coverage level of relevant basic blocks in the control flow graph. We evaluated the impact of BBC on search-based unit test generation (using the DynaMOSA algorithm) and search-based crash reproduction (using the STDistance and WeightedSum fitness functions). Our results show that for unit test generation, BBC improves the branch coverage of the generated tests. Although small (∼ 1.5%), this improvement in the branch coverage is systematic and leads to an increase of the output domain coverage and implicit runtime exception coverage, and of the diversity of runtime states. In terms of crash reproduction, in the combination of STDistance and WeightedSum, BBC helps in reproducing 3 new crashes for each fitness function. BBC significantly decreases the time required to reproduce 43.5% and 45.1% of the crashes using STDistance and WeightedSum, respectively. For these crashes, BBC reduces the consumed time by 71.7% (for STDistance) and 68.7% (for WeightedSum) on average.
This is an extended abstract of the article: Pouria Derakhshanfar, Xavier Devroey, Gilles Perrouin, Andy Zaidman and Arie van Deursen. 2019. Search-based crash reproduction using behavioural model seeding. In: Software Testing, Verification and Reliability (May 2020). http://doi.org/10.1002/stvr.1733.
This study presents the initial step towards a thorough analysis of the difficulty to reproduce a crash using searchbased crash reproduction. Traditionally, code size and complexity are considered representative indicators of the difficulty for search-based approaches, like search-based unit test generation, to generate tests. However, unlike unit test generation, crash reproduction does not seek to cover a set of behaviors but instead to generate one or more tests exercising a specific behavior reproducing a given crash. In this context, there is no guarantee that the indicators used for unit testing are still valid for crash reproduction. In this study, we seek to identify such indicators by considering various code metrics, code smells, and change metrics. We report our effort to collect those metrics for JCRASHPACK, a state-of-the-art crash reproduction benchmark, and an initial assessment by considering metrics individually. Our results show that although JCRASHPACK is larger than benchmarks used in previous studies, additional crashes should be added to improve its diversity and representativeness, and that no individual metric can be used to characterize the difficulty to reproduce a crash.
Good Things Come In Threes
Improving Search-based Crash Reproduction With Helper Objectives
Search-based test data generation approaches have come a long way over the past few years, but these approaches still have some limitations when it comes to exercising specific behavior for triggering particular kinds of faults (e.g., crashes or specific types of integration between classes/modules). In this thesis, we are investigating new fitness functions and evolutionary-based algorithms and techniques to tackle these limitations. We have defined multiple novel approaches for crash reproduction and class integration testing. Currently, we are still working on improving both crash reproduction and class integration testing.
Demo. video: https://www.youtube.com/watch?v=k6XaQjHqe48
Botsing website: https://stamp-project.github.io/botsing/ ...
Demo. video: https://www.youtube.com/watch?v=k6XaQjHqe48
Botsing website: https://stamp-project.github.io/botsing/