PD

P. Derakhshanfar

info

Please Note

19 records found

Conference paper (2024) - Arkadii Sapozhnikov, Mitchell Olsthoorn, A. Panichella, V.V. Kovalenko, P. Derakhshanfar
Writing software tests is laborious and time-consuming. To address this, prior studies introduced various automated test-generation techniques. A well-explored research direction in this field is unit test generation, wherein artificial intelligence (AI) techniques create tests for a method/class under test. While many of these techniques have primarily found applications in a research context, existing tools (e.g., EvoSuite, Randoop, and AthenaTest) are not user-friendly and are tailored to a single technique. This paper introduces TestSpark, a plugin for IntelliJ IDEA that enables users to generate unit tests with only a few clicks directly within their Integrated Development Environment (IDE). Furthermore, TestSpark also allows users to easily modify and run each generated test and integrate them into the project workflow. TestSpark leverages the advances of search-based test generation tools, and it introduces a technique to generate unit tests using Large Language Models (LLMs) by creating a feedback cycle between the IDE and the LLM. Since TestSpark is an open-source (https://github.com/JetBrains-Research/TestSpark), extendable, and well-documented tool, it is possible to add new test generation methods into the plugin with the minimum effort. This paper also explains our future studies related to TestSpark and our preliminary results. Demo video: https://youtu.be/0F4PrxWfiXo ...
Journal article (2023) - Christian Birchler, Sajad Khatiri, P. Derakhshanfar, Sebastiano Panichella, A. Panichella
Testing with simulation environments helps to identify critical failing scenarios for self-driving cars (SDCs). Simulation-based tests are safer than in-field operational tests and allow detecting software defects before deployment. However, these tests are very expensive and are too many to be run frequently within limited time constraints.In this article, we investigate test case prioritization techniques to increase the ability to detect SDC regression faults with virtual tests earlier. Our approach, called SDC-Prioritizer, prioritizes virtual tests for SDCs according to static features of the roads we designed to be used within the driving scenarios. These features can be collected without running the tests, which means that they do not require past execution results. We introduce two evolutionary approaches to prioritize the test cases using diversity metrics (black-box heuristics) computed on these static features. These two approaches, called SO-SDC-Prioritizer and MO-SDC-Prioritizer, use single-objective and multi-objective genetic algorithms (GA), respectively, to find trade-offs between executing the less expensive tests and the most diverse test cases earlier.Our empirical study conducted in the SDC domain shows that MO-SDC-Prioritizer significantly (P- value <=0.1e-10) improves the ability to detect safety-critical failures at the same level of execution time compared to baselines: random and greedy-based test case orderings. Besides, our study indicates that multi-objective meta-heuristics outperform single-objective approaches when prioritizing simulation-based tests for SDCs.MO-SDC-Prioritizer prioritizes test cases with a large improvement in fault detection while its overhead (up to 0.45% of the test execution cost) is negligible. ...
Journal article (2023) - Pouria Derakhshanfar, Xavier Devroey, Annibale Panichella, Andy Zaidman, Arie van Deursen
Search-based approaches have been used in the literature to automate the process of creating unit test cases. However, related work has shown that generated tests with high code coverage could be ineffective, i.e., they may not detect all faults or kill all injected mutants. In this paper, we propose Cling, an integration-level test case generation approach that exploits how a pair of classes, the caller and the callee, interact with each other through method calls. In particular, Cling generates integration-level test cases that maximize the Coupled Branches Criterion (CBC). Coupled branches are pairs of branches containing a branch of the caller and a branch of the callee such that an integration test that exercises the former also exercises the latter. CBC is a novel integration-level coverage criterion, measuring the degree to which a test suite exercises the interactions between a caller and its callee classes. We implemented Cling and evaluated the approach on 140 pairs of classes from five different open-source Java projects. Our results show that (1) Cling generates test suites with high CBC coverage, thanks to the definition of the test suite generation as a many-objectives problem where each couple of branches is an independent objective; (2) such generated suites trigger different class interactions and can kill on average 7.7% (with a maximum of 50%) of mutants that are not detected by tests generated randomly or at the unit level; (3) Cling can detect integration faults coming from wrong assumptions about the usage of the callee class (25 for our subject systems) that remain undetected when using automatically generated random and unit-level test suites. ...
Conference paper (2022) - Pouria Derakhshanfar, Xavier Devroey
Basic Block Coverage (BBC) is a secondary objective for search-based unit test generation techniques relying on the approach level and branch distance to drive the search process. Unlike the approach level and branch distance, which considers only information related to the coverage of explicit branches coming from conditional and loop statements, BBC also takes into account implicit branchings (e.g., a runtime exception thrown in a branchless method) denoted by the coverage level of relevant basic blocks in a control flow graph to drive the search process. Our implementation of BBC for unit test generation relies on the DynaMOSA algorithm and EvoSuite. This paper summarizes the results achieved by EvoSuite's DynaMOSA implementation with BBC as a secondary objective at the SBST 2022 unit testing tool competition. ...
Journal article (2022) - Pouria Derakhshanfar, Xavier Devroey, Andy Zaidman
Search-based techniques have been widely used for white-box test generation. Many of these approaches rely on the approach level and branch distance heuristics to guide the search process and generate test cases with high line and branch coverage. Despite the positive results achieved by these two heuristics, they only use the information related to the coverage of explicit branches (e.g., indicated by conditional and loop statements), but ignore potential implicit branchings within basic blocks of code. If such implicit branching happens at runtime (e.g., if an exception is thrown in a branchless-method), the existing fitness functions cannot guide the search process. To address this issue, we introduce a new secondary objective, called Basic Block Coverage (BBC), which takes into account the coverage level of relevant basic blocks in the control flow graph. We evaluated the impact of BBC on search-based unit test generation (using the DynaMOSA algorithm) and search-based crash reproduction (using the STDistance and WeightedSum fitness functions). Our results show that for unit test generation, BBC improves the branch coverage of the generated tests. Although small (∼ 1.5%), this improvement in the branch coverage is systematic and leads to an increase of the output domain coverage and implicit runtime exception coverage, and of the diversity of runtime states. In terms of crash reproduction, in the combination of STDistance and WeightedSum, BBC helps in reproducing 3 new crashes for each fitness function. BBC significantly decreases the time required to reproduce 43.5% and 45.1% of the crashes using STDistance and WeightedSum, respectively. For these crashes, BBC reduces the consumed time by 71.7% (for STDistance) and 68.7% (for WeightedSum) on average. ...
Conference paper (2021) - P. Derakhshanfar, Xavier Devroey, Gilles Perrouin, A.E. Zaidman, A. van Deursen
This is an extended abstract of the article: Pouria Derakhshanfar, Xavier Devroey, Gilles Perrouin, Andy Zaidman and Arie van Deursen. 2019. Search-based crash reproduction using behavioural model seeding. In: Software Testing, Verification and Reliability (May 2020). http://doi.org/10.1002/stvr.1733. ...
Doctoral thesis (2021) - P. Derakhshanfar
Software testing is one of the essential and expensive tasks in software development. Hence, many approaches were introduced to automate different software testing tasks. Among these techniques, search-based test generation techniques have been vastly applied in real-world cases and have shown promising results. These strategies apply search-based methods for generating tests according to various test criteria such as line and branch coverage. In this thesis, we introduce new search objectives and techniques using various knowledge carved from resources like source code, hand-written test cases, and execution logs. These novel search objectives and approaches (i) improve the state-of-the-art in search-based crash reproduction, (ii) present a new search-based approach to generate class-integration tests covering interactions between two given classes., and (iii) introduce two new search objectives for covering common/uncommon execution patterns observed during the software production. ...
State-of-the-art search-based approaches for test case generation work at test case level, where tests are represented as sequences of statements. These approaches make use of genetic operators (i.e., mutation and crossover) that create test variants by adding, altering, and removing statements from existing tests. While this encoding schema has been shown to be very effective for many-objective test case generation, the standard crossover operator (single-point) only alters the structure of the test cases but not the input data. In this paper, we argue that changing both the test case structure and the input data is necessary to increase the genetic variation and improve the search process. Hence, we propose a hybrid multi-level crossover (HMX) operator that combines the traditional test-level crossover with data-level recombination. The former evolves and alters the test case structures, while the latter evolves the input data using numeric and string-based recombinational operators. We evaluate our new crossover operator by performing an empirical study on more than 100 classes selected from open-source Java libraries for numerical operations and string manipulation. We compare HMX with the single-point crossover that is used in EvoSuite w.r.t structural coverage and fault detection capability. Our results show that HMX achieves a statistically significant increase in 30% of the classes up to 19% in structural coverage compared to the single-point crossover. Moreover, the fault detection capability improved up to 12% measured using strong mutation score. ...
Journal article (2020) - Boris Cherry, Xavier Devroey, Pouria Derakhshanfar, Benoît Vanderose
This study presents the initial step towards a thorough analysis of the difficulty to reproduce a crash using searchbased crash reproduction. Traditionally, code size and complexity are considered representative indicators of the difficulty for search-based approaches, like search-based unit test generation, to generate tests. However, unlike unit test generation, crash reproduction does not seek to cover a set of behaviors but instead to generate one or more tests exercising a specific behavior reproducing a given crash. In this context, there is no guarantee that the indicators used for unit testing are still valid for crash reproduction. In this study, we seek to identify such indicators by considering various code metrics, code smells, and change metrics. We report our effort to collect those metrics for JCRASHPACK, a state-of-the-art crash reproduction benchmark, and an initial assessment by considering metrics individually. Our results show that although JCRASHPACK is larger than benchmarks used in previous studies, additional crashes should be added to improve its diversity and representativeness, and that no individual metric can be used to characterize the difficulty to reproduce a crash. ...
Conference paper (2020) - Pouria Derakhshanfar, Xavier Devroey, Andy Zaidman
Search-based techniques have been widely used for white-box test generation. Many of these approaches rely on the approach level and branch distance heuristics to guide the search process and generate test cases with high line and branch coverage. Despite the positive results achieved by these two heuristics, they only use the information related to the coverage of explicit branches (e.g., indicated by conditional and loop statements), but ignore potential implicit branchings within basic blocks of code. If such implicit branching happens at runtime (e.g., if an exception is thrown in a branchless-method), the existing fitness functions cannot guide the search process. To address this issue, we introduce a new secondary objective, called Basic Block Coverage (BBC), which takes into account the coverage level of relevant basic blocks in the control flow graph. We evaluated the impact of BBC on search-based crash reproduction because the implicit branches commonly occur when trying to reproduce a crash, and the search process needs to cover only a few basic blocks (i.e., blocks that are executed before crash happening). We combined BBC with existing fitness functions (namely STDistance and WeightedSum) and ran our evaluation on 124 hard-to-reproduce crashes. Our results show that BBC, in combination with STDistance and WeightedSum, reproduces 6 and 1 new crashes, respectively. BBC significantly decreases the time required to reproduce 26.6% and 13.7% of the crashes using STDistance and WeightedSum, respectively. For these crashes, BBC reduces the consumed time by 44.3% (for STDistance) and 40.6% (for WeightedSum) on average. ...
Model seeding is a strategy for injecting additional information in a search-based test generation process in the form of models, representing usages of the classes of the software under test. These models are used during the search-process to generate logical sequences of calls whenever an instance of a specific class is required. Model seeding was originally proposed for search-based crash reproduction. We adapted it to unit test generation using EvoSuite and applied it to GSON, a Java library to convert Java objects from and to JSON. Although our study shows mixed results, it identifies potential future research directions. ...
Various search-based test generation techniques have been proposed to automate the generation of unit tests fulfilling different criteria (e.g., line coverage, branch coverage, mutation score, etc.). Despite several advances made over the years, search-based unit test generation still suffers from a lack of guidance due to the limited amount of information available in the source code that, for instance, hampers the generation of complex objects. Previous studies introduced many strategies to address this issue, e.g., dynamic symbolic execution or seeding, but do not take the internal execution of the methods into account. In this paper, we introduce a novel secondary objective called commonality score, measuring how close the execution path of a test case is from reproducing a common or uncommon execution pattern observed during the operation of the software. To assess the commonality score, we implemented it in EvoSuite and evaluated its application on 150 classes from JabRef, an open-source software for managing bibliography references. Our results are mixed. Our approach leads to test cases that indeed follow common or uncommon execution patterns. However, if the commonality score can have a positive impact on the structural coverage and mutation score of the generated test suites, it can also be detrimental in some cases. ...
Crash reproduction approaches help developers during debugging by generating a test case that reproduces a given crash. Several solutions have been proposed to automate this task. However, the proposed solutions have been evaluated on a limited number of projects, making comparison difficult. In this paper, we enhance this line of research by proposing JCrashPack, an extensible benchmark for Java crash reproduction, together with ExRunner, a tool to simply and systematically run evaluations. JCrashPack contains 200 stack traces from various Java projects, including industrial open source ones, on which we run an extensive evaluation of EvoCrash, the state-of-the-art tool for search-based crash reproduction. EvoCrash successfully reproduced 43% of the crashes. Furthermore, we observed that reproducing NullPointerException, IllegalArgumentException, and IllegalStateException is relatively easier than reproducing ClassCastException, ArrayIndexOutOfBoundsException and StringIndexOutOfBoundsException. Our results include a detailed manual analysis of EvoCrash outputs, from which we derive 14 current challenges for crash reproduction, among which the generation of input data and the handling of abstract and anonymous classes are the most frequents. Finally, based on those challenges, we discuss future research directions for search-based crash reproduction for Java. ...

Improving Search-based Crash Reproduction With Helper Objectives

Writing a test case reproducing a reported software crash is a common practice to identify the root cause of an anomaly in the software under test. However, this task is usually labor-intensive and time-taking. Hence, evolutionary intelligence approaches have been successfully applied to assist developers during debugging by generating a test case reproducing reported crashes. These approaches use a single fitness function called Crash Distance to guide the search process toward reproducing a target crash. Despite the reported achievements, these approaches do not always successfully reproduce some crashes due to a lack of test diversity (premature convergence). In this study, we introduce a new approach, called MO-HO, that addresses this issue via multi-objectivization. In particular, we introduce two new Helper-Objectives for crash reproduction, namely test length (to minimize) and method sequence diversity (to maximize), in addition to Crash Distance. We assessed MOHO using five multi-objective evolutionary algorithms (NSGA-II, SPEA2, PESA-II, MOEA/D, FEMO) on 124 non-trivial crashes stemming from open-source projects. Our results indicate that SPEA2 is the best-performing multi-objective algorithm for MO-HO. We evaluated this best-performing algorithm for MO-HO against the state-of-the-art: single-objective approach (Single-Objective Search) and decomposition-based multi-objectivization approach (De-MO). Our results show that MO-HO reproduces five crashes that cannot be reproduced by the current state-of-the-art. Besides, MO-HO improves the effectiveness (+10% and +8% in reproduction ratio) and the efficiency in 34.6% and 36% of crashes (i.e., significantly lower running time) compared to Single-Objective Search and De-MO, respectively. For some crashes, the improvements are very large, being up to +93.3% for reproduction ratio and -92% for the required running time. ...
Conference paper (2020) - Pouria Derakhshanfar
Search-based test data generation approaches have come a long way over the past few years, but these approaches still have some limitations when it comes to exercising specific behavior for triggering particular kinds of faults (e.g., crashes or specific types of integration between classes/modules). In this thesis, we are investigating new fitness functions and evolutionary-based algorithms and techniques to tackle these limitations. We have defined multiple novel approaches for crash reproduction and class integration testing. Currently, we are still working on improving both crash reproduction and class integration testing. ...
Evolutionary-based crash reproduction techniques aid developers in their debugging practices by generating a test case that reproduces a crash given its stack trace. In these techniques, the search process is typically guided by a single search objective called Crash Distance. Previous studies have shown that current approaches could only reproduce a limited number of crashes due to a lack of diversity in the population during the search. In this study, we address this issue by applying Multi-Objectivization using Helper-Objectives (MO-HO) on crash reproduction. In particular, we add two helper-objectives to the Crash Distance to improve the diversity of the generated test cases and consequently enhance the guidance of the search process. We assessed MO-HO against the single-objective crash reproduction. Our results show that MO-HO can reproduce two additional crashes that were not previously reproducible by the single-objective approach. ...
Approaches for automatic crash reproduction aim to generate test cases that reproduce crashes starting from the crash stack traces. These tests help developers during their debugging practices. One of the most promising techniques in this research field leverages search-based software testing techniques for generating crash reproducing test cases. In this paper, we introduce Botsing, an open-source search-based crash reproduction framework for Java. Botsing implements state-of-the-art and novel approaches for crash reproduction. The well-documented architecture of Botsing makes it an easy-to-extend framework, and can hence be used for implementing new approaches to improve crash reproduction. We have applied Botsing to a wide range of crashes collected from open source systems. Furthermore, we conducted a qualitative assessment of the crash-reproducing test cases with our industrial partners. In both cases, Botsing could reproduce a notable amount of the given stack traces.
Demo. video: https://www.youtube.com/watch?v=k6XaQjHqe48
Botsing website: https://stamp-project.github.io/botsing/ ...
Journal article (2020) - Pouria Derakhshanfar, Xavier Devroey, Gilles Perrouin, Andy Zaidman, Arie van Deursen
Search-based crash reproduction approaches assist developers during debugging by generating a test case, which reproduces a crash given its stack trace. One of the fundamental steps of this approach is creating objects needed to trigger the crash. One way to overcome this limitation is seeding: using information about the application during the search process. With seeding, the existing usages of classes can be used in the search process to produce realistic sequences of method calls, which create the required objects. In this study, we introduce behavioural model seeding: a new seeding method that learns class usages from both the system under test and existing test cases. Learned usages are then synthesized in a behavioural model (state machine). Then, this model serves to guide the evolutionary process. To assess behavioural model seeding, we evaluate it against test seeding (the state-of-the-art technique for seeding realistic objects) and no seeding (without seeding any class usage). For this evaluation, we use a benchmark of 122 hard-to-reproduce crashes stemming from six open-source projects. Our results indicate that behavioural model seeding outperforms both test seeding and no seeding by a minimum of 6% without any notable negative impact on efficiency. ...
EvoCrash is a recent search-based approach to generate a test case that reproduces reported crashes. The search is guided by a fitness function that uses a weighted sum scalarization to combine three different heuristics: (i) code coverage, (ii) crash coverage and (iii) stack trace similarity. In this study, we propose and investigate two alternatives to the weighted sum scalarization: (i) the simple sum scalarization and (ii) the multi-objectivization, which decomposes the fitness function into several optimization objectives as an attempt to increase test case diversity. We implemented the three alternative optimizations as an extension of EvoSuite, a popular search-based unit test generator, and applied them on 33 real-world crashes. Our results indicate that for complex crashes the weighted sum reduces the test case generation time, compared to the simple sum, while for simpler crashes the effect is the opposite. Similarly, for complex crashes, multi-objectivization reduces test generation time compared to optimizing with the weighted sum; we also observe one crash that can be replicated only by multi-objectivization. Through our manual analysis, we found out that when optimizing the original weighted function gets trapped in local optima, optimization for decomposed objectives improves the search for crash reproduction. Generally, while multi-objectivization is under-explored, our results are promising and encourage further investigations of the approach. ...