Higher Fault Detection Through Novel Density Estimators in Unit Test Generation
Annibale Panichella (TU Delft - Software Engineering)
Mitchell Olsthoorn (TU Delft - Software Engineering)
Abstract
Many-objective evolutionary algorithms (MOEAs) have been applied in the software testing literature to automate the generation of test cases. While previous studies have confirmed the superiority of MOEAs over other algorithms, one open challenge is maintaining strong selective pressure given the large number of objectives (i.e., coverage targets) to optimize. This paper investigates four density estimators as substitutes for the traditional crowding distance. In particular, we consider two estimators previously proposed in the evolutionary computation community, namely the subvector-dominance assignment (SD) and the epsilon-dominance assignment (ED). We further propose two novel density estimators specific to test case generation, namely the token-based density estimator (TDE) and the path-based density estimator (PDE). Built on the tokenizer of the CodeBERT model, TDE uses natural language processing to measure the semantic distance between test cases. PDE, on the other hand, measures the distance between the source-code paths executed by the test cases. We evaluate these density estimators within EvoSuite on 100 non-trivial Java classes from the SF110 benchmark. Our results show that the proposed path-based density estimator (PDE) outperforms all other density estimators in terms of mutation score: compared to the traditional crowding distance, it increases mutation scores by 4.26% on average, and by over 60% in the best case.
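To make the contrast between the two kinds of density estimator concrete, the following is a minimal sketch, not the authors' implementation. `crowding_distance` is the standard NSGA-II crowding distance computed over objective values. `path_density_scores` illustrates one possible path-based estimator under loudly stated assumptions: each test's executed path is summarized as a set of hypothetical branch identifiers, paths are compared with Jaccard distance, and each test is scored by the distance to its nearest neighbour. The abstract does not specify PDE's exact distance or aggregation, so both the names `jaccard_distance` and `path_density_scores` and those choices are illustrative only.

```python
from typing import FrozenSet, List, Sequence

# ASSUMPTION: a test's executed path is summarized as the set of
# (hypothetical) branch identifiers it covers, e.g. {"b1", "b2"}.
Path = FrozenSet[str]


def crowding_distance(objectives: Sequence[Sequence[float]]) -> List[float]:
    """Standard NSGA-II crowding distance over objective vectors."""
    n = len(objectives)
    if n == 0:
        return []
    dist = [0.0] * n
    for k in range(len(objectives[0])):
        order = sorted(range(n), key=lambda i: objectives[i][k])
        lo, hi = objectives[order[0]][k], objectives[order[-1]][k]
        # Boundary solutions are always kept.
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue  # all solutions tie on this objective
        for pos in range(1, n - 1):
            gap = objectives[order[pos + 1]][k] - objectives[order[pos - 1]][k]
            dist[order[pos]] += gap / (hi - lo)
    return dist


def jaccard_distance(a: Path, b: Path) -> float:
    """1 - |a & b| / |a | b|; two empty paths count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def path_density_scores(paths: Sequence[Path]) -> List[float]:
    """Hypothetical path-based score: distance to the nearest other path.

    A higher score means the test executes a more distinctive path, so a
    density-based selection would prefer it among equally ranked tests.
    """
    n = len(paths)
    if n < 2:
        return [float("inf")] * n
    return [
        min(jaccard_distance(paths[i], paths[j]) for j in range(n) if j != i)
        for i in range(n)
    ]


if __name__ == "__main__":
    paths = [frozenset({"b1", "b2", "b3"}), frozenset({"b1", "b2"}), frozenset({"b4", "b5"})]
    print(path_density_scores(paths))  # third test's path is the most distinctive
    print(crowding_distance([[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]))
```

The design difference is the space in which "density" is measured: crowding distance only sees objective (fitness) values, which may be nearly identical for many tests when there are hundreds of coverage targets, whereas a path-based score can still separate tests that reach the same fitness through different executions.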