H. Galitianu
Robust test assertions are critical for verifying deep semantic behavior, but their automated generation remains a primary bottleneck in software testing. Automated test case generation approaches often rely on implicit oracles or regression checks that miss semantic failures. Large language models (LLMs) can synthesize meaningful assertions, but single-pass prompting frequently produces uncompilable or failing code. We propose a multi-agent workflow for Java test assertion generation consisting of code comprehension, test objective planning, and assertion generation. The workflow extracts mutation-relevant variable manifests, structures high-level testing plans, compiles and executes the generated test candidates, and iteratively refines assertions using mutation-testing feedback from PITest to optimize mutation quality before final selection.
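The compile, execute, and refine cycle described above can be pictured as a small loop. The following is only a sketch under stated assumptions: `RefinementLoop`, `Feedback`, `refine`, and the `evaluate` and `revise` callbacks are hypothetical names introduced for illustration, and the abstract does not specify how the workflow actually selects or revises candidates. Here `evaluate` stands in for compiling, running, and mutation-testing a candidate (for example with PITest), and `revise` stands in for the LLM-driven revision step.

```java
import java.util.Optional;
import java.util.function.BiFunction;
import java.util.function.Function;

public final class RefinementLoop {
    private RefinementLoop() {}

    /** Hypothetical summary of one compile/run/mutation-test cycle. */
    public static final class Feedback {
        public final boolean valid;   // compiled, executed, and passed
        public final int killed;      // mutants killed by the candidate
        public final int covered;     // mutants covered by the candidate

        public Feedback(boolean valid, int killed, int covered) {
            this.valid = valid;
            this.killed = killed;
            this.covered = covered;
        }

        /** Killed mutants divided by covered mutants; 0 when nothing is covered (assumption). */
        public double strength() {
            return covered == 0 ? 0.0 : (double) killed / covered;
        }
    }

    /**
     * Evaluates a candidate, remembers the best valid one seen so far,
     * and asks for a revision, for at most maxRounds rounds.
     * Returns empty if no candidate was ever valid.
     */
    public static Optional<String> refine(
            String initial,
            Function<String, Feedback> evaluate,
            BiFunction<String, Feedback, String> revise,
            int maxRounds) {
        String candidate = initial;
        String best = null;
        double bestStrength = -1.0;
        for (int round = 0; round < maxRounds; round++) {
            Feedback feedback = evaluate.apply(candidate);
            if (feedback.valid && feedback.strength() > bestStrength) {
                best = candidate;
                bestStrength = feedback.strength();
            }
            // The feedback is handed to the (assumed) revision step for the next round.
            candidate = revise.apply(candidate, feedback);
        }
        return Optional.ofNullable(best);
    }
}
```

Keeping the best valid candidate separately from the current one means a later revision that fails to compile or scores lower cannot displace an earlier, stronger result.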
We evaluate the approach on 112 focal tests from twilio-java and liqp. Compared with static prompting, agentic configurations substantially improve reliability, increasing the percentage of valid runs (compilable, executable, and passing tests) from 58.1% to 84.8%. Relative to the human baseline, the agentic configuration raises the average Test Strength (the ratio of killed mutants to covered mutants) from 45.6% to approximately 56%. Our evaluation shows that while execution feedback significantly improves reliability and observed Test Strength, combining all agentic components does not yield the best computational trade-off.
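Test Strength, as defined above, is the number of killed mutants divided by the number of covered mutants. A minimal sketch of that ratio follows; the class name `TestStrength`, the method `of`, and the handling of zero covered mutants are assumptions made for illustration, and the counts in `main` are invented, not taken from the evaluation.

```java
import java.util.Locale;

public final class TestStrength {
    private TestStrength() {}

    /**
     * Returns killed / covered as a fraction in [0, 1].
     * Returning 0 when no mutants are covered is an assumption of this sketch.
     */
    public static double of(int killed, int covered) {
        if (killed < 0 || covered < 0 || killed > covered) {
            throw new IllegalArgumentException("require 0 <= killed <= covered");
        }
        if (covered == 0) {
            return 0.0;
        }
        return (double) killed / covered;
    }

    public static void main(String[] args) {
        // Illustrative counts only: 14 of 25 covered mutants killed.
        System.out.printf(Locale.ROOT, "%.1f%%%n", 100 * of(14, 25)); // prints 56.0%
    }
}
```

Because the denominator counts only covered mutants, this ratio differs from a mutation score computed over all generated mutants: a test that covers little code can still show a high Test Strength.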