Y.S. Pachedzhiev

Scope-Guided LLM Judging for Responsible-Agent and Failure-Step Attribution

LLM-based multi-agent systems often produce long execution traces with agent messages, tool outputs, intermediate decisions, and final responses. When a task fails, the failed outcome usually does not show which agent caused the failure or which earlier step introduced it. This paper treats fault localization as failure attribution: predicting the responsible agent and the decisive failure step in a failed multi-agent trace. It compares a direct whole-trace baseline with two-stage scope-guided judging on the Hand-Crafted subset of the Who&When benchmark. In the direct baseline, one LLM judge receives the full trace and predicts both labels. In the scope-guided methods, a first-stage selector chooses a small set of reference steps, and the same final judge predicts the labels from the full trace plus those selected steps. The experiments show that scope guidance is not generally beneficial. Generic LLM scope selection improves selected-scope Hit@5 over random selection, but does not improve final attribution over direct whole-trace judging. The source-candidate-pool selector gives the best responsible-agent, failure-step, and joint attribution accuracy, but the improvement is modest and requires more than four times the mean token cost of direct whole-trace judging. Overall, scope guidance helps only when the selected steps point the judge toward earlier source-level evidence. Direct whole-trace judging remains a strong lower-cost baseline. ...
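The metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's evaluation code. It assumes that Hit@5 means the annotated failure step appears among at most five selected reference steps, and that joint accuracy requires both the agent and the step to match. The names `Attribution`, `hit_at_k`, and `attribution_accuracy` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Attribution:
    """One label pair for a failed trace (hypothetical structure)."""
    agent: str  # agent held responsible for the failure
    step: int   # index of the decisive failure step in the trace


def hit_at_k(selected_steps: Sequence[int], true_step: int, k: int = 5) -> bool:
    """Assumed Hit@k: the true failure step is among the first k selected steps."""
    return true_step in list(selected_steps)[:k]


def attribution_accuracy(
    preds: Sequence[Attribution], golds: Sequence[Attribution]
) -> Dict[str, float]:
    """Agent, step, and joint accuracy over paired predictions and annotations."""
    if not golds or len(preds) != len(golds):
        raise ValueError("preds and golds must be non-empty and the same length")
    n = len(golds)
    pairs = list(zip(preds, golds))
    return {
        "agent": sum(p.agent == g.agent for p, g in pairs) / n,
        "step": sum(p.step == g.step for p, g in pairs) / n,
        # Joint accuracy counts a trace only when both labels are correct.
        "joint": sum(p == g for p, g in pairs) / n,
    }
```

Under these assumptions, a selector can score well on `hit_at_k` while the final judge's `attribution_accuracy` stays flat, which is the pattern the abstract reports for generic LLM scope selection.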