Can Timing Localize Agent Failures?

Incorporating a temporal dimension into spectrum-based fault localization for LLM multi-agent systems

Bachelor Thesis (2026)

Author(s)

H.M. Schouwenaars (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

B. Özkan – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

A. Panichella – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Z. Seyedghorban – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

M.T.J. Spaan – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Fault Localization, LLM, LLM agents

To reference this document use

https://resolver.tudelft.nl/uuid:88da706f-3d41-4d60-9fd8-3bd2cc4a35df

More Info

Publication Year

2026

Language

English

Graduation Date

26-06-2026

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Large language models are fallible and non-deterministic, which makes fault localization harder than in traditional deterministic software. It gets harder still in LLM-based Multi-Agent Systems (LLM-MAS), where a fault can be introduced by any agent at any point, rarely reproduces, and tends to propagate throughout the rest of an execution. Spectrum-based fault localization (SBFL) has worked well on classical software, and prior work has applied it to LLM-MAS with promising but substandard accuracy. This research proposes a new spectrum, one based on temporality. Each spectrum element is defined by an agent, an action, and the temporal window in which the action occurs. These windows can take many shapes; we compare four: relative static, absolute static, sliding, and dynamic windowing. The spectra are built from a new dataset of HyperAgent traces collected over three SWE-Bench Verified tasks, with the ground-truth faults and message labels assigned by an LLM-as-a-judge, and each windowing strategy is evaluated by top-k accuracy. Relative static windowing with a high partition count performed best. The other strategies showed little consistent improvement over the baseline. Temporal ordering can therefore be correlated with faults, but the agent-action-window spectrum does not yet capture them reliably. A richer, denser spectrum will be needed to reach a workable level of accuracy.
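To make the agent-action-window idea concrete, the following is a minimal sketch of relative static windowing, not the thesis implementation. It assumes a trace is an ordered list of (agent, action) pairs, that each trace is split into a fixed number of equal-width partitions relative to its own length, and that suspiciousness is scored with the Ochiai formula, a common SBFL metric that the abstract does not name. All function names, agent names, and actions below are hypothetical and chosen for illustration only.

```python
import math
from collections import defaultdict


def relative_window(step, trace_len, partitions):
    """Map a 0-based step index to one of `partitions` equal-width windows,
    measured relative to the length of the trace (assumed definition)."""
    return min(step * partitions // trace_len, partitions - 1)


def build_spectrum(trace, partitions):
    """Return the set of (agent, action, window) elements seen in one trace."""
    n = len(trace)
    return {
        (agent, action, relative_window(i, n, partitions))
        for i, (agent, action) in enumerate(trace)
    }


def ochiai_scores(runs, partitions):
    """Score every spectrum element with Ochiai (an assumed metric).

    `runs` is a list of (trace, failed) pairs, where `failed` is a bool.
    """
    failed_hits = defaultdict(int)
    passed_hits = defaultdict(int)
    total_failed = 0
    for trace, failed in runs:
        total_failed += int(failed)
        for element in build_spectrum(trace, partitions):
            (failed_hits if failed else passed_hits)[element] += 1

    scores = {}
    for element in set(failed_hits) | set(passed_hits):
        ef, ep = failed_hits[element], passed_hits[element]
        denom = math.sqrt(total_failed * (ef + ep))
        scores[element] = ef / denom if denom else 0.0
    return scores


def top_k(scores, k):
    """Return the k most suspicious elements, ties broken by element order."""
    return sorted(scores, key=lambda e: (-scores[e], e))[:k]


# Two hypothetical runs that differ only in the order of the last two steps.
runs = [
    ([("planner", "plan"), ("navigator", "search"),
      ("editor", "edit"), ("executor", "run_tests")], True),
    ([("planner", "plan"), ("navigator", "search"),
      ("executor", "run_tests"), ("editor", "edit")], False),
]
scores = ochiai_scores(runs, partitions=4)
print(top_k(scores, 2))
```

In this toy input both runs contain the same agent-action pairs, so a spectrum without the window component could not tell them apart; only the window index separates the failing run's elements from the passing run's. Top-k accuracy would then check whether the labeled faulty element appears among the k highest-ranked entries.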

Files

Heinschouwenaars_RP_finalpaper... (pdf)

(pdf | 1.29 Mb)

License info not available