The Impact of Context Window Constraints on ReAct Agents in Cryptographic CTF Challenges

Performance, Efficiency, and Failure Modes of ReAct Agents under Context Constraints

Bachelor Thesis (2026)
Author(s)

Y.B. Köse (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Z. Erkin – Mentor (TU Delft - Cyber Security)

M.J.G. Olsthoorn – Graduation committee member (TU Delft - Software Engineering)

More Info
Publication Year
2026
Language
English
Graduation Date
30-01-2026
Awarding Institution
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

While Large Language Model (LLM) agents are increasingly capable in specialized domains, the impact of context window constraints on their reasoning stability remains under-explored. In this paper, we investigate how strictly controlling context size influences agent performance on multi-step cryptographic Capture The Flag (CTF) challenges.

We adapted an agentic environment in which models attempt to solve crypto challenges under four distinct context length constraints (8k–64k tokens). The context length is managed dynamically as a tunable hyperparameter.

Our results reveal a non-linear performance curve with a clear saturation threshold between 16k and 32k tokens. Beyond this threshold, additional context offers negligible benefit. We also observe a distinct shift in failure modes:

- Tight constraints lead to context starvation, which manifests as hallucination.
- Unconstrained windows allow error traces to accumulate. The presence of prior failed attempts biases the agent toward repetitive action loops, which in turn inflates operating costs and reduces operational efficiency.

These findings demonstrate that unconstrained context is not only expensive but also degrades reasoning stability in logic-heavy domains. We conclude that future benchmarks must explicitly distinguish between capability failures and context-induced failures. Furthermore, our findings suggest that engineering strategies should prioritize dynamic context management over reliance on static, maximized windows.
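The abstract treats the context length as a tunable hyperparameter but does not say how the budget is enforced. The sketch below illustrates one common way such a limit could be applied to a ReAct trajectory. It is not the thesis's implementation.

It rests on these assumptions:

- The system prompt and task are always kept.
- Past Thought/Action/Observation steps are admitted from newest to oldest until the token budget is exhausted.
- The names `Step`, `build_context` and `count_tokens` are hypothetical.
- `count_tokens` is a whitespace word count standing in for a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One ReAct iteration: the model's thought and action plus the tool observation."""
    thought: str
    action: str
    observation: str

    def render(self) -> str:
        return (f"Thought: {self.thought}\n"
                f"Action: {self.action}\n"
                f"Observation: {self.observation}")


def count_tokens(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


def build_context(system_prompt: str, task: str,
                  history: list[Step], max_tokens: int) -> str:
    """Assemble the agent prompt under a token budget (the tunable hyperparameter).

    The system prompt and task are always included. Past steps are added from
    the most recent backwards and stop at the first step that no longer fits,
    so the kept history is a contiguous window of the latest steps.
    """
    header = f"{system_prompt}\n\nTask: {task}"
    budget = max_tokens - count_tokens(header)
    if budget < 0:
        raise ValueError("max_tokens is too small for the system prompt and task")

    kept: list[str] = []
    for step in reversed(history):
        rendered = step.render()
        cost = count_tokens(rendered)
        if cost > budget:
            break  # older steps are evicted first
        kept.append(rendered)
        budget -= cost

    kept.reverse()  # restore chronological order
    return "\n\n".join([header, *kept])
```

To compare budgets, the same trajectory could be rebuilt with different `max_tokens` values drawn from the 8k–64k range. Under this eviction policy, the oldest failed attempts are the first to leave the prompt. An unconstrained window, by contrast, keeps every prior attempt visible to the model.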

Files

Ybkose_rp_final_paper.pdf
(pdf | 1.19 Mb)
License info not available