The Impact of Context Window Constraints on ReAct Agents in Cryptographic CTF Challenges

Performance, Efficiency, and Failure Modes of ReAct Agents under Context Constraints

Bachelor Thesis (2026)

Author(s)

Y.B. Köse (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Z. Erkin – Mentor (TU Delft - Cyber Security)

M.J.G. Olsthoorn – Graduation committee member (TU Delft - Software Engineering)

Large Language Models, LLM, Context-Window, CTF, Reasoning Stability, Context Window Constraints

To reference this document use

https://resolver.tudelft.nl/uuid:8d97869d-7da4-4f38-803f-2c0e3e9603fe

More Info

Publication Year

2026

Language

English

Graduation Date

30-01-2026

Awarding Institution

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

While Large Language Model (LLM) agents are increasingly capable in specialized domains, the impact of context window constraints on reasoning stability remains under-explored. In this paper, we investigate how strictly controlling context size influences agent performance on multi-step cryptographic Capture The Flag (CTF) challenges. We adapted an agentic environment in which models attempt to solve crypto challenges under four distinct context length constraints (8k–64k), managed dynamically as a tunable hyperparameter. Our results reveal a non-linear performance curve with a clear saturation threshold between 16k and 32k tokens, beyond which additional context offers negligible benefit. We observe a distinct shift in failure modes: tight constraints lead to context starvation (hallucination), while unconstrained windows allow error traces to accumulate, so that prior failed attempts bias the agent toward repetitive action loops, which in turn inflate operating costs and reduce operational efficiency. These findings demonstrate that unconstrained context is not only expensive but also leads to a regression in reasoning stability within logic-heavy domains. We conclude that future benchmarks must explicitly distinguish between capability failures and context-induced failures. Furthermore, our findings suggest that engineering strategies should prioritize dynamic context management over reliance on static, maximized windows.
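
To make the idea of a context length constraint treated as a tunable hyperparameter more concrete, here is a minimal sketch of one way an agent loop could enforce a token budget on its message history. It is an illustration only and not the thesis's actual implementation: the names `trim_history` and `approx_tokens`, the message format, the four-characters-per-token estimate, and the "keep the first message, drop the oldest turns" policy are all assumptions made for this example.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]  # hypothetical format: {"role": ..., "content": ...}


def approx_tokens(text: str) -> int:
    """Crude token estimate (assumption: about 4 characters per token)."""
    return max(1, len(text) // 4)


def trim_history(
    messages: List[Message],
    budget: int,
    count: Callable[[str], int] = approx_tokens,
) -> List[Message]:
    """Return the history cut down to roughly `budget` tokens.

    Assumed policy: the first message (e.g. the system prompt) is always
    kept, even if it alone exceeds the budget; after that, the most recent
    messages are kept for as long as they fit, and older ones are dropped.
    """
    if not messages:
        return []
    head, tail = messages[0], messages[1:]
    used = count(head["content"])
    kept: List[Message] = []
    # Walk backwards from the newest message and stop at the first
    # message that would push the total over the budget.
    for msg in reversed(tail):
        cost = count(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return [head] + kept
```

Under these assumptions, the budget would be the single knob varied across runs (for example 8_000, 16_000, 32_000, and 64_000), with `trim_history` called before each model request. A real setup would likely replace `approx_tokens` with the model's own tokenizer.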

Files

Ybkose_rp_final_paper.pdf

(pdf | 1.19 Mb)

License info not available