Can Large Language Models reason? Investigating Open-Source Cryptographic Reasoning
A. Taneva (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Z. Erkin – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
M.J.G. Olsthoorn – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Abstract
Large language models (LLMs) have shown remarkable performance on mathematical competitions (AIME), and recently on the AICrypto benchmark. AICrypto has tested some of the best commercially available models on capture-the-flag (CTF)-style cryptography challenges across multiple-choice, proof, and open-ended questions. While the LLMs do well on the first two categories, they struggle to solve open-ended questions where advanced mathematical reasoning and creativity are required. This research follows an approach similar to the AICrypto benchmark, testing open-source LLMs and their ability to reason. Rather than assigning simple pass/fail scores, we evaluate the LLM qualitatively based on the logic it follows. We aim to demystify the black-box workings of commercial LLMs and potentially contribute to the development of an open-source framework for solving CTF challenges. The tests are performed on the Qwen3 32B LLM using an agentic ReAct framework. The most common failure modes are discussed, as well as their causal factors.
Metadata only record. There are no files for this record.