Can Large Language Models reason? Investigating Open-Source Cryptographic Reasoning

Bachelor Thesis (2026)
Author(s)

A. Taneva (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Z. Erkin – Mentor (TU Delft - Cyber Security)

M.J.G. Olsthoorn – Graduation committee member (TU Delft - Software Engineering)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
Publication Year
2026
Language
English
Graduation Date
30-01-2026
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Large language models (LLMs) have shown remarkable performance on mathematical competitions (AIME), and recently on the AICrypto benchmark. AICrypto tested some of the best commercially available models on capture-the-flag (CTF)-style cryptography challenges spanning multiple-choice, proof, and open-ended questions. While these models do well on the first two categories, they struggle to solve open-ended questions, where advanced mathematical reasoning and creativity are required. This research follows an approach similar to the AICrypto benchmark, testing open-source LLMs and their ability to reason. Instead of assigning simple pass/fail scores, we evaluate the LLM qualitatively, based on the logic it follows. We aim to demystify the black-box workings of commercial LLMs and, potentially, to contribute to the development of an open-source framework for solving CTF challenges. The tests are performed on the Qwen3 32B LLM using an agentic ReAct framework. The most common failure modes are discussed, as well as their causal factors.

Files

License info not available