Can Large Language Models reason? Investigating Open-Source Cryptographic Reasoning

Bachelor Thesis (2026)

Author(s)

A. Taneva (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Z. Erkin – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

M.J.G. Olsthoorn – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

LLM Cybersecurity CTF

To reference this document use

https://resolver.tudelft.nl/uuid:37f889fa-1a26-4015-a160-f9aff2fe58e4

More Info

Publication Year

2026

Language

English

Graduation Date

30-01-2026

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Abstract

Large language models (LLMs) have shown remarkable performance on mathematical competitions such as AIME, and recently on the AICrypto benchmark. AICrypto has tested some of the best commercially available models on capture-the-flag (CTF)-style cryptography challenges across multiple-choice, proof, and open-ended questions. While they do well on the first two categories, the LLMs struggle to solve open-ended questions where advanced mathematical reasoning and creativity are required. This research follows an approach similar to the AICrypto benchmark, testing open-source LLMs and their ability to reason. Instead of assigning simple pass/fail scores, the LLM is evaluated qualitatively based on the logic it follows. We aim to demystify the black-box workings of commercial LLMs and potentially contribute to the development of an open-source framework for solving CTF challenges. The tests are performed on the Qwen3 32B LLM using an agentic ReAct framework. The most common failure modes are discussed, as well as their causal factors.

No files available

Metadata only record. There are no files for this record.