Can we use LLMs for abstraction in MDPs?
A deep dive into the potential and limitations of LLMs
D. Lentschig (TU Delft - Electrical Engineering, Mathematics and Computer Science)
F.A. Oliehoek – Mentor (TU Delft - Sequential Decision Making)
J. He – Mentor (TU Delft - Sequential Decision Making)
P.K. Murukannaiah – Graduation committee member (TU Delft - Interactive Intelligence)
Abstract
This thesis investigates whether Large Language Models (LLMs) can generate abstractions of Markov Decision Processes (MDPs) that reduce the complexity of planning with Monte Carlo Tree Search (MCTS). A complete pipeline was developed to extract and validate cluster-based abstractions from LLMs, combining modular prompt engineering, post-processing, and evaluation through both structural similarity and performance metrics. Experiments in gridworld environments show that DeepSeek-R1 models consistently outperform LLaMA models, with architecture and training proving more important than parameter count. Structured prompts, especially those using a JSON representation and rationale-driven responses, significantly improved abstraction quality. While LLMs can approximate, and in simple environments sometimes even recover, the ideal abstraction, performance deteriorates in larger or less regular domains. These findings highlight both the potential and the current limitations of LLM-based abstraction, and suggest directions for future research, including more complex environments, richer abstraction types, and advanced prompting strategies.
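To make the idea of a cluster-based abstraction concrete, the sketch below is a purely illustrative example (not the thesis pipeline): it represents an abstraction of a small gridworld MDP as a mapping from concrete states to cluster IDs, parsed from a JSON payload of the shape an LLM might be prompted to emit, followed by a minimal structural validity check. All names, the JSON schema, and the column-based clustering are assumptions for illustration only.

```python
# Illustrative sketch: a cluster-based state abstraction for a 3x3
# gridworld MDP, delivered as JSON (as an LLM might be prompted to
# produce) and validated as a proper partition of the state space.
# The schema and helper names here are hypothetical, not the thesis's.

import json

# Hypothetical 3x3 gridworld: states are (row, col) tuples.
states = [(r, c) for r in range(3) for c in range(3)]

# A candidate abstraction, e.g. parsed from an LLM's JSON response:
# here, states are clustered by column (purely illustrative).
llm_response = json.dumps({
    "rationale": "States in the same column behave identically.",
    "clusters": {f"{r},{c}": c for (r, c) in states},
})


def parse_abstraction(response: str) -> dict:
    """Parse a JSON response into a state -> cluster-ID mapping."""
    payload = json.loads(response)
    return {
        tuple(map(int, key.split(","))): cluster
        for key, cluster in payload["clusters"].items()
    }


def is_valid_partition(mapping: dict, state_space: list) -> bool:
    """Structural check: every state is assigned exactly one cluster."""
    return set(mapping) == set(state_space)


abstraction = parse_abstraction(llm_response)
assert is_valid_partition(abstraction, states)
print(sorted(set(abstraction.values())))  # abstract state space: [0, 1, 2]
```

A post-processing step like `is_valid_partition` is the cheapest kind of structural check; performance-based evaluation (e.g. planning with MCTS over the abstract states) would then measure whether the clustering actually preserves decision-relevant distinctions.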