Navigating Task Complexity: Distributed Cognition Between New Users and Different LLM Architectures

Master Thesis (2026)
Author(s)

T. Dicke (TU Delft - Mechanical Engineering)

Contributor(s)

Y.B. Eisma – Mentor (TU Delft - Human-Robot Interaction)

D. Dodou – Mentor (TU Delft - Medical Instruments & Bio-Inspired Technology)

A. Zgonnikov – Graduation committee member (TU Delft - Human-Robot Interaction)

Faculty
Mechanical Engineering
Publication Year
2026
Language
English
Graduation Date
24-02-2026
Awarding Institution
Delft University of Technology
Programme
Mechanical Engineering, Vehicle Engineering, Cognitive Robotics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Large Language Models (LLMs) promise intuitive robot control through natural language, yet the gap between vague human intent and safe physical execution remains significant. This thesis investigates how the distribution of planning responsibility relates to the reasoning architecture of the model. In a study involving 226 participants who were new to LLM-based robot prompting, a non-reasoning model (Gemini 2.0 Flash-Lite) and a reasoning model (Gemini 2.5 Pro) were compared on a baseline navigation task, followed by an evaluation of the reasoning model across tasks of increasing logical complexity. Results indicate a clear divergence in safety profiles: the non-reasoning model showed more collision-prone goal-seeking behavior, whereas the reasoning model demonstrated stricter adherence to the safety constraints in the system prompt, preferring to refuse a request rather than generate an unsafe plan when the task exceeded its capabilities. However, even the reasoning model showed declining performance on high-complexity tasks, which hyperparameter tuning (temperature, token limits) did not resolve. Analysis of user interactions reveals that effective prompting is less about linguistic precision and more about "distributed cognition": while models can autonomously plan simple tasks, complex scenarios require the human to reclaim the task planning effort and provide low-level guidance to reduce the solution space. These findings suggest that safe language-driven robotics depends on a dynamic partnership in which the distribution of task planning effort shifts with task difficulty and the capabilities of the specific LLM.
