Navigating Task Complexity: Distributed Cognition Between New Users and Different LLM Architectures
T. Dicke (TU Delft - Mechanical Engineering)
Y.B. Eisma – Mentor (TU Delft - Human-Robot Interaction)
D. Dodou – Mentor (TU Delft - Medical Instruments & Bio-Inspired Technology)
A. Zgonnikov – Graduation committee member (TU Delft - Human-Robot Interaction)
Abstract
Large Language Models (LLMs) promise intuitive robot control through natural language, yet the gap between vague human intent and safe physical execution remains significant. This thesis investigates how the distribution of planning responsibility relates to the reasoning architecture of the model. In a study involving 226 participants new to LLM-based robot prompting, a non-reasoning model (Gemini 2.0 Flash-Lite) and a reasoning model (Gemini 2.5 Pro) were compared on a baseline navigation task, after which the reasoning model was evaluated on tasks of increasing logical complexity. Results indicate a clear divergence in safety profiles: the non-reasoning model showed more collision-prone goal-seeking behavior, whereas the reasoning model adhered more strictly to the safety constraints in the system prompt, preferring to refuse a request rather than generate an unsafe plan when the task exceeded its capabilities. However, even the reasoning model showed declining performance on high-complexity tasks, which hyperparameter tuning (temperature, token budget) did not resolve. Analysis of user interactions reveals that effective prompting is less about linguistic precision and more about "distributed cognition": while models can autonomously plan simple tasks, complex scenarios require the human to reclaim part of the task planning effort and provide low-level guidance that reduces the solution space. These findings suggest that safe language-driven robotics depends on a dynamic partnership in which the distribution of task planning effort shifts with task difficulty and the capabilities of the specific LLM.