Navigating Task Complexity: Distributed Cognition Between New Users and Different LLM Architectures
T. Dicke (TU Delft - Mechanical Engineering)
Y.B. Eisma – Mentor (TU Delft - Human-Robot Interaction)
D. Dodou – Mentor (TU Delft - Medical Instruments & Bio-Inspired Technology)
A. Zgonnikov – Graduation committee member (TU Delft - Human-Robot Interaction)
Abstract
Large Language Models (LLMs) promise intuitive robot control through natural language, yet the gap between vague human intent and safe physical execution remains significant. This thesis investigates how the distribution of planning responsibility relates to the reasoning architecture of the model. In a study involving 226 participants new to LLM-based robot prompting, a non-reasoning model (Gemini 2.0 Flash-Lite) and a reasoning model (Gemini 2.5 Pro) were compared on a baseline navigation task, after which the reasoning model was evaluated on tasks of increasing logical complexity. Results indicate a clear divergence in safety profiles: the non-reasoning model showed more collision-prone goal-seeking behavior, whereas the reasoning model adhered more strictly to the safety constraints in the system prompt, preferring to refuse a request rather than generate an unsafe plan when the task exceeded its capabilities. However, even the reasoning model showed declining performance on high-complexity tasks, which hyperparameter tuning (temperature, token budget) did not resolve. Analysis of user interactions reveals that effective prompting is less about linguistic precision and more about "distributed cognition": while models can autonomously plan simple tasks, complex scenarios require the human to reclaim part of the task planning effort and provide low-level guidance that reduces the solution space. These findings suggest that safe language-driven robotics depends on a dynamic partnership in which the distribution of task planning effort shifts with task difficulty and the capabilities of the specific LLM.