Enhancing supermarket robot interaction: an equitable multi-level LLM conversational interface for handling diverse customer intents

Journal Article (2025)

Author(s)

C. Nandkumar (Student TU Delft)

L. Peternel (TU Delft - Human-Robot Interaction)

Research Group

Human-Robot Interaction

DOI related publication

https://doi.org/10.3389/frobt.2025.1576348

Speech recognition, Large language models, Robotics, Questionnaires, Voice interface

To reference this document use:

https://resolver.tudelft.nl/uuid:45b629d6-9e60-4d2c-840e-c82122039f63

Publication Year

2025

Language

English

Volume number

12

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper presents the design and evaluation of a comprehensive system for developing voice-based interfaces that support users in supermarkets. These interfaces enable shoppers to convey their needs through both generic and specific queries. Although customisable state-of-the-art systems like GPTs from OpenAI are easily accessible and adaptable, featuring low-code deployment with options for functional integration, they still face challenges such as increased response times and limited strategic control for tailored use cases and cost optimization. Motivated by the goal of crafting equitable and efficient conversational agents with a touch of personalisation, this study advances on two fronts: 1) a comparative analysis of four popular off-the-shelf speech recognition technologies to identify the most accurate model for different genders (male/female) and languages (English/Dutch) and 2) the development and evaluation of a novel multi-LLM supermarket chatbot framework, comparing its performance with a specialized GPT model powered by GPT-4 Turbo, using the Artificial Social Agent Questionnaire (ASAQ) and qualitative participant feedback. Our findings reveal that OpenAI’s Whisper leads in speech recognition accuracy across genders and languages. Our proposed multi-LLM chatbot architecture improved on the benchmarked GPT model in all 13 evaluated aspects, with statistically significant gains in four of them: performance, user satisfaction, user-agent partnership, and self-image enhancement. The paper concludes with a simple method for supermarket robot navigation that maps the final chatbot response to the correct shelf numbers, for which the robot can then plan sequential visits. This in turn enables the effective use of low-level perception, motion planning, and control capabilities for product retrieval and collection.
We hope that this work encourages more efforts to use multiple specialized smaller models instead of always relying on a single powerful model.
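
The abstract describes mapping the final chatbot response to shelf numbers that the robot visits in sequence. The following is only a minimal sketch of that idea, not the method from the paper: the product-to-shelf table `SHELF_OF`, the function name `shelves_for_response`, the keyword matching, and the choice to visit shelves in ascending numerical order are all assumptions made for illustration.

```python
import re

# Hypothetical product-to-shelf table; the real store layout is not given here.
SHELF_OF = {
    "bread": 1,
    "milk": 3,
    "eggs": 3,
    "pasta": 7,
    "tomato sauce": 8,
}


def shelves_for_response(response: str, shelf_of: dict[str, int]) -> list[int]:
    """Return the shelves of all products named in the response.

    Shelves are deduplicated and sorted in ascending order, which stands in
    for a visiting sequence (an assumption; a real planner could order them
    by travel distance instead).
    """
    text = response.lower()
    found = set()
    for product, shelf in shelf_of.items():
        # Whole-word match, so "eggs" is not found inside "eggshell".
        if re.search(r"\b" + re.escape(product) + r"\b", text):
            found.add(shelf)
    return sorted(found)


plan = shelves_for_response(
    "For a pasta dinner you will need pasta, tomato sauce and milk.",
    SHELF_OF,
)
print(plan)
```

Each shelf number in `plan` could then be handed to the robot's navigation stack as a goal, one after another, before the low-level retrieval steps take over.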

Files

Frobt-1-1576348.pdf

(pdf | 29 Mb)