Voice Based Interfaces for Supermarket robots using Large Language Models

Master Thesis (2024)

Author(s)

C. NANDKUMAR (TU Delft - Mechanical Engineering)

Contributor(s)

L. Peternel – Mentor (TU Delft - Human-Robot Interaction)

Joost Winter – Coach (TU Delft - Human-Robot Interaction)

Maria Soledad Pera – Coach (TU Delft - Web Information Systems)

Faculty

Mechanical Engineering

Robotics, Large Language Models (LLMs), Speech recognition, Voice interfaces

To reference this document use:

https://resolver.tudelft.nl/uuid:25f194f3-9370-4b7f-b3f0-125452682f5a

Publication Year

2024

Language

English

Graduation Date

17-04-2024

Awarding Institution

Delft University of Technology

Programme

Mechanical Engineering | Vehicle Engineering | Cognitive Robotics

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This thesis presents the design and evaluation of a comprehensive system for developing voice-based interfaces that support users in supermarkets. These interfaces let customers express their needs through both generic and specific queries. Current state-of-the-art systems such as OpenAI's GPTs are easily accessible and adaptable, offering low-code deployment with options for functional integration, but they still suffer from increased response times and offer limited strategic control over use-case tailoring and cost optimisation. Motivated by the goal of crafting inclusive, personalised, and efficient conversational agents, this study advances on three fronts: 1) a comparative analysis of four popular off-the-shelf speech recognition technologies to identify the most accurate model for different genders (male/female) and languages (English/Dutch); 2) an assessment of the effects of personalised recommendations versus generic responses, using a blindfolded, counterbalanced within-subject experiment; and 3) the development and evaluation of a novel multi-LLM supermarket chatbot framework, comparing its performance with a specialised GPT model powered by GPT-4 Turbo, using the Artificial Social Agent Questionnaire (ASAQ) in a counterbalanced within-subjects experiment together with qualitative participant feedback. Our findings reveal that OpenAI's Whisper leads in speech recognition accuracy across genders and languages, that users significantly prefer personalised chatbots over their non-personalised counterparts, and that our proposed multi-LLM chatbot architecture outperformed the benchmarked GPT model across all 13 measured criteria, with statistically significant improvements in four key areas: performance, user satisfaction, user-agent partnership, and self-image enhancement.
The thesis concludes by presenting a simple method for supermarket robot navigation: the final chatbot response is mapped to the correct shelf numbers, which the robot can then plan to visit in sequence. This in turn enables effective use of low-level perception, motion planning, and control capabilities for product retrieval and collection. We hope this work encourages more effort towards using multiple specialised smaller models instead of always relying on a single powerful model.
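
As a rough illustration of the response-to-shelf mapping described above, the sketch below looks up product names mentioned in a chatbot response and returns the shelf numbers to visit. It is a minimal sketch under stated assumptions, not the method used in the thesis: the product table SHELF_OF, the function name shelves_to_visit, the whole-word matching, and the choice to visit shelves in ascending order are all hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical product-to-shelf table; the thesis abstract does not give the real one.
SHELF_OF: Dict[str, int] = {
    "milk": 3,
    "bread": 1,
    "pasta": 7,
    "tomato sauce": 7,
}


def shelves_to_visit(response: str, shelf_of: Dict[str, int] = SHELF_OF) -> List[int]:
    """Return the shelves of all products named in the response.

    Shelves are deduplicated and sorted in ascending order, which is an
    assumed stand-in for whatever visiting order a real planner would pick.
    """
    text = response.lower()
    found = set()
    for product, shelf in shelf_of.items():
        # Match whole words only, so "milk" does not match "milkshake".
        if re.search(r"\b" + re.escape(product) + r"\b", text):
            found.add(shelf)
    return sorted(found)
```

For example, calling shelves_to_visit("You will need pasta, tomato sauce and milk.") yields one stop at shelf 3 and one at shelf 7, since pasta and tomato sauce share a shelf in the assumed table.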

Files

MSc_Thesis_3_.pdf

(pdf | 4.55 MB)

License info not available