Is human-in-the-loop reinforcement learning enhanced if the robot emotes its learning progress?

An experimental study

Abstract

As technology continues to evolve at a rapid pace, robots are becoming an increasingly common sight in our daily lives.
Robots that work with humans need to adapt to a variety of users and tasks, and learn to optimise their behaviour. For non-specialist users to interact with such robots, the robot's learning process needs to be made transparent through its behaviour. Reinforcement Learning (RL) is a promising learning method for achieving this adaptability. However, the behaviour generated by RL is not inherently transparent, because of the exploration/exploitation trade-off that is needed to optimise a policy for a specific task.
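As a minimal illustration of why exploratory behaviour can look opaque to an onlooking user, the sketch below shows standard epsilon-greedy action selection (the names `q_table` and `epsilon` are illustrative assumptions, not the study's implementation): with some probability the agent picks a random action instead of the one it currently estimates to be best.

```python
import random

def select_action(q_table, state, actions, epsilon=0.1):
    """Epsilon-greedy selection: with probability epsilon the agent explores
    by trying a random action; otherwise it exploits its current Q-values.
    The exploratory choices are what can make the behaviour hard to
    interpret for a user who cannot see the learning process."""
    if random.random() < epsilon:
        return random.choice(actions)  # explore: looks arbitrary from the outside
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))  # exploit
```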

One RL algorithm is Temporal Difference (TD) learning. In TD learning, the algorithm updates a Q-table to keep track of Q-values. A Q-value represents the expected future reward that the agent (the actor that decides which action to take) can obtain by taking a specific action in a certain state. Updating the Q-values involves a quantity called the temporal difference (or TD error): the difference between a new estimate, formed from the received reward plus the Q-value of the next state and chosen action, and the current Q-value.
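A minimal sketch of such a tabular TD update is shown below; the learning rate `alpha` and discount factor `gamma` are assumed, illustrative parameters rather than values reported in the study.

```python
from collections import defaultdict

q_table = defaultdict(float)  # maps (state, action) to a Q-value estimate

def td_update(state, action, reward, next_state, next_action, alpha=0.1, gamma=0.9):
    """One tabular TD update (SARSA-style, using the action actually chosen next).

    The temporal difference is the gap between the new estimate
    (reward + discounted Q-value of the next state-action pair)
    and the current Q-value; the table is nudged towards that estimate."""
    td_error = reward + gamma * q_table[(next_state, next_action)] - q_table[(state, action)]
    q_table[(state, action)] += alpha * td_error
    return td_error  # the signal that can later ground emotional expressions
```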

For humans, emotions are a natural way of communicating intent and situational appraisal. In this study, emotional expressions based on Temporal Differences were implemented as a means to increase the transparency of a robot's learning progress. The effects on the robot's learning progress, learning result, and user experience were analysed.
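The abstract does not specify exactly how the learning signals were mapped onto the expressed emotions, so the sketch below is only an assumed, illustrative mapping: outcome emotions (joy, distress) follow the sign of the TD error, while anticipatory emotions (hope, fear) follow the expected value before acting.

```python
def emotions_from_learning_signal(td_error, expected_value, threshold=0.0):
    """Illustrative mapping from learning signals to expressed emotions.

    Assumed scheme (not necessarily the one used in the study):
    - hope / fear reflect the expectation before acting,
    - joy / distress reflect the outcome of the action (the TD error)."""
    anticipation = "hope" if expected_value > threshold else "fear"
    outcome = "joy" if td_error > threshold else "distress"
    return anticipation, outcome
```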

A between-subjects experiment with 61 participants was performed, comparing three robot modes: no emotions, simulated emotions, and simulated emotions with matching attribution. The simulated emotions were hope, fear, joy, and distress, expressed by a humanoid robot. In the mode with simulated emotions and matching attribution, the robot additionally explained for which task it was feeling hope or fear. The task itself was simple: a human teacher had to help a humanoid robot learn to express three different colours in response to human commands.
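To make the setup concrete, the colour-teaching task can be framed as a small, one-step tabular RL problem. The sketch below is an assumed framing with hypothetical names (`COMMANDS`, `COLOURS`, `human_feedback`), not the study's actual implementation.

```python
import random

# Assumed framing: states are the teacher's commands, actions are the colours.
COMMANDS = ["show_red", "show_green", "show_blue"]
COLOURS = ["red", "green", "blue"]

q_table = {(cmd, col): 0.0 for cmd in COMMANDS for col in COLOURS}

def human_feedback(command, colour):
    """Placeholder for the human teacher's reward signal (hypothetical rule)."""
    return 1.0 if command.endswith(colour) else -1.0

def teaching_step(alpha=0.1, epsilon=0.2):
    command = random.choice(COMMANDS)
    if random.random() < epsilon:          # occasional exploration
        colour = random.choice(COLOURS)
    else:                                  # otherwise exploit current estimates
        colour = max(COLOURS, key=lambda c: q_table[(command, c)])
    reward = human_feedback(command, colour)
    # The task ends after one action, so the TD error reduces to reward - Q.
    td_error = reward - q_table[(command, colour)]
    q_table[(command, colour)] += alpha * td_error
    return command, colour, td_error
```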

The results demonstrate minimal differences between the three conditions. This suggests that for simple tasks, emotional expressions grounded in RL do not have a significant effect, and thus neither help nor hurt. The findings are discussed, and it is proposed that emotion simulation is beneficial for tasks that are more complex, afford some robot autonomy, and for which the emotion is informative about how the user should influence the robot's actions to the benefit of the robot's policy.