Towards Learning from Implicit Human Reward

(Extended Abstract)


Abstract

The TAMER framework provides a way for agents to learn to solve tasks using human-generated rewards. Previous research showed that humans give copious feedback early in training but only sparse feedback thereafter, and that an agent's competitive feedback --- informing the trainer about its performance relative to that of other trainers --- can greatly affect the trainer's engagement and the agent's learning. In this paper, we present the first large-scale study of TAMER, involving 561 subjects, which investigates the effect of the agent's competitive feedback in a new setting, as well as the potential for learning from trainers' facial expressions. Our results show for the first time that a TAMER agent can successfully learn to play Infinite Mario, a challenging reinforcement-learning benchmark problem. In addition, our study supports prior results on the importance of bi-directional feedback and competitive elements in the training interface. Finally, our results shed light on the potential for using trainers' facial expressions as reward signals, as well as on the role of age and gender in trainer behavior and agent performance.