A large-scale study of agents learning from human reward (Extended abstract)

Conference paper (2015)

Authors

G. Li

H.S. Hung

S. Whiteson

Department

Intelligent Systems (TU Delft)

To reference this document use:

http://resolver.tudelft.nl/uuid:2f876937-22d3-42fc-8bd4-450a2f8c67ee

Published Date

05-05-2015

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Source:

MSDM 2015: AAMAS Workshop on Multiagent Sequential Decision Making Under Uncertainty, Istanbul, Turkey, 4-5 May, 2015

Faculty

Electrical Engineering, Mathematics and Computer Science

Abstract

The TAMER framework, which provides a way for agents to learn to solve tasks using human-generated rewards, has been examined in several small-scale studies, each with a few dozen subjects. In this paper, we present the results of the first large-scale study of TAMER, which was performed at the NEMO science museum in Amsterdam and involved 561 subjects. Our results show for the first time that an agent using TAMER can successfully learn to play Infinite Mario, a challenging reinforcement-learning benchmark problem based on the popular video game, given feedback from both adult (N = 209) and child (N = 352) trainers. In addition, our study supports prior studies demonstrating the importance of bidirectional feedback and competitive elements in the training interface. Finally, our results also shed light on the potential for using trainers’ facial expressions as a reward signal, as well as the role of age and gender in trainer behavior and agent performance.
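
The abstract does not spell out how TAMER learns, so the following is only a minimal sketch of the general idea as it is usually described: the agent fits a model of the human trainer's reward and picks the action that model rates highest. The tabular model, the learning rate, the class name `TamerAgent`, and the `simulated_trainer` function are all assumptions made for illustration. They are not taken from the paper, and the study itself used human trainers and Infinite Mario rather than this toy task.

```python
import random


class TamerAgent:
    """Toy TAMER-style learner: models human reward and acts greedily on it."""

    def __init__(self, n_states, n_actions, learning_rate=0.5):
        self.n_actions = n_actions
        self.learning_rate = learning_rate  # assumed value, not from the paper
        # h[s][a] is the current estimate of the human reward for action a in state s.
        self.h = [[0.0] * n_actions for _ in range(n_states)]

    def act(self, state):
        # Greedy choice over predicted human reward; ties go to the lowest index.
        values = self.h[state]
        return max(range(self.n_actions), key=lambda a: values[a])

    def update(self, state, action, human_reward):
        # Move the estimate toward the feedback the trainer just gave.
        error = human_reward - self.h[state][action]
        self.h[state][action] += self.learning_rate * error


def simulated_trainer(state, action):
    # Hypothetical stand-in for a human trainer:
    # rewards the action that matches the parity of the state.
    return 1.0 if action == state % 2 else -1.0


def train(agent, n_states, steps=200, seed=0):
    # Visit random states, act greedily, and learn from the trainer's feedback.
    rng = random.Random(seed)
    for _ in range(steps):
        state = rng.randrange(n_states)
        action = agent.act(state)
        agent.update(state, action, simulated_trainer(state, action))
    return agent
```

Calling `train(TamerAgent(4, 2), 4)` in this toy setting yields an agent whose greedy action in each state is the one the simulated trainer rewards. A real trainer's feedback is delayed, sparse, and noisy, which this sketch deliberately ignores.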

Files

328181.pdf (PDF, 0.241 MB)