Simultaneous learning of objective function and policy from interactive teaching with corrective feedback

Conference Paper (2019)

Author(s)

Carlos Celemin (TU Delft - Learning & Autonomous Control)

J. Kober (TU Delft - Learning & Autonomous Control)

Research Group

Learning & Autonomous Control

DOI related publication

https://doi.org/10.1109/AIM.2019.8868805

To reference this document use:

https://resolver.tudelft.nl/uuid:9c683b97-1924-4c37-87e9-5f3824b662c8

Publication Year

2019

Language

English

Pages (from-to)

726-732

ISBN (electronic)

978-1-7281-2493-3

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Some imitation learning approaches rely on Inverse Reinforcement Learning (IRL) methods to decode and generalize the implicit goals conveyed by expert demonstrations. IRL studies normally assume that expert demonstrations are available, which is not always the case. There are Machine Learning methods that allow non-expert teachers to guide robots in learning complex policies, which can remove IRL's dependence on experts. This work introduces an approach for simultaneously teaching robot policies and objective functions from vague human corrective feedback. The main goal is to generalize the insights that a non-expert human teacher provides to the robot to unseen conditions, without further human effort in the complementary training process. We present an experimental validation of the introduced approach for transferring knowledge to scenarios not considered while the non-expert was teaching. Experimental results show that, in RL processes, the learned reward functions achieve performance similar to that of the engineered reward functions used as a baseline, in both simulated and real environments.

Files

Simultaneous_Learning_of_Objec... (pdf)

(pdf | 2.07 Mb)

- Embargo expired on 17-04-2020

License info not available