A fast hybrid reinforcement learning framework with human corrective feedback

Journal Article (2018)

Author(s)

Carlos Celemin (TU Delft - Mechanical Engineering, Universidad de Santiago de Chile)

Javier Ruiz-del-Solar (Universidad de Santiago de Chile)

Jens Kober (TU Delft - Mechanical Engineering)

Research Group

Learning & Autonomous Control

Reinforcement learning, Learning from demonstration, Interactive machine learning, Policy search

DOI related publication

https://doi.org/10.1007/s10514-018-9786-6 (final published version)

To reference this document use

https://resolver.tudelft.nl/uuid:753b8fad-e98f-4c2d-b959-67ec56fe4bc1

Publication Year

2018

Language

English

Issue number

5

Volume number

43 (2019)

Pages (from-to)

1173-1186

Collections

Institutional Repository

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Reinforcement Learning agents can be supported by feedback from human teachers in the learning loop, which guides the learning process. In this work, we propose two hybrid strategies that combine Policy Search Reinforcement Learning and Interactive Machine Learning and benefit from both sources of information, the cost function and the human corrective feedback, to accelerate convergence and improve the final performance of the learning process. Experiments with simulated and real systems, covering balancing tasks and a 3 DoF robot arm, validate the advantages of the proposed learning strategies: (i) they speed up the convergence of the learning process by a factor of 3 to 30, saving considerable time during the agent adaptation, and (ii) they allow non-expert feedback to be included because they have low sensitivity to erroneous human advice.

Files

Celemin2018_Article_AFastHybri... (pdf, 2.41 MB)