Abstraction-Guided Policy Recovery from Expert Demonstrations

Conference Paper (2020)

Author(s)

Canmanie T. Ponnambalam (TU Delft - Algorithmics)

Frans Oliehoek (TU Delft - Interactive Intelligence)

M.T.J. Spaan (TU Delft - Algorithmics)

Research Group

Algorithmics

Copyright

To reference this document use:

https://resolver.tudelft.nl/uuid:66120ffa-096a-4cbf-bd52-cc567452dcfe

Publication Year

2020

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The goal in behavior cloning is to extract meaningful information from expert demonstrations and reproduce the same behavior autonomously. However, the available data is unlikely to exhaustively cover the potential problem space. As a result, the quality of automated decision making is compromised without elegant ways to handle out-of-distribution states that might occur due to unforeseen events in the environment. Our novel approach, RECO, uses only the available offline data to recover a behavior cloning agent from unknown states. Given expert trajectories, RECO learns both an imitation policy and a recovery policy. Our contribution is a method for learning this recovery policy, which steers the agent back to the trajectories in the data set from unknown states. While there is, by definition, no data available to learn the recovery policy, we exploit abstractions to generalize beyond the available data, thus overcoming this problem. In a tabular domain, we show that our method results in drastically fewer calls to a human supervisor without compromising solution quality and with few trajectories provided by an expert. We further introduce a continuous adaptation of RECO and evaluate its potential in an experiment.
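
The idea in the abstract (imitate in known states, recover toward the demonstrated trajectories in unknown ones, and fall back to a supervisor otherwise) can be illustrated with a toy tabular sketch. This is not the RECO algorithm from the paper: the grid world, the function names, and in particular the choice of abstraction (dropping absolute position and keeping only each action's most common displacement) are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def learn_imitation_policy(demos):
    """Tabular behavior cloning: most frequent expert action per visited state."""
    counts = defaultdict(Counter)
    for state, action, _next_state in demos:
        counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def learn_abstract_effects(demos):
    """Hypothetical abstraction (an assumption, not the paper's): ignore the
    absolute position and keep each action's most common displacement."""
    effects = defaultdict(Counter)
    for (x, y), action, (nx, ny) in demos:
        effects[action][(nx - x, ny - y)] += 1
    return {a: c.most_common(1)[0][0] for a, c in effects.items()}


def distance_to_known(state, known_states):
    """Manhattan distance from `state` to the closest demonstrated state."""
    x, y = state
    return min(abs(x - kx) + abs(y - ky) for kx, ky in known_states)


def act(state, policy, effects):
    """Imitate in known states; otherwise pick the action predicted to move
    closer to the demonstrated states; otherwise defer to a supervisor."""
    if state in policy:
        return policy[state], "imitate"
    best_action, best_dist = None, distance_to_known(state, policy)
    for action, (dx, dy) in sorted(effects.items()):
        predicted = (state[0] + dx, state[1] + dy)
        dist = distance_to_known(predicted, policy)
        if dist < best_dist:
            best_action, best_dist = action, dist
    if best_action is None:
        return None, "ask_supervisor"
    return best_action, "recover"


# Made-up expert trajectory on a grid: (state, action, next_state).
demos = [
    ((0, 0), "right", (1, 0)),
    ((1, 0), "right", (2, 0)),
    ((2, 0), "up", (2, 1)),
    ((2, 1), "up", (2, 2)),
]
policy = learn_imitation_policy(demos)
effects = learn_abstract_effects(demos)

print(act((1, 0), policy, effects))   # state seen in the data
print(act((-1, 0), policy, effects))  # unseen state, recoverable
print(act((0, 1), policy, effects))   # unseen state, no helpful action known
```

In this toy setup the agent imitates at `(1, 0)`, recovers with `right` from `(-1, 0)`, and defers to the supervisor at `(0, 1)`, because the demonstrations contain no action whose learned displacement moves it closer to the data.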
