Correct Me If I'm Wrong

Using Non-Experts to Repair Reinforcement Learning Policies

Conference Paper (2022)

Author(s)

Sanne Van Waveren (KTH Royal Institute of Technology)

Christian Pek (KTH Royal Institute of Technology)

Jana Tumova (KTH Royal Institute of Technology)

Iolanda Leite (KTH Royal Institute of Technology)

Affiliation

External organisation

DOI related publication

https://doi.org/10.1109/HRI53351.2022.9889604

Non-experts; Policy repair; Robot failure; Shielded reinforcement learning

To reference this document use:

https://resolver.tudelft.nl/uuid:f8985114-b7d7-42fe-9cf0-b9e4077d1837

Publication Year

2022

Language

English

Pages (from-to)

493-501

ISBN (electronic)

9781538685549

Abstract

Reinforcement learning has shown great potential for learning sequential decision-making tasks. Yet it is difficult to anticipate all possible real-world scenarios during training, so robots will inevitably fail in the long run. Many of these failures are caused by variations in the robot's environment. Usually, experts are called in to correct the robot's behavior; however, some of these failures do not necessarily require an expert to solve them. In this work, we query non-experts online for help and explore (1) whether and how non-experts can provide feedback to the robot after a failure, and (2) how the robot can use this feedback to avoid such failures in the future by generating shields that restrict or correct its high-level actions. We demonstrate our approach on common daily scenarios of a simulated kitchen robot. The results indicate that non-experts can indeed understand and repair robot failures. Our generated shields accelerate learning and improve data efficiency during retraining.
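
The abstract distinguishes two kinds of shield: ones that restrict (block) a high-level action and ones that correct it (substitute another action). The sketch below illustrates only that general idea; it is not the authors' implementation, and the record does not say how their shields are generated or encoded. The class `Shield`, the helper `shielded_greedy`, and all state and action names (`cup_full`, `put_in_dishwasher`, `holding_knife`, and so on) are hypothetical, chosen to match the kitchen-robot setting.

```python
from collections.abc import Hashable
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical types: a discrete high-level state and a named high-level action.
State = Hashable
Action = str


@dataclass
class Shield:
    """Illustrative shield: maps (state, action) to a replacement action,
    or to None when the action should be blocked outright."""

    rules: Dict[Tuple[State, Action], Optional[Action]] = field(default_factory=dict)

    def add_restriction(self, state: State, action: Action) -> None:
        # Restricting shield: the action is never allowed in this state.
        self.rules[(state, action)] = None

    def add_correction(self, state: State, action: Action, replacement: Action) -> None:
        # Correcting shield: the action is swapped for a safer one in this state.
        self.rules[(state, action)] = replacement

    def allowed_actions(self, state: State, actions: List[Action]) -> List[Action]:
        """Remove blocked actions; corrected actions stay selectable."""
        return [
            a for a in actions
            if not ((state, a) in self.rules and self.rules[(state, a)] is None)
        ]

    def filter(self, state: State, action: Action) -> Action:
        """Return the action to execute after applying any correction rule."""
        key = (state, action)
        if key in self.rules:
            replacement = self.rules[key]
            if replacement is None:
                raise ValueError(f"action {action!r} is blocked in state {state!r}")
            return replacement
        return action


def shielded_greedy(q: Dict[Tuple[State, Action], float], state: State,
                    actions: List[Action], shield: Shield) -> Action:
    """Pick the highest-valued allowed action (unseen pairs default to 0.0),
    then let the shield correct it if a correction rule applies."""
    allowed = shield.allowed_actions(state, actions)
    if not allowed:
        raise ValueError(f"the shield blocks every action in state {state!r}")
    best = max(allowed, key=lambda a: q.get((state, a), 0.0))
    return shield.filter(state, best)


if __name__ == "__main__":
    shield = Shield()
    # Hypothetical non-expert feedback turned into rules.
    shield.add_correction("cup_full", "put_in_dishwasher", "empty_cup")
    shield.add_restriction("holding_knife", "use_microwave")

    actions = ["put_in_dishwasher", "empty_cup", "use_microwave"]
    print(shield.allowed_actions("holding_knife", actions))  # ['put_in_dishwasher', 'empty_cup']
    print(shield.filter("cup_full", "put_in_dishwasher"))    # empty_cup
```

Blocking actions before selection keeps the learner from ever exploring them, while correcting after selection lets it keep its learned preference but execute a repaired action; a retraining loop could use either form.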

Metadata only record. There are no files for this record.