SEQUEL: Semi-Supervised Preference-based RL with Query Synthesis via Latent Interpolation

Conference Paper (2024)
Author(s)

Daniel Marta (KTH Royal Institute of Technology)

Simon Holk (KTH Royal Institute of Technology)

Christian Pek (TU Delft - Robot Dynamics)

Iolanda Leite (KTH Royal Institute of Technology)

Research Group
Robot Dynamics
DOI
https://doi.org/10.1109/ICRA57147.2024.10610534
Publication Year
2024
Language
English
Pages (from-to)
9585-9592
ISBN (electronic)
979-8-3503-8457-4
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Preference-based reinforcement learning (RL) is a recent research direction in robot learning that allows humans to teach robots through preferences over pairs of desired behaviours. Nonetheless, obtaining realistic robot policies requires humans to answer an arbitrarily large number of queries. In this work, we address this sample-efficiency challenge by presenting a technique that synthesizes queries from a semi-supervised learning perspective. To achieve this, we leverage latent variational autoencoder (VAE) representations of trajectory segments (sequences of state-action pairs). Our approach produces queries that are closely aligned with those labeled by humans, while avoiding excessive uncertainty in the human preference predictions given by the reward estimates. Additionally, by introducing variation without deviating from the original human intent, it yields more robust reward function representations. We compare our approach against recent state-of-the-art semi-supervised learning techniques for preference-based RL. Our experimental findings reveal that we can enhance the generalization of the estimated reward function without requiring additional human intervention. Lastly, to confirm the practical applicability of our approach, we conduct experiments involving actual human users in a simulated social navigation setting. Videos of the experiments can be found at https://sites.google.com/view/rl-sequel
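
The following is a minimal illustrative sketch (not the authors' released code) of the core idea described in the abstract: encoding two human-labeled trajectory segments with a VAE, interpolating between their latent means to synthesize a new segment, and discarding the synthetic query if a reward-model ensemble is overly uncertain about it. All module names, dimensions, and the disagreement threshold are hypothetical placeholders.

import torch
import torch.nn as nn

SEG_DIM = 32   # flattened (state, action) segment length, assumed
LAT_DIM = 8    # VAE latent size, assumed

class SegmentVAE(nn.Module):
    """Toy VAE over flattened trajectory segments."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(SEG_DIM, 64), nn.ReLU())
        self.mu = nn.Linear(64, LAT_DIM)
        self.logvar = nn.Linear(64, LAT_DIM)
        self.dec = nn.Sequential(nn.Linear(LAT_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, SEG_DIM))

    def encode(self, x):
        h = self.enc(x)
        return self.mu(h), self.logvar(h)

    def decode(self, z):
        return self.dec(z)

def synthesize_query(vae, reward_ensemble, seg_a, seg_b,
                     alpha=0.5, max_disagreement=0.1):
    """Interpolate two labeled segments in latent space; keep the synthetic
    query only if the reward ensemble is not excessively uncertain about it."""
    with torch.no_grad():
        mu_a, _ = vae.encode(seg_a)
        mu_b, _ = vae.encode(seg_b)
        z_new = (1 - alpha) * mu_a + alpha * mu_b   # latent interpolation
        seg_new = vae.decode(z_new)                 # synthetic segment
        # Uncertainty proxy: std of predicted returns across the ensemble.
        preds = torch.stack([r(seg_new).sum() for r in reward_ensemble])
        if preds.std() > max_disagreement:
            return None                             # too uncertain, discard
    # Pseudo-label: the synthetic segment is compared against the segment
    # originally preferred by the human (seg_a assumed preferred here).
    return seg_a, seg_new

# Usage with random placeholder data:
vae = SegmentVAE()
ensemble = [nn.Linear(SEG_DIM, 1) for _ in range(3)]
query = synthesize_query(vae, ensemble,
                         torch.randn(SEG_DIM), torch.randn(SEG_DIM))

In the actual method, the VAE and the reward models are trained on the human-labeled queries, and the synthetic queries augment that set without further human intervention; the sketch above only fixes the interpolation-and-filtering structure under the stated assumptions.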

Files

SEQUEL_Semi-Supervised_Prefere... (pdf)
(pdf | 4.26 MB)
- Embargo expired on 08-02-2025
License info not available