SEQUEL: Semi-Supervised Preference-based RL with Query Synthesis via Latent Interpolation

Conference Paper (2024)
Author(s)

Daniel Marta (KTH Royal Institute of Technology)

Simon Holk (KTH Royal Institute of Technology)

Christian Pek (TU Delft - Robot Dynamics)

Iolanda Leite (KTH Royal Institute of Technology)

Research Group
Robot Dynamics
DOI
https://doi.org/10.1109/ICRA57147.2024.10610534
Publication Year
2024
Language
English
Pages (from-to)
9585-9592
ISBN (electronic)
979-8-3503-8457-4
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Preference-based reinforcement learning (RL) is a recent research direction in robot learning that allows humans to teach robots through preferences over pairs of desired behaviours. Nonetheless, obtaining realistic robot policies requires humans to answer an arbitrarily large number of queries. In this work, we address this sample-efficiency challenge by presenting a technique that synthesizes queries from a semi-supervised learning perspective. To achieve this, we leverage latent variational autoencoder (VAE) representations of trajectory segments (sequences of state-action pairs). Our approach produces queries that are closely aligned with those labeled by humans, while avoiding excessive uncertainty in the human preference predictions given by the reward estimates. Additionally, by introducing variation without deviating from the original human intent, it yields more robust reward function representations. We compare our approach against recent state-of-the-art semi-supervised learning techniques for preference-based RL. Our experimental findings reveal that we can enhance the generalization of the estimated reward function without requiring additional human intervention. Lastly, to confirm the practical applicability of our approach, we conduct experiments involving actual human users in a simulated social navigation setting. Videos of the experiments can be found at https://sites.google.com/view/rl-sequel
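
The following is a minimal illustrative sketch (not the authors' released code) of the core idea described in the abstract: encoding two human-labeled trajectory segments with a VAE, interpolating between their latent means to synthesize a new segment, and discarding the synthetic query if a reward-model ensemble is overly uncertain about it. All module names, dimensions, and the disagreement threshold are hypothetical placeholders.

import torch
import torch.nn as nn

SEG_DIM = 32   # flattened (state, action) segment length, assumed
LAT_DIM = 8    # VAE latent size, assumed

class SegmentVAE(nn.Module):
    """Toy VAE over flattened trajectory segments."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(SEG_DIM, 64), nn.ReLU())
        self.mu = nn.Linear(64, LAT_DIM)
        self.logvar = nn.Linear(64, LAT_DIM)
        self.dec = nn.Sequential(nn.Linear(LAT_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, SEG_DIM))

    def encode(self, x):
        h = self.enc(x)
        return self.mu(h), self.logvar(h)

    def decode(self, z):
        return self.dec(z)

def synthesize_query(vae, reward_ensemble, seg_a, seg_b,
                     alpha=0.5, max_disagreement=0.1):
    """Interpolate two labeled segments in latent space; keep the synthetic
    query only if the reward ensemble is not excessively uncertain about it."""
    with torch.no_grad():
        mu_a, _ = vae.encode(seg_a)
        mu_b, _ = vae.encode(seg_b)
        z_new = (1 - alpha) * mu_a + alpha * mu_b   # latent interpolation
        seg_new = vae.decode(z_new)                 # synthetic segment
        # Uncertainty proxy: std of predicted returns across the ensemble.
        preds = torch.stack([r(seg_new).sum() for r in reward_ensemble])
        if preds.std() > max_disagreement:
            return None                             # too uncertain, discard
    # Pseudo-label: the synthetic segment is compared against the segment
    # originally preferred by the human (seg_a assumed preferred here).
    return seg_a, seg_new

# Usage with random placeholder data:
vae = SegmentVAE()
ensemble = [nn.Linear(SEG_DIM, 1) for _ in range(3)]
query = synthesize_query(vae, ensemble,
                         torch.randn(SEG_DIM), torch.randn(SEG_DIM))

In the actual method, the VAE and the reward models are trained on the human-labeled queries, and the synthetic queries augment that set without further human intervention; the sketch above only fixes the interpolation-and-filtering structure under the stated assumptions.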

Files

SEQUEL_Semi-Supervised_Prefere... (pdf)
(pdf | 4.26 MB)
- Embargo expired on 08-02-2025
License info not available