Language-Guided Semantic Affordance Exploration for Efficient Reinforcement Learning

Master Thesis (2024)
Author(s)

R. Ma (TU Delft - Mechanical Engineering)

Contributor(s)

J. Kober – Mentor (TU Delft - Learning & Autonomous Control)

J.D. Luijkx – Mentor (TU Delft - Learning & Autonomous Control)

Zlatan Ajanovic – Mentor (TU Delft - Learning & Autonomous Control)

Dimitris Boskos – Graduation committee member (TU Delft - Team Dimitris Boskos)

L. Peternel – Graduation committee member (TU Delft - Human-Robot Interaction)

Faculty
Mechanical Engineering
Publication Year
2024
Language
English
Graduation Date
27-09-2024
Awarding Institution
Delft University of Technology
Programme
Mechanical Engineering | Robotics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Reinforcement Learning (RL) shows great potential for robotic manipulation tasks, yet it suffers from low sample efficiency and requires extensive exploration of large state-action spaces. Recent methods leverage the commonsense knowledge and reasoning abilities of Large Language Models (LLMs) to guide RL exploration toward more meaningful states. However, LLMs may generate semantically correct but physically infeasible plans, leading to unreliable solutions. In this work, we propose Language-Guided Exploration for Reinforcement Learning (LGRL), a novel framework that uses the planning capability of LLMs to directly guide RL exploration. LGRL applies LLM planning at both the task and affordance levels, improving learning efficiency by steering RL agents toward semantically meaningful actions. Unlike previous methods that rely on the optimality of LLM-generated plans or rewards, LGRL corrects sub-optimal plans and explores multimodal affordance-level plans without human intervention.
We evaluated LGRL on pick-and-place tasks in standard RL benchmark environments, demonstrating significant improvements in both sample efficiency and success rate.
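
As a rough illustration of the exploration guidance the abstract describes (not the thesis's actual implementation), the Python sketch below shows one way an LLM-proposed affordance-level plan could bias epsilon-greedy action selection while retaining an unbiased fallback, so a sub-optimal plan cannot permanently mislead the agent. All names here (query_llm_plan, guided_action, plan_mask) and all parameter values are hypothetical, invented for this example, and the LLM call is stubbed out.

import random
from typing import Callable, List, Sequence

def query_llm_plan(task_description: str) -> List[str]:
    # Stub: a real system would prompt an LLM here. The fixed plan below
    # stands in for an affordance-level decomposition of the task.
    return ["reach(block)", "grasp(block)", "move(block, target)", "release(block)"]

def guided_action(candidate_actions: Sequence[int],
                  q_values: Sequence[float],
                  plan_mask: Callable[[int], bool],
                  epsilon: float = 0.2,
                  guidance_prob: float = 0.7) -> int:
    # Epsilon-greedy selection with language-guided exploration: on
    # exploration steps, prefer actions consistent with the current
    # affordance subgoal (plan_mask), but keep a uniform fallback so the
    # agent can still correct a sub-optimal LLM plan.
    if random.random() < epsilon:
        guided = [a for a in candidate_actions if plan_mask(a)]
        if guided and random.random() < guidance_prob:
            return random.choice(guided)              # semantically meaningful action
        return random.choice(list(candidate_actions)) # unbiased exploration
    return max(candidate_actions, key=lambda a: q_values[a])  # greedy exploitation

# Toy usage with a discrete action space:
plan = query_llm_plan("put the block on the target")
current_subgoal = plan[0]        # e.g. "reach(block)"
actions = [0, 1, 2, 3]           # toy action ids
q = [0.1, 0.5, 0.2, 0.0]         # toy Q-values
mask = lambda a: a in (1, 2)     # pretend actions 1 and 2 realize current_subgoal
print(guided_action(actions, q, mask))

The design choice worth noting is the guidance_prob fallback: because exploration is only biased toward, never restricted to, plan-consistent actions, the agent can still discover trajectories that a physically infeasible LLM plan would otherwise rule out.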
