Language-Guided Semantic Affordance Exploration for Efficient Reinforcement Learning

Master Thesis (2024)
Author(s)

R. Ma (TU Delft - Mechanical Engineering)

Contributor(s)

J. Kober – Mentor (TU Delft - Learning & Autonomous Control)

J.D. Luijkx – Mentor (TU Delft - Learning & Autonomous Control)

Zlatan Ajanovic – Mentor (TU Delft - Learning & Autonomous Control)

Dimitris Boskos – Graduation committee member (TU Delft - Team Dimitris Boskos)

L. Peternel – Graduation committee member (TU Delft - Human-Robot Interaction)

Faculty
Mechanical Engineering
Publication Year
2024
Language
English
Graduation Date
27-09-2024
Awarding Institution
Delft University of Technology
Programme
Mechanical Engineering | Robotics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Reinforcement Learning (RL) shows great potential for robotic manipulation tasks, yet it suffers from low sample efficiency and requires extensive exploration of large state-action spaces. Recent methods leverage the commonsense knowledge and reasoning abilities of Large Language Models (LLMs) to guide RL exploration toward more meaningful states. However, LLMs may generate semantically correct but physically infeasible plans, leading to unreliable solutions. In this work, we propose Language-Guided Exploration for Reinforcement Learning (LGRL), a novel framework that uses the planning capability of LLMs to directly guide RL exploration. LGRL applies LLM planning at both the task and affordance levels, improving learning efficiency by steering RL agents toward semantically meaningful actions. Unlike previous methods that rely on the optimality of LLM-generated plans or rewards, LGRL corrects sub-optimal plans and explores multimodal affordance-level plans without human intervention.
We evaluated LGRL on pick-and-place tasks in standard RL benchmark environments, demonstrating significant improvements in both sample efficiency and success rate.
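
As a rough illustration of the exploration guidance the abstract describes (not the thesis's actual implementation), the Python sketch below shows one way an LLM-proposed affordance-level plan could bias epsilon-greedy action selection while retaining an unbiased fallback, so a sub-optimal plan cannot permanently mislead the agent. All names here (query_llm_plan, guided_action, plan_mask) and all parameter values are hypothetical, invented for this example, and the LLM call is stubbed out.

import random
from typing import Callable, List, Sequence

def query_llm_plan(task_description: str) -> List[str]:
    # Stub: a real system would prompt an LLM here. The fixed plan below
    # stands in for an affordance-level decomposition of the task.
    return ["reach(block)", "grasp(block)", "move(block, target)", "release(block)"]

def guided_action(candidate_actions: Sequence[int],
                  q_values: Sequence[float],
                  plan_mask: Callable[[int], bool],
                  epsilon: float = 0.2,
                  guidance_prob: float = 0.7) -> int:
    # Epsilon-greedy selection with language-guided exploration: on
    # exploration steps, prefer actions consistent with the current
    # affordance subgoal (plan_mask), but keep a uniform fallback so the
    # agent can still correct a sub-optimal LLM plan.
    if random.random() < epsilon:
        guided = [a for a in candidate_actions if plan_mask(a)]
        if guided and random.random() < guidance_prob:
            return random.choice(guided)              # semantically meaningful action
        return random.choice(list(candidate_actions)) # unbiased exploration
    return max(candidate_actions, key=lambda a: q_values[a])  # greedy exploitation

# Toy usage with a discrete action space:
plan = query_llm_plan("put the block on the target")
current_subgoal = plan[0]        # e.g. "reach(block)"
actions = [0, 1, 2, 3]           # toy action ids
q = [0.1, 0.5, 0.2, 0.0]         # toy Q-values
mask = lambda a: a in (1, 2)     # pretend actions 1 and 2 realize current_subgoal
print(guided_action(actions, q, mask))

The design choice worth noting is the guidance_prob fallback: because exploration is only biased toward, never restricted to, plan-consistent actions, the agent can still discover trajectories that a physically infeasible LLM plan would otherwise rule out.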
