Language Assistance in Reinforcement Learning in Dynamic Environments

Master Thesis (2023)
Author(s)

S.A.J. van Leeuwen (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M.T.J. Spaan – Mentor (TU Delft - Algorithmics)

Wendelin Böhmer – Graduation committee member (TU Delft - Algorithmics)

J.A. de Vries – Mentor (TU Delft - Algorithmics)

Jie Yang – Coach (TU Delft - Web Information Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Sander van Leeuwen
Publication Year
2023
Language
English
Graduation Date
18-12-2023
Awarding Institution
Delft University of Technology
Programme
Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Language is an intuitive and effective way for humans to communicate. Large Language Models (LLMs) can interpret and respond to language well. However, their direct use in deep reinforcement learning (RL) is limited because they are sample-inefficient. State-of-the-art deep RL algorithms are more sample-efficient but cannot understand language well. This research studies whether RL agents can improve learning by utilizing language assistance and how LLMs can provide it. A sentence describing the agent's environment is fed into an LLM to create a semantic embedding, which a recurrent Soft Actor-Critic (SAC) agent consumes, yielding an agent that can listen to natural language. This research shows that the best method for the agent to consume the embedding is concatenating it to each observation. Also, LLM-based embeddings lead to faster and more stable learning than non-LLM-based embeddings. The agent is sensitive to noise in the embedding but not to the embedding's dimensionality. The agent generalizes well across sentences that have a similar meaning to sentences seen during training but are formulated differently; however, it cannot generalize as well across sentences with unknown subjects and needs the subjects of the sentences to be grounded in training. Lastly, this research shows that the proposed architecture supports scaling language assistance to more complex environments.
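The best-performing consumption method described above, concatenating the sentence embedding to each per-step observation, can be sketched as follows. This is a minimal illustration, not the thesis code: `embed_sentence` is a hypothetical deterministic stand-in for a real LLM encoder, and the observation is a toy vector.

```python
import hashlib
import numpy as np

def embed_sentence(sentence: str, dim: int = 8) -> np.ndarray:
    """Hypothetical stand-in for an LLM sentence encoder.

    In the thesis, a real LLM produces the semantic embedding; here we
    derive a fixed-size pseudo-random vector deterministically from the
    sentence text so the sketch is self-contained.
    """
    seed = int(hashlib.sha256(sentence.encode()).hexdigest(), 16) % (2**32)
    rng = np.random.default_rng(seed)
    return rng.standard_normal(dim)

def augment_observation(obs: np.ndarray, embedding: np.ndarray) -> np.ndarray:
    """Concatenate the language embedding to a single environment observation."""
    return np.concatenate([obs, embedding])

# One sentence describes the environment; its embedding is appended to
# every observation the (recurrent SAC) agent receives during the episode.
sentence = "Avoid the red tiles and move to the goal."
emb = embed_sentence(sentence)
obs = np.zeros(4)                     # toy 4-dimensional observation
aug = augment_observation(obs, emb)   # shape: (4 + 8,) = (12,)
```

Because the same embedding is appended at every step, the agent's input dimensionality grows only once by the embedding size, which is consistent with the finding that the agent is insensitive to that dimensionality.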
