Multi-task Offline Reinforcement Learning with CQL
A study on how dataset size and diversity increase generalization performance
L. Lipinskas (TU Delft - Electrical Engineering, Mathematics and Computer Science)
M.T.J. Spaan – Mentor (TU Delft - Sequential Decision Making)
M.R. Weltevrede – Mentor (TU Delft - Sequential Decision Making)
E. Congeduti – Graduation committee member (TU Delft - Computer Science & Engineering-Teaching Team)
Abstract
Reinforcement learning (RL) is a type of machine learning in which an agent learns by observing its current state, selecting an action to execute, and receiving a reward for that action, after which it observes the next state and repeats the cycle until it reaches its goal. The traditional online training approach lets the agent interact directly with the live environment, but this is not always possible, because the live environment may be too dangerous or costly to train in. In such cases, offline training offers a viable alternative: the agent is trained on pre-collected datasets of such interactions and attempts to learn a better policy than the one used to collect them, for example with Q-learning methods such as Conservative Q-Learning (CQL). However, prior studies, such as Mediratta et al., have suggested that Behavior Cloning (BC), a form of imitation learning, may outperform modern offline RL methods in the multi-task setting, where a model's generalization is tested on new or similar tasks rather than the ones it was trained on. These results raise the question of whether it is worthwhile to employ modern Q-learning methods designed to derive a better policy than the one used to collect the data, especially when they are unable to outperform standard imitation learning.
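To make the contrast between the two methods concrete, below is a minimal PyTorch-style sketch of the two training objectives for a discrete-action setting. This is an illustration under assumptions, not the thesis's implementation: the network names, batch fields, and the penalty weight alpha are all hypothetical, and the conservative term shown is the standard log-sum-exp regularizer from the CQL literature.

```python
# Hedged sketch: CQL vs. BC objectives for discrete actions.
# All names (q_network, policy_network, alpha, batch tensors) are illustrative.
import torch
import torch.nn.functional as F

def cql_loss(q_network, states, actions, rewards, next_states, dones,
             gamma=0.99, alpha=1.0):
    # Standard TD loss on the logged transitions.
    q_values = q_network(states)                              # (B, num_actions)
    q_taken = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rewards + gamma * (1 - dones) * q_network(next_states).max(1).values
    td_loss = F.mse_loss(q_taken, target)

    # Conservative regularizer: push down Q-values over all actions
    # (via log-sum-exp) while pushing up Q-values of actions in the dataset,
    # discouraging overestimation on out-of-distribution actions.
    conservative = (torch.logsumexp(q_values, dim=1) - q_taken).mean()
    return td_loss + alpha * conservative

def bc_loss(policy_network, states, actions):
    # Behavior Cloning simply imitates the dataset: a supervised
    # classification loss on the logged actions, with no reward signal.
    logits = policy_network(states)
    return F.cross_entropy(logits, actions)
```

The design difference the abstract hinges on is visible here: BC can at best recover the data-collection policy, while CQL uses rewards and bootstrapping to try to improve on it, at the cost of the extra conservatism term.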
This study seeks to reproduce and extend these findings within a custom environment.
The results reveal that, contrary to the aforementioned report, BC does not consistently outperform CQL: the two methods exhibit comparable performance across datasets varying in diversity and size. Additionally, incorporating more diverse data significantly enhances generalization performance.