Multi-Task Offline Reinforcement Learning

Experimental Evaluation of the Generalizability of the Soft Actor-Critic + Behavioral Cloning Algorithm

Bachelor Thesis (2024)
Author(s)

A.O. Geist (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M.T.J. Spaan – Mentor (TU Delft - Sequential Decision Making)

M.R. Weltevrede – Mentor (TU Delft - Sequential Decision Making)

E. Congeduti – Graduation committee member (TU Delft - Computer Science & Engineering-Teaching Team)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2024
Language
English
Graduation Date
27-06-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper examines the generalization capabilities of the Soft Actor-Critic (SAC) algorithm combined with Behavioral Cloning (BC) in the MiniGrid Four-Room environment. Reinforcement learning (RL), particularly offline RL, is important for tasks where interacting with the environment is risky or costly, and this research focuses on multi-task settings where generalization to new tasks is crucial. Our findings indicate that SAC+BC achieves generalization performance close to that of BC. Notably, while BC remains robust across varying dataset characteristics (quality, diversity, size), SAC alone struggles without the BC component, highlighting the improvement in generalization brought by the hybrid approach. Furthermore, increasing the dataset size improves generalizability only when it also introduces greater diversity. These results are constrained by hardware limitations; further hyperparameter optimization and additional random seeds could validate, and possibly strengthen, our findings, potentially showing SAC+BC to be even more effective than reported here.
The implementation details and the source code for this study are available on GitHub at https://github.com/AxelGeist/multi-task-offline-reinforcement-learning.
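
As a rough illustration of how such a hybrid objective is typically formed, the sketch below adds a behavioral-cloning regularizer to the standard entropy-regularized SAC actor loss. This is a minimal sketch, not the thesis's implementation (see the linked GitHub repository for that): the actor.sample and critic interfaces, the bc_weight trade-off coefficient, and the continuous-action MSE form of the BC term are all assumptions made here for illustration; in a discrete-action setting such as MiniGrid, the BC term would more naturally be a cross-entropy loss on the dataset actions.

    import torch.nn.functional as F

    def sac_bc_actor_loss(actor, critic, states, dataset_actions,
                          alpha=0.2, bc_weight=1.0):
        """Hypothetical SAC+BC actor objective (continuous-action form).

        Combines the entropy-regularized SAC actor loss with a
        behavioral-cloning penalty toward the offline dataset actions.
        `actor.sample` and `critic(...)` are assumed interfaces, not
        the API of the thesis code.
        """
        # Reparameterized sample from the current policy, with log-probabilities.
        actions, log_probs = actor.sample(states)

        # Q-values of the sampled actions under the current critic.
        q_values = critic(states, actions)

        # Standard SAC actor term: maximize Q while keeping entropy high
        # (i.e., minimize alpha * log_pi - Q).
        sac_term = (alpha * log_probs - q_values).mean()

        # Behavioral-cloning term: stay close to the actions in the dataset.
        bc_term = F.mse_loss(actions, dataset_actions)

        return sac_term + bc_weight * bc_term

Setting bc_weight to zero recovers plain SAC, while a very large bc_weight makes the update behave like pure BC, which mirrors the spectrum of methods compared in the abstract above.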

Files

Research_Paper.pdf
(pdf | 1.76 MB)