Acting in the Face of Uncertainty

Pessimism in Offline Model-Based Reinforcement Learning

Bachelor Thesis (2024)
Author(s)

S.K. van Wolfswinkel (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J. He – Mentor (TU Delft - Sequential Decision Making)

Frans Oliehoek – Graduation committee member (TU Delft - Sequential Decision Making)

Mathijs De Weerdt – Graduation committee member (TU Delft - Algorithmics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2024
Language
English
Graduation Date
27-06-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Offline model-based reinforcement learning uses a model of the environment, learned from a static dataset of interactions, to guide policy generation. When the agent visits out-of-distribution states, the world model becomes more uncertain, which can lead to sub-optimal planning decisions. This paper explores the use of pessimism, the tendency to avoid uncertain states, in the planning procedure. We evaluate Lower Confidence Bound (LCB), ensembles, and Monte Carlo (MC) dropout in the MinAtar Breakout environment. Results indicate that ensemble methods yield the highest performance, with a significant gain over the baseline, while LCB also shows varying degrees of improvement. MC dropout generally does not yield a performance improvement.
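
As a rough illustration of the LCB idea described in the abstract, the sketch below shows how a planner can score actions pessimistically using ensemble disagreement as an uncertainty proxy. This is not code from the thesis; the function names, the beta penalty weight, and the ensemble setup are assumptions for illustration only.

```python
import numpy as np

def lcb_value(ensemble_values: np.ndarray, beta: float = 1.0) -> float:
    """Pessimistic (lower confidence bound) value estimate for one action.

    ensemble_values holds one predicted return per ensemble member.
    Disagreement between members is used as a proxy for model
    uncertainty in out-of-distribution states. (Hypothetical sketch,
    not the thesis implementation.)
    """
    mean = ensemble_values.mean()
    std = ensemble_values.std()
    # Penalise uncertainty: the larger the ensemble disagreement,
    # the lower the pessimistic score.
    return mean - beta * std

def select_action(values_per_action: np.ndarray, beta: float = 1.0) -> int:
    """Pick the action with the highest LCB score.

    values_per_action has shape (n_actions, n_ensemble_members).
    """
    scores = [lcb_value(v, beta) for v in values_per_action]
    return int(np.argmax(scores))

# Example: 3 actions evaluated by a 5-member ensemble of world models.
rng = np.random.default_rng(0)
values = rng.normal(loc=[[1.0], [1.2], [0.9]],
                    scale=[[0.1], [0.8], [0.05]],
                    size=(3, 5))
print(select_action(values, beta=1.0))
```

A larger beta makes the planner more pessimistic, steering it away from actions whose outcomes the ensemble members disagree on, even when their mean predicted return is high.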

Files

Research_Project_Final.pdf
(pdf | 0.949 MB)
License info not available