Acting in the Face of Uncertainty
Pessimism in Offline Model-Based Reinforcement Learning
S.K. van Wolfswinkel (TU Delft - Electrical Engineering, Mathematics and Computer Science)
J. He – Mentor (TU Delft - Sequential Decision Making)
Frans Oliehoek – Graduation committee member (TU Delft - Sequential Decision Making)
Mathijs De Weerdt – Graduation committee member (TU Delft - Algorithmics)
Abstract
Offline model-based reinforcement learning uses a model of the environment, learned from a static dataset of interactions, to guide policy generation. When the agent plans through states that are out-of-distribution with respect to this dataset, the world model becomes more uncertain and sub-optimal planning decisions can result. This paper explores the use of pessimism, the tendency to avoid uncertain states, in the planning procedure. We evaluate Lower Confidence Bound (LCB), ensembles, and Monte Carlo (MC) dropout in the MinAtar Breakout environment. Results indicate that ensemble methods yield the highest performance, with a significant gain over the baseline, while LCB shows varying degrees of improvement. MC dropout generally does not yield a performance improvement.
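
As a rough illustration of the idea (not the thesis's exact implementation), a pessimistic Lower Confidence Bound score can be formed by penalizing the mean prediction of an ensemble with its disagreement; the function lcb_score and the coefficient beta below are hypothetical names chosen for this sketch:

    import numpy as np

    def lcb_score(ensemble_predictions, beta=1.0):
        # ensemble_predictions: predicted returns for one candidate
        # action/state, one entry per ensemble member.
        # beta: pessimism coefficient; larger values penalize
        # model uncertainty more strongly.
        mean = np.mean(ensemble_predictions)
        std = np.std(ensemble_predictions)  # disagreement as an uncertainty proxy
        return mean - beta * std

    # During planning, candidates would be ranked by their LCB instead of the
    # plain mean, steering the agent away from out-of-distribution states.
    preds = np.array([1.2, 0.9, 1.4, 0.8])  # hypothetical ensemble outputs
    score = lcb_score(preds, beta=1.0)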