Acting in the Face of Uncertainty
Pessimism in Offline Model-Based Reinforcement Learning
S.K. van Wolfswinkel (TU Delft - Electrical Engineering, Mathematics and Computer Science)
J. He – Mentor (TU Delft - Sequential Decision Making)
Frans Oliehoek – Graduation committee member (TU Delft - Sequential Decision Making)
Mathijs De Weerdt – Graduation committee member (TU Delft - Algorithmics)
Abstract
Offline model-based reinforcement learning uses a model of the environment, learned from a static dataset of interactions, to guide policy generation. When the agent plans through states that are out-of-distribution with respect to this dataset, the world model becomes more uncertain and sub-optimal planning decisions can result. This paper explores the use of pessimism, the tendency to avoid uncertain states, in the planning procedure. We evaluate Lower Confidence Bound (LCB), ensembles, and Monte Carlo (MC) dropout in the MinAtar Breakout environment. Results indicate that ensemble methods yield the highest performance, with a significant gain over the baseline, while LCB shows varying degrees of improvement. MC dropout generally does not yield a performance improvement.
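
As a rough illustration of the idea (not the thesis's exact implementation), a pessimistic Lower Confidence Bound score can be formed by penalizing the mean prediction of an ensemble with its disagreement; the function lcb_score and the coefficient beta below are hypothetical names chosen for this sketch:

    import numpy as np

    def lcb_score(ensemble_predictions, beta=1.0):
        # ensemble_predictions: predicted returns for one candidate
        # action/state, one entry per ensemble member.
        # beta: pessimism coefficient; larger values penalize
        # model uncertainty more strongly.
        mean = np.mean(ensemble_predictions)
        std = np.std(ensemble_predictions)  # disagreement as an uncertainty proxy
        return mean - beta * std

    # During planning, candidates would be ranked by their LCB instead of the
    # plain mean, steering the agent away from out-of-distribution states.
    preds = np.array([1.2, 0.9, 1.4, 0.8])  # hypothetical ensemble outputs
    score = lcb_score(preds, beta=1.0)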