Generalisation Ability of Proper Value Equivalence Models in Model-Based Reinforcement Learning
S. Bratus (TU Delft - Electrical Engineering, Mathematics and Computer Science)
J. He – Mentor (TU Delft - Sequential Decision Making)
Mathijs M. De Weerdt – Coach (TU Delft - Algorithmics)
Frans A. Oliehoek – Graduation committee member (TU Delft - Sequential Decision Making)
Abstract
We investigate the generalisation performance of predictive models in model-based reinforcement learning when trained using maximum likelihood estimation (MLE) versus proper value equivalence (PVE) loss functions. While the more conventional MLE loss fits models to predict state transitions and rewards as accurately as possible, value-equivalent methods such as PVE prioritise value-relevant features. We show that in a tabular setting, MLE-based models generalise better than their PVE counterparts when fit to a small number of training policies, whereas PVE-based models perform better as the number of policies increases. As model rank increases, generalisation improves for both MLE- and PVE-based models, and the gap in generalisation ability between the two narrows.
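To make the two objectives concrete, the following is a minimal tabular sketch of the contrast described above; all function names, array shapes, and the cross-entropy/squared-error formulation are illustrative assumptions, not the thesis implementation. The MLE-style loss measures how well the learned model reproduces the environment's transitions and rewards directly, while the PVE-style loss measures how closely the value functions induced by the model match those induced by the environment over a set of training policies.

```python
import numpy as np

def value_function(P, r, pi, gamma=0.9):
    """Exact value of policy pi (|S|x|A|) in a tabular MDP with
    transition tensor P (|A|x|S|x|S|) and reward matrix r (|S|x|A|)."""
    num_states = r.shape[0]
    # Policy-averaged transition matrix and reward vector.
    P_pi = np.einsum('sa,ast->st', pi, P)
    r_pi = np.einsum('sa,sa->s', pi, r)
    # v_pi = (I - gamma * P_pi)^{-1} r_pi
    return np.linalg.solve(np.eye(num_states) - gamma * P_pi, r_pi)

def mle_style_loss(P_env, r_env, P_model, r_model):
    """Fit transitions and rewards directly: cross-entropy between the
    true and modelled transition distributions plus squared reward error."""
    cross_entropy = -np.sum(P_env * np.log(P_model + 1e-12))
    return cross_entropy + np.sum((r_env - r_model) ** 2)

def pve_style_loss(P_env, r_env, P_model, r_model, policies, gamma=0.9):
    """Proper-value-equivalence-style loss: squared difference between the
    value functions the model and the environment induce, summed over a
    set of training policies."""
    loss = 0.0
    for pi in policies:
        v_env = value_function(P_env, r_env, pi, gamma)
        v_model = value_function(P_model, r_model, pi, gamma)
        loss += np.sum((v_env - v_model) ** 2)
    return loss
```

Under this sketch, a model can achieve low PVE-style loss while misrepresenting transitions that are irrelevant to the training policies' values, which is the trade-off the abstract attributes to value-equivalent training.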