Exploring Learned Abstract Models For Efficient Planning and Learning

Doctoral Thesis (2025)
Author(s)

J. He (TU Delft - Sequential Decision Making)

Contributor(s)

F.A. Oliehoek – Promotor (TU Delft - Sequential Decision Making)

Catholijn Jonker – Promotor (TU Delft - Interactive Intelligence)

Research Group
Sequential Decision Making
Publication Year
2025
Language
English
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This thesis investigates the role of learned abstract models in online planning and model-based reinforcement learning (MBRL). We explore how abstract models can accelerate search in online planning and evaluate their effectiveness in supporting policy evaluation and improvement in MBRL.

In online planning, we focus on reducing the high computational cost of simulating large, factored, partially observable environments. In Chapter 3, we introduce the influence-augmented local simulator (IALS), which approximates external influences while preserving local agent interactions. By replacing the full simulator with the IALS, we enable faster planning while maintaining decision quality. We propose a two-phase approach in which the influence model is trained offline and later integrated into planning, allowing significantly more simulations within a fixed computational budget. However, this approach has limitations, including a potential distribution shift between offline training and online use, and the risk of poor generalization.
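To make the structure concrete, below is a minimal Python sketch of the IALS idea, assuming hypothetical names (InfluencePredictor, InfluenceAugmentedLocalSimulator, local_transition) that are illustrative placeholders, not the thesis implementation; the influence model shown returns dummy samples rather than the offline-trained predictor described above.

```python
# Illustrative sketch of an influence-augmented local simulator (IALS).
# All names are hypothetical placeholders for the components described above.
import random

class InfluencePredictor:
    """Stand-in for a model (e.g., a recurrent network) trained offline to
    predict influence source variables from the local action-observation
    history."""

    def sample(self, local_history):
        # A trained predictor would condition on local_history; here we
        # return a dummy binary influence value for illustration.
        return random.random() < 0.5


class InfluenceAugmentedLocalSimulator:
    """Steps only the agent's local region; the effect of the rest of the
    environment enters through sampled influence variables rather than a
    full global simulation step."""

    def __init__(self, local_transition, influence_predictor):
        # local_transition: (state, action, influence) -> (state', obs, reward)
        self.local_transition = local_transition
        self.influence_predictor = influence_predictor
        self.local_history = []

    def step(self, local_state, action):
        influence = self.influence_predictor.sample(self.local_history)
        next_state, obs, reward = self.local_transition(local_state, action, influence)
        self.local_history.append((action, obs))
        return next_state, obs, reward
```

Because only the local region is stepped explicitly, each simulation call is far cheaper than a global step, which is what allows many more simulations within the same planning budget.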

To address these issues, Chapter 4 introduces the self-improving simulator, which eliminates offline training by learning the abstract model online during planning. A simulator selection mechanism dynamically balances the use of the learned and original simulators, improving computational efficiency over time while ensuring planning accuracy. Our results show that this approach avoids distribution shift issues, prevents premature reliance on inaccurate models, and removes the delay associated with offline training.
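A simulator-selection mechanism of this kind can be sketched as follows. The accuracy-tracking rule below is an illustrative assumption, not the exact criterion of Chapter 4: it audits the learned simulator against the original one on a fraction of calls, collects training data from those audits, and only routes calls to the learned model once its estimated accuracy clears a threshold.

```python
# Illustrative sketch of online simulator selection for a self-improving
# simulator. The thresholding scheme is an assumption made for illustration.
import random

class SimulatorSelector:
    """Routes each simulation call to the fast learned simulator or the slow
    original simulator, based on a running estimate of model accuracy."""

    def __init__(self, threshold=0.9, eval_prob=0.1):
        self.threshold = threshold    # accuracy needed to trust the learned model
        self.eval_prob = eval_prob    # fraction of calls that audit the model
        self.correct = 0
        self.total = 0
        self.train_buffer = []        # transitions for online model updates

    def accuracy(self):
        return self.correct / self.total if self.total else 0.0

    def step(self, learned_sim, true_sim, state, action):
        audit = random.random() < self.eval_prob
        if audit or self.accuracy() < self.threshold:
            # Use the original simulator; compare against the learned one to
            # update the accuracy estimate and gather fresh training data.
            true_out = true_sim(state, action)
            self.total += 1
            self.correct += int(learned_sim(state, action) == true_out)
            self.train_buffer.append((state, action, true_out))
            return true_out
        # The learned simulator is trusted: use it for cheap simulations.
        return learned_sim(state, action)
```

Because the original simulator remains in the loop, the planner never relies exclusively on an inaccurate learned model, and the training data it collects reflects the states actually visited during planning, avoiding offline distribution shift.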

In MBRL, we examine the effectiveness of MuZero’s learned model in supporting policy evaluation and improvement. In Chapter 5, we analyze how well MuZero’s model generalizes beyond its training distribution and find that it struggles to support planning "outside the box" due to accumulated model inaccuracies. However, we show that MuZero’s learned policy prior mitigates these errors by guiding the search toward regions where the model is more reliable. This insight highlights the dual role of the policy prior: it not only improves search efficiency but also compensates for model imperfections, contributing to MuZero’s strong empirical performance.
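The guiding effect of the policy prior is visible in the PUCT action-selection rule used by AlphaZero/MuZero-style search. Below is a simplified sketch (the Node structure and the constant c_puct are simplifying assumptions): the prior P(s, a) weights the exploration bonus, steering simulations toward actions the network considers plausible and hence toward regions the learned model saw during training.

```python
# Simplified PUCT selection as used in AlphaZero/MuZero-style tree search.
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    prior: float                      # P(s, a) assigned by the policy network
    visit_count: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)   # action -> Node

    @property
    def mean_value(self):
        # Q(s, a): average value of simulations passing through this node.
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def select_child(node, c_puct=1.25):
    """PUCT rule: argmax_a Q(s, a) + c * P(s, a) * sqrt(N(s)) / (1 + N(s, a)).
    The prior term concentrates simulations on actions the network considers
    plausible, keeping model rollouts closer to the training distribution."""
    total_visits = sum(child.visit_count for child in node.children.values())

    def score(item):
        _, child = item
        exploration = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        return child.mean_value + exploration

    return max(node.children.items(), key=score)
```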

Overall, this thesis advances the understanding of learned abstract models in sequential decision-making, demonstrating their potential to improve computational efficiency while identifying key limitations in their ability to support planning. We hope these findings encourage further research into abstraction-driven approaches for adaptive, scalable decision-making in complex environments.

Files

Jinke_He_propositions.pdf
(pdf | 0.181 Mb)