J. He | TU Delft Repository

Exploring Learned Abstract Models For Efficient Planning and Learning

Doctoral thesis (2025) - J. He (author), F.A. Oliehoek (promotor), CM Jonker (promotor)

This thesis investigates the role of learned abstract models in online planning and model-based reinforcement learning (MBRL). We explore how abstract models can accelerate search in online planning and evaluate their effectiveness in supporting policy evaluation and improvement in MBRL.

In online planning, we focus on reducing the high computational cost of simulating large, factored, partially observable environments. In Chapter 3, we introduce the influence-augmented local simulator (IALS), which approximates external influences while preserving local agent interactions. By replacing the full simulator with IALS, we enable faster planning while maintaining decision quality. We propose a two-phase approach where the influence model is trained offline and later integrated into planning, allowing significantly more simulations within a fixed computational budget. However, this approach has limitations, including potential distribution shifts and the risk of poor generalization.

To address these issues, Chapter 4 introduces the self-improving simulator, which eliminates offline training by learning the abstract model online during planning. A simulator selection mechanism dynamically balances the use of the learned and original simulators, improving computational efficiency over time while ensuring planning accuracy. Our results show that this approach avoids distribution shift issues, prevents premature reliance on inaccurate models, and removes the delay associated with offline training.
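As a rough illustration of the idea of switching between an original and a learned simulator, the following minimal sketch tracks a running error estimate and only trusts the learned simulator once that estimate drops below a threshold. It is not the mechanism from the thesis: `SimulatorSelector`, `step`, the moving-average rule, and the threshold are all hypothetical choices made for this example, and the simulators are assumed to be plain functions returning a number.

```python
class SimulatorSelector:
    """Hypothetical selector between a slow original simulator and a fast learned one."""

    def __init__(self, error_threshold=0.1, smoothing=0.9):
        self.error_threshold = error_threshold
        self.smoothing = smoothing
        # Start pessimistic so the learned simulator is not trusted prematurely.
        self.error_estimate = 1.0

    def update(self, observed_error):
        # Exponential moving average of the learned simulator's observed error.
        self.error_estimate = (
            self.smoothing * self.error_estimate
            + (1.0 - self.smoothing) * observed_error
        )

    def use_learned(self):
        return self.error_estimate < self.error_threshold


def step(selector, original_sim, learned_sim, state, action):
    """Simulate one step with whichever simulator the selector currently trusts."""
    if selector.use_learned():
        return learned_sim(state, action)
    # While the original simulator is in use, compare it against the learned
    # one so the error estimate keeps improving (assumed scalar outputs).
    target = original_sim(state, action)
    prediction = learned_sim(state, action)
    selector.update(abs(target - prediction))
    return target
```

In this simplified version the error estimate is no longer updated once the learned simulator takes over; a practical mechanism would presumably keep checking it against the original simulator from time to time.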

In MBRL, we examine the effectiveness of MuZero’s learned model in supporting policy evaluation and improvement. In Chapter 5, we analyze how well MuZero’s model generalizes beyond its training distribution and find that it struggles to support planning "outside the box" due to accumulated model inaccuracies. However, we show that MuZero’s learned policy prior mitigates these errors by guiding the search toward regions where the model is more reliable. This insight highlights the dual role of the policy prior: it not only improves search efficiency but also compensates for model imperfections, contributing to MuZero’s strong empirical performance.

Overall, this thesis advances the understanding of learned abstract models in sequential decision-making, demonstrating their potential to improve computational efficiency while identifying key limitations in their ability to support planning. We hope these findings encourage further research into abstraction-driven approaches for adaptive, scalable decision-making in complex environments.

What model does MuZero learn?

Conference paper (2024) - J. He (author), Thomas M Moerland (author), J.A. de Vries (author), Frans A Oliehoek (author)

Model-based reinforcement learning (MBRL) has drawn considerable interest in recent years, given its promise to improve sample efficiency. Moreover, when using deep-learned models, it is possible to learn compact and generalizable models from data. In this work, we study MuZero, ...

Benchmarking Robustness and Generalization in Multi-Agent Systems

A Case Study on Neural MMO

Journal article (2023) - Yangkun Chen (author), Chenghui Yu (author), Hengman Zhu (author), Shuai Liu (author), Yibing Zhang (author), Joseph Suarez (author), Liang Zhao (author), J. He (author), Jiaxin Chen (author), and more authors

We present the results of the second Neural MMO challenge, hosted at IJCAI 2022, which received 1600+ submissions. This competition targets robustness and generalization in multi-agent systems: participants train teams of agents to complete a multi-task objective against opponent ...

Influence-Augmented Local Simulators

A Scalable Solution for Fast Deep RL in Large Networked Systems

Conference paper (2022) - Miguel Suau de Castro (author), J. He (author), Matthijs Spaan (author), F.A. Oliehoek (author)

Learning effective policies for real-world problems is still an open challenge for the field of reinforcement learning (RL). The main limitation is the amount of data needed and the pace at which that data can be obtained. In this paper, we study how to build lightweight simul ...

Influence-aware memory architectures for deep reinforcement learning in POMDPs

Journal article (2022) - Miguel Suau de Castro (author), J. He (author), E. Congeduti (author), R.A.N. Starre (author), Aleksander Czechowski (author), FA Oliehoek (author)

Due to its perceptual limitations, an agent may have too little information about the environment to act optimally. In such cases, it is important to keep track of the action-observation history to uncover hidden state information. Recent deep reinforcement learning methods use r ...

Speeding up Deep Reinforcement Learning through Influence-Augmented Local Simulators

Conference paper (2022) - Miguel Suau de Castro (author), J. He (author), Matthijs Spaan (author), F.A. Oliehoek (author)

Learning effective policies for real-world problems is still an open challenge for the field of reinforcement learning (RL). The main limitation is the amount of data needed and the pace at which that data can be obtained. In this paper, we study how to build lightweight simul ...

Online Planning in POMDPs with Self-Improving Simulators

Conference paper (2022) - J. He (author), M. Suau de Castro (author), Hendrik Baier (author), Michael Kaisers (author), Frans A. Oliehoek (author)

How can we plan efficiently in a large and complex environment when the time budget is limited? Given the original simulator of the environment, which may be computationally very demanding, we propose to learn online an approximate but much faster simulator that improves over tim ...

Distributed Influence-Augmented Local Simulators for Parallel MARL in Large Networked Systems

Conference paper (2022) - M. Suau de Castro (author), J. He (author), Mustafa Mert Çelikok (author), Matthijs T.J. Spaan (author), F.A. Oliehoek (author)

Due to its high sample complexity, simulation is, as of today, critical for the successful application of reinforcement learning. Many real-world problems, however, exhibit overly complex dynamics, which makes their full-scale simulation computationally slow. In this paper, we sh ...

Multitask Soft Option Learning

Conference paper (2020) - Maximilian Igl (author), Andrew Gambardella (author), J. He (author), Nantas Nardelli (author), N Siddharth (author), J.W. Böhmer (author), Shimon Whiteson (author)

We present Multitask Soft Option Learning (MSOL), a hierarchical multitask framework based on Planning as Inference. MSOL extends the concept of options, using separate variational posteriors for each task, regularized by a shared prior. This “soft” version of options avoids seve ...

Influence-Augmented Online Planning for Complex Environments

Journal article (2020) - J. He (author), M. Suau (author), Frans Oliehoek (author)

How can we plan efficiently in real time to control an agent in a complex environment that may involve many other agents? While existing sample-based planners have enjoyed empirical success in large POMDPs, their performance heavily relies on a fast simulator. However, real-world ...