A key advancement in model-based Reinforcement Learning (RL) stems from Transformer-based world models, which allow agents to plan effectively by learning an internal representation of the environment. However, causal self-attention in Transformers can be computationally redundant when most relevant information lies in only a few recent steps, making it inefficient for environments with predominantly short-term memory dependencies. This paper investigates integrating alternative attention mechanisms into world models to address these limitations. We embed inductive biases via local attention and Gaussian adaptive attention, aiming to guide the model's focus towards more relevant elements of the observation history. We evaluate these modified architectures in four environments of the Atari 100k benchmark under partially observable conditions. Our results show that, in environments where relevant information is contained within a specific recent window of observations (i.e., a short-term memory dependency), tuning local or Gaussian adaptive attention to that window lets these mechanisms significantly outperform causal attention within a limited number of interactions. In Pong, the best-performing Gaussian attention model raised the mean return from −14.53 to −6.86, roughly a 53% improvement over the baseline. The effectiveness of these mechanisms varies with the complexity and dynamism of the influential variables within an environment, underscoring the importance of appropriate prior selection and flexibility. This work demonstrates that leveraging influence-based principles through inductive biases can lead to more data-efficient attention mechanisms for world models, particularly when agents must learn from limited environment interactions in diverse RL settings.
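
To make the two inductive biases concrete, the following minimal PyTorch sketch shows one way to realize them inside causal scaled dot-product attention: a hard local window that masks keys beyond a fixed number of past steps, and a Gaussian penalty on the attention logits that decays with the distance between query and key positions. This is an illustrative sketch under stated assumptions, not the paper's evaluated implementation; the function name and the sigma/window parameters are hypothetical.

import torch
import torch.nn.functional as F

def biased_causal_attention(q, k, v, sigma=None, window=None):
    """Causal attention with two optional inductive biases (illustrative sketch).

    - Local attention: keys more than `window` steps in the past are masked out.
    - Gaussian adaptive attention: logits receive a penalty -d^2 / (2 * sigma^2),
      where d is the query-key distance and sigma is a per-head width
      (which could be made learnable).

    Shapes: q, k, v are (batch, heads, seq, dim); sigma is (heads,).
    """
    seq = q.size(-2)
    logits = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5   # (B, H, T, T)

    pos = torch.arange(seq, device=q.device)
    dist = (pos[:, None] - pos[None, :]).float()           # i - j; > 0 for past keys

    if sigma is not None:
        # Gaussian prior: nearby steps get ~0 penalty, distant steps are suppressed.
        logits = logits - dist.pow(2)[None, None] / (2 * sigma.view(1, -1, 1, 1) ** 2)

    mask = dist < 0                                        # causal: no future keys
    if window is not None:
        # Local prior: hard cutoff beyond the recent window.
        mask = mask | (dist > window)
    logits = logits.masked_fill(mask[None, None], float("-inf"))

    return F.softmax(logits, dim=-1) @ v

In this sketch, setting window to the (assumed known) memory horizon recovers local attention, while supplying a per-head sigma implements the Gaussian adaptive variant; both reduce the effective context the world model attends over, matching the short-term dependency setting described above.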