TransZero: Parallel Tree Expansion in MuZero using Transformer Networks
E.L. Malmsten (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Wendelin Böhmer – Mentor (TU Delft - Sequential Decision Making)
Tom Viering – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
Abstract
Over the past decade, model-based reinforcement learning (MBRL) has become a leading approach for solving complex decision-making problems. A prominent algorithm in this domain is MuZero, which integrates Monte Carlo Tree Search (MCTS) with deep neural networks and a latent world model to predict future states and outcomes. Despite its effectiveness, MuZero is inherently limited by the sequential nature of its search-tree construction during planning. In this work, we address this limitation by introducing TransZero-Parallel, the first model capable of constructing an MCTS search tree without any sequential constraints. The method replaces MuZero’s recurrent dynamics model with a transformer-based network, which computes a sequence of latent future states in parallel. We combine this with the MVC evaluator, which allows the search tree to be built without depending on the inherently sequential visitation counts. Together with small modifications to the MCTS algorithm, this enables the parallel expansion of entire subtrees within the search tree. Experiments in the MiniGrid and LunarLander environments demonstrate that the combined approach yields up to an eleven-fold reduction in wall-clock time while maintaining sample efficiency. These results highlight the potential of TransZero-Parallel to improve planning performance and reduce training time in model-based RL, bringing the field closer to real-time, real-world applications. The code is available on GitHub.
GitHub: https://github.com/emalmsten/TransZero
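As a rough illustration of the contrast the abstract draws, the toy sketch below compares a recurrent unroll, where each latent state needs the previous output, with a causally masked attention pass, where every position is computed from the inputs alone. It is not the TransZero architecture: the function names (`recurrent_unroll`, `causal_attention`, `parallel_unroll`), the identity projections, and the toy `step` function are all assumptions made for illustration.

```python
import math

def recurrent_unroll(s0, actions, step):
    """Sequential unroll: state t+1 cannot be computed before state t."""
    states = [s0]
    for a in actions:
        states.append(step(states[-1], a))
    return states

def causal_attention(tokens):
    """Toy single-head causal self-attention with identity projections.

    Position t attends only to positions 0..t. Each output row depends
    on the input tokens alone, never on another output row, so a real
    implementation could evaluate all rows at once as one masked
    matrix product instead of this Python loop.
    """
    d = len(tokens[0])
    out = []
    for t, q in enumerate(tokens):
        visible = tokens[: t + 1]
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in visible]
        m = max(scores)                      # subtract max for stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w / z * v[i] for w, v in zip(weights, visible))
                    for i in range(d)])
    return out

def parallel_unroll(s0, action_embeddings):
    """Hypothetical stand-in for a transformer dynamics model:
    output t is the predicted latent after the first t actions."""
    return causal_attention([s0] + action_embeddings)

# Toy usage: a 2-dimensional latent and three action embeddings.
s0 = [0.5, -0.2]
acts = [[0.1, 0.3], [-0.4, 0.2], [0.0, 0.7]]
seq = recurrent_unroll(s0, acts, lambda s, a: [math.tanh(x + y) for x, y in zip(s, a)])
par = parallel_unroll(s0, acts)
print(len(seq), len(par))
```

Because of the causal mask, changing a later action leaves the earlier predicted latents untouched, which is the property that lets a whole action sequence be evaluated in one pass.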