D.I. Popovici

Bachelor thesis (1)

1 record found

The impact of model learning losses on the sample efficiency of MuZero in Atari

Bachelor thesis (2025) - D.I. Popovici, J. He, F.A. Oliehoek, M. Weinmann

Recent advances in reinforcement learning (RL) have achieved superhuman performance in various domains but often rely on vast numbers of environment interactions, limiting their practicality in real-world scenarios. MuZero is an RL algorithm that uses Monte Carlo Tree Search with a learned dynamics model, which is trained only to predict rewards, values, and policies, without any explicit objective to match real environment transitions. This work investigates how constraining the learned model of MuZero to follow the real environment dynamics, with either a temporal-consistency loss over latent states or a pixel-level observation-reconstruction loss, impacts the sample efficiency of MuZero, tested under the Atari100k benchmark. We evaluate performance on Pong, Breakout, and MsPacman, analyzing the impact of each loss and its sensitivity to loss weight. Our results show that the temporal-consistency loss can improve performance in certain environments while the observation-reconstruction loss fails to do so, and that both losses are highly sensitive to their weight coefficient, indicating that they might require task-specific fine-tuning. ...
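
To make the role of the weight coefficient concrete, the following is a minimal sketch of how an auxiliary loss might be added to a base MuZero objective. It is not the thesis implementation. The abstract does not state the exact form of either loss, so the negative cosine similarity used for temporal consistency, the mean squared error used for reconstruction, and all function and parameter names here are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def negative_cosine_similarity(predicted: Sequence[float],
                               target: Sequence[float]) -> float:
    """Assumed temporal-consistency loss between two latent vectors.

    `predicted` stands for the latent state produced by the learned
    dynamics model; `target` stands for the latent encoding of the next
    real observation. Returns -1.0 when the vectors are aligned.
    """
    dot = sum(p * t for p, t in zip(predicted, target))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_t = math.sqrt(sum(t * t for t in target))
    eps = 1e-8  # guards against division by zero for all-zero vectors
    return -dot / (max(norm_p, eps) * max(norm_t, eps))


def mean_squared_error(reconstructed: Sequence[float],
                       observed: Sequence[float]) -> float:
    """Assumed pixel-level reconstruction loss over flattened pixels."""
    squared = [(r - o) ** 2 for r, o in zip(reconstructed, observed)]
    return sum(squared) / len(squared)


def total_loss(base_loss: float,
               consistency_loss: float = 0.0,
               reconstruction_loss: float = 0.0,
               consistency_weight: float = 0.0,
               reconstruction_weight: float = 0.0) -> float:
    """Base loss (reward, value, policy) plus weighted auxiliary terms.

    The two weights are hypothetical names for the loss weight
    coefficients whose sensitivity the abstract discusses.
    """
    return (base_loss
            + consistency_weight * consistency_loss
            + reconstruction_weight * reconstruction_loss)


# Hypothetical toy values, not results from the thesis.
latent_pred = [1.0, 0.0]
latent_next = [1.0, 0.0]
loss = total_loss(
    base_loss=1.0,
    consistency_loss=negative_cosine_similarity(latent_pred, latent_next),
    consistency_weight=2.0,
)
```

Setting a weight to zero recovers the unmodified objective. Because each auxiliary term is scaled by a single coefficient, a poorly chosen value can let it dominate or vanish relative to the base loss, which is one plausible reading of the weight sensitivity reported above.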