Conditional Normalizing Flows for Modeling Environment Stochasticity

Using a MuZero-based learned model

Bachelor Thesis (2025)
Author(s)

B.D. Damian (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

F.A. Oliehoek – Mentor (TU Delft - Sequential Decision Making)

J. He – Mentor (TU Delft - Sequential Decision Making)

M. Weinmann – Graduation committee member (TU Delft - Computer Graphics and Visualisation)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
24-06-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Planning agents have demonstrated superhuman performance in deterministic environments, such as chess and Go, by combining end-to-end reinforcement learning with powerful tree-based search algorithms. To extend such agents to stochastic or partially observable domains, Stochastic MuZero models environment uncertainty by splitting each transition into an agent action, which leads to an afterstate, and a learned stochastic outcome. In this paper, we propose a novel architecture, FlowZero, which builds on this idea but replaces the discrete latent model of environment stochasticity with Conditional Normalizing Flows (CNFs). This allows the model to learn a rich, continuous probability distribution over possible future states, conditioned on the afterstate. The key advantage of this approach is that it supports exact log-likelihood evaluation, offering more precise density estimation than the evidence lower bound (ELBO) used in Stochastic MuZero. We aim to verify two things: the proposed CNF’s capacity to overfit a dataset and to generalize to similar and larger datasets, and the capacity of our novel agent, FlowZero, to perform in a stochastic environment.
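
The exact log-likelihood property mentioned in the abstract follows from the change-of-variables formula for invertible transforms. The sketch below illustrates it for the simplest possible case: a single one-dimensional affine flow whose shift and log-scale depend on a conditioning vector. It is not the FlowZero architecture; the function name, the linear conditioner, and all parameters (`w_mu`, `b_mu`, `w_s`, `b_s`) are assumptions made purely for illustration, with `context` standing in for an afterstate encoding.

```python
import math


def conditional_affine_flow_logpdf(x, context, w_mu, b_mu, w_s, b_s):
    """Exact log p(x | context) for a 1-D conditional affine flow.

    Sampling direction: x = mu(c) + exp(s(c)) * z, with z ~ N(0, 1).
    The conditioner (hypothetical) is linear in the context vector c.
    """
    # Shift and log-scale produced by the illustrative linear conditioner.
    mu = sum(w * c for w, c in zip(w_mu, context)) + b_mu
    log_scale = sum(w * c for w, c in zip(w_s, context)) + b_s

    # Inverse transform: map the observed x back to the base variable z.
    z = (x - mu) * math.exp(-log_scale)

    # Change of variables: log p(x | c) = log N(z; 0, 1) + log|dz/dx|.
    # Here dz/dx = exp(-log_scale), so the Jacobian term is -log_scale.
    log_base = -0.5 * (z * z + math.log(2.0 * math.pi))
    return log_base - log_scale


# Example call with made-up parameters and a two-dimensional context.
log_p = conditional_affine_flow_logpdf(
    x=1.0, context=[1.0, -2.0],
    w_mu=[0.5, 0.1], b_mu=0.0,
    w_s=[0.2, 0.0], b_s=-0.1,
)
```

Because the transform is invertible and its Jacobian is available in closed form, the returned value is the exact log-density rather than a bound. Practical flows stack several such layers with neural-network conditioners, and the log-determinant terms of the layers add up.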

Files

Research_paper_finalest.pdf
(pdf | 1.78 MB)
License info not available