Discretising Continuous Action Spaces for Optimal Decision Trees

Verifiable Policies for Continuous Environments in Reinforcement Learning

Bachelor Thesis (2025)
Author(s)

M.M.J. van der Kuil (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

D.A. Vos – Mentor (TU Delft - Algorithmics)

A. Lukina – Graduation committee member (TU Delft - Algorithmics)

L. Cavalcante Siebert – Graduation committee member (TU Delft - Interactive Intelligence)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
expand_more
Publication Year
2025
Language
English
Graduation Date
24-06-2025
Awarding Institution
Delft University of Technology
Project
['CSE3000 Research Project']
Programme
['Computer Science and Engineering']
Faculty
Electrical Engineering, Mathematics and Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Complex reinforcement learning (RL) models that receive high rewards in their environments are often hard to understand. To this end, more interpretable models can be used, such as decision trees. To be able to deploy these models in safety-critical environments, they need to be high-performing and verifiable. Optimal decision trees can fulfill both these goal. While some methods already exist to find optimal decision trees, none have applied them to RL environments with continuous state
and action spaces [4; 13]. Broccoli is a state-of-the-art approach to synthesizing decision tree policies for blackbox environments [3]. Given a discretisation of the continuous state space, it can find the optimal tree in discrete action environments.
This research will focus on discretising action spaces of continuous environments. In doing so, discrete action spaces are created for which Broccoli can find optimal decision trees. Additionally, a goal is to show that it is possible to discretise continuous action spaces and compute corresponding optimal decision trees in feasible time. Furthermore, it proposes informed discretisation techniques that result in better-performing decision tree policies.

Files

License info not available