Complex reinforcement learning (RL) models that receive high rewards in their environments are often hard to understand. To address this, more interpretable models, such as decision trees, can be used instead. To be able to deploy these models in safety-critical environments, they need to be both high-performing and verifiable. Optimal decision trees can fulfill both of these goals. While some methods already exist to find optimal decision trees, none have been applied to RL environments with continuous state and action spaces [4; 13]. Broccoli is a state-of-the-art approach to synthesizing decision tree policies for black-box environments [3]. Given a discretisation of the continuous state space, it can find the optimal tree in environments with discrete actions.
This research focuses on discretising the action spaces of continuous environments, producing discrete action spaces for which Broccoli can find optimal decision trees. One goal is to show that continuous action spaces can be discretised and the corresponding optimal decision trees computed in feasible time. In addition, this research proposes informed discretisation techniques that result in better-performing decision tree policies, as sketched below.
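To make the discretisation step concrete, the following is a minimal sketch of a uniform (uninformed) discretiser for a one-dimensional continuous action space, assuming a Gymnasium-style environment; the wrapper name, bin count, and the choice of bin centres are illustrative assumptions, not part of Broccoli or of the informed techniques proposed here.

```python
import numpy as np
import gymnasium as gym


class UniformActionDiscretiser(gym.ActionWrapper):
    """Expose a 1-D continuous action space as a Discrete(n_bins) space
    by mapping each discrete index to the centre of a uniform bin."""

    def __init__(self, env: gym.Env, n_bins: int = 11):
        super().__init__(env)
        low = env.action_space.low[0]
        high = env.action_space.high[0]
        # Bin centres spaced evenly over [low, high].
        self.bin_centres = np.linspace(low, high, n_bins)
        self.action_space = gym.spaces.Discrete(n_bins)

    def action(self, action: int) -> np.ndarray:
        # Replace the discrete index with its continuous counterpart
        # before it is passed to the underlying environment.
        return np.array([self.bin_centres[action]], dtype=np.float32)


# Example: Pendulum-v1 has a continuous torque action in [-2, 2];
# after wrapping, a discrete-action policy (e.g. a decision tree)
# can act in it directly.
env = UniformActionDiscretiser(gym.make("Pendulum-v1"), n_bins=11)
```

An informed discretisation would replace the uniform `np.linspace` grid with bins chosen from knowledge of the environment or of a reference policy, which is the direction this research pursues.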