Reinforcement learning models are being utilised in a wide range of industries where even minor mistakes can have severe consequences. For safety reasons, it is important that a human expert can verify a model's decision-making process, which is where interpretable reinforcement learning proves its importance. This research focuses on training size-limited decision tree policies and evaluating them on continuous action space environments. To this end, the DAgger algorithm is used with appropriate modifications to account for the continuous setting. The results demonstrate that small decision trees can replicate high-performing neural network policies (e.g., TD3), achieving scores close to the benchmarks. It is therefore possible to explain a complex model's behaviour with far more understandable structures.
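To make the approach concrete, the sketch below shows one plausible way to distil a continuous-control expert into a size-limited decision tree with DAgger. This is not the paper's exact implementation: the function name `dagger_distill`, the hyperparameters, and the choice of a multi-output `DecisionTreeRegressor` (standing in for the paper's "modifications for the continuous setting") are illustrative assumptions; the expert is assumed to be any callable mapping states to continuous actions, such as a trained TD3 policy.

```python
# Hedged sketch of DAgger-style distillation of a continuous-control expert
# (e.g., a TD3 policy) into a small, interpretable regression tree.
# Assumes a gymnasium environment and scikit-learn; names are illustrative.
import numpy as np
import gymnasium as gym
from sklearn.tree import DecisionTreeRegressor

def dagger_distill(env, expert_policy, n_iters=10, episodes_per_iter=5,
                   max_leaf_nodes=32):
    """Distil expert_policy into a size-limited decision tree via DAgger.

    expert_policy: callable state -> continuous action (the teacher).
    max_leaf_nodes: caps tree size, keeping the policy interpretable.
    """
    states, actions = [], []
    tree = None
    for _ in range(n_iters):
        for _ in range(episodes_per_iter):
            obs, _ = env.reset()
            done = False
            while not done:
                # Always label the visited state with the expert's action.
                states.append(obs)
                actions.append(expert_policy(obs))
                # Roll out with the expert on the first iteration, then with
                # the current tree, so later data covers the states the tree
                # itself visits (the core DAgger idea).
                if tree is None:
                    act = expert_policy(obs)
                else:
                    act = tree.predict(obs.reshape(1, -1))[0]
                obs, _, terminated, truncated, _ = env.step(act)
                done = terminated or truncated
        # Refit a size-limited regression tree on the aggregated dataset;
        # multi-output regression handles multi-dimensional action spaces.
        tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes)
        tree.fit(np.array(states), np.array(actions))
    return tree
```

Capping `max_leaf_nodes` (rather than only depth) is one way to bound the number of human-readable rules in the final policy; the resulting tree can then be inspected directly, e.g. with `sklearn.tree.export_text`.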