Imitation learning from neural networks with continuous action spaces using regression trees
T.S. Cichocki (TU Delft - Electrical Engineering, Mathematics and Computer Science)
D.A. Vos – Mentor (TU Delft - Algorithmics)
A. Lukina – Graduation committee member (TU Delft - Algorithmics)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Reinforcement learning models are being utilised in a wide range of industries where even minor mistakes can have severe consequences. For safety reasons, it is important that a human expert can verify a model's decision-making process; this is where interpretable reinforcement learning proves its importance. This research focuses on training size-limited decision tree policies and evaluating them on continuous action space environments. To this end, the DAgger algorithm is used with modifications that account for the continuous setting. The results demonstrate that small decision trees can replicate high-performing neural network policies (e.g., TD3), achieving scores close to the benchmarks. It is therefore possible to explain a complex model's behaviour with far more understandable structures.
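The imitation procedure described above can be sketched as a DAgger-style loop: roll out the current student policy, label the visited states with the expert's continuous actions, and refit a size-limited regression tree on the aggregated dataset. This is a minimal illustrative sketch, not the thesis implementation: the toy environment dynamics, the `expert_policy` stand-in for a trained TD3 network, and the 16-leaf limit are all assumptions for demonstration.

```python
# Hedged sketch of DAgger for a continuous-action expert, imitated by a
# small regression tree. The expert and dynamics below are toy stand-ins
# (assumptions), not the TD3 policies or benchmark environments of the paper.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def expert_policy(state):
    # Stand-in for a trained neural network policy: a smooth mapping from
    # a 2-D state to a scalar action in [-1, 1].
    return float(np.tanh(state @ np.array([0.5, -0.3])))

def rollout(policy, n_steps=200):
    # Toy dynamics: the state drifts under the chosen action plus noise.
    states, s = [], rng.normal(size=2)
    for _ in range(n_steps):
        states.append(s.copy())
        a = policy(s)
        s = 0.9 * s + 0.1 * np.array([a, -a]) + 0.05 * rng.normal(size=2)
    return np.array(states)

# Initial dataset: states from expert rollouts, labelled by the expert.
X = rollout(expert_policy)
y = np.array([expert_policy(s) for s in X])
tree = DecisionTreeRegressor(max_leaf_nodes=16).fit(X, y)

# DAgger iterations: collect states under the *student*, but always label
# them with the expert's continuous actions, then refit the tree.
for _ in range(5):
    student = lambda s: float(tree.predict(s.reshape(1, -1))[0])
    new_X = rollout(student)
    new_y = np.array([expert_policy(s) for s in new_X])
    X, y = np.vstack([X, new_X]), np.concatenate([y, new_y])
    tree = DecisionTreeRegressor(max_leaf_nodes=16).fit(X, y)
```

Aggregating states visited by the student (rather than only the expert) is the key modification DAgger makes over plain behavioural cloning: the tree is trained on the state distribution it will actually encounter, which mitigates compounding errors; the continuous setting simply replaces a classifier with a regressor.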