This research addresses the challenge of interpretability in Reinforcement Learning (RL) for environments with continuous action spaces by extending the Decision Tree Policy Optimization (DTPO) algorithm, which was originally developed for discrete action spaces.
Unlike deep RL methods such as Proximal Policy Optimization (PPO), which are effective but difficult to interpret, DTPO offers transparent rule-based policies. We propose a continuous-action variant of the DTPO algorithm, DTPO-c, which allows decision trees to output the parameters of a Gaussian action distribution while maintaining interpretability. Our experiments on the Pendulum-v1 environment show that DTPO-c can achieve performance comparable to Robust Policy Optimization (RPO), although it requires more computational effort. Additionally, we investigate the impact of discretizing continuous actions and find that increasing action resolution does not always lead to improved performance, likely due to limited model capacity. These results confirm the feasibility of interpretable RL in continuous environments, making the approach suitable for applications where understanding and trusting a model's behavior is important.
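
To illustrate the core idea, the following is a minimal sketch (not the authors' implementation) of a decision-tree policy whose leaves store the mean and standard deviation of a Gaussian action distribution, so that a transparent tree can act in a continuous action space such as Pendulum-v1. All class names, thresholds, and leaf parameters are hypothetical and chosen purely for illustration.

```python
# Sketch only: a tree whose leaves parameterize a Gaussian over a 1-D continuous action.
import numpy as np

class Leaf:
    def __init__(self, mean, std):
        self.mean = mean    # Gaussian mean for the torque action
        self.std = std      # Gaussian standard deviation

class Node:
    def __init__(self, feature, threshold, left, right):
        self.feature = feature      # index of the observation feature to split on
        self.threshold = threshold  # split threshold
        self.left = left            # subtree for obs[feature] <= threshold
        self.right = right          # subtree for obs[feature] > threshold

def act(tree, obs, rng):
    """Walk the tree to a leaf, then sample a continuous action from its Gaussian."""
    node = tree
    while isinstance(node, Node):
        node = node.left if obs[node.feature] <= node.threshold else node.right
    return rng.normal(node.mean, node.std)

# Hypothetical hand-written tree over Pendulum-v1's observation (cos(theta), sin(theta), theta_dot):
# push against the angular velocity; thresholds and leaf values are illustrative only.
policy = Node(feature=2, threshold=0.0,
              left=Leaf(mean=+1.0, std=0.3),
              right=Leaf(mean=-1.0, std=0.3))

rng = np.random.default_rng(0)
print(act(policy, np.array([0.9, 0.1, -0.5]), rng))  # sampled continuous torque
```

Because each leaf exposes its distribution parameters directly, the resulting policy can be read as a small set of rules, which is the interpretability property DTPO-c aims to preserve.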