SPLIT-PO: Sparse Piecewise-Linear Interpretable Tree Policy Optimization
An Interpretable and Differentiable Framework for Sparse-Tree Policy Optimization
E.M.L. Hellouin de Menibus (TU Delft - Electrical Engineering, Mathematics and Computer Science)
A. Lukina – Mentor (TU Delft - Algorithmics)
Daniël Vos – Mentor (TU Delft - Algorithmics)
L. Siebert – Graduation committee member (TU Delft - Interactive Intelligence)
Abstract
Deep reinforcement learning has shown strong performance in continuous control tasks, but its reliance on deep neural networks (DNNs) hinders interpretability, limiting deployment in safety-critical domains. While recent approaches using differentiable decision trees improve transparency, they often rely on fixed structures that limit flexibility and lead to unnecessarily complex policies.
We propose SPLIT-PO (Sparse Piecewise-Linear Interpretable Tree Policy Optimization), a novel framework that learns sparse, interpretable decision trees with linear leaf controllers and a dynamically adaptive structure. SPLIT-PO introduces learnable gating and regularization that prune uninformative branches during training, allowing compact tree policies to emerge automatically. The framework remains end-to-end differentiable and integrates crispification into the training loop, building on prior interpretable methods such as Interpretable Continuous Control Trees (ICCTs).
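To make the architecture concrete, the sketch below implements a depth-1 soft decision tree in PyTorch: a sigmoid gate routes the state between two linear leaf controllers, a temperature parameter controls how crisp the split is, and an L1 penalty on the split weights stands in for the sparsity regularization. The class name, the fixed depth, and the penalty weight are illustrative assumptions, not the thesis implementation, which additionally adapts the tree structure during training.

```python
import torch
import torch.nn as nn


class SoftLinearTree(nn.Module):
    """Minimal depth-1 soft decision tree with linear leaf controllers.

    Illustrative sketch only: SPLIT-PO itself learns deeper, dynamically
    pruned trees; the names and hyperparameters here are assumptions.
    """

    def __init__(self, state_dim: int, action_dim: int, temperature: float = 1.0):
        super().__init__()
        self.split = nn.Linear(state_dim, 1)           # hyperplane split w·x + b
        self.left = nn.Linear(state_dim, action_dim)   # linear leaf controller
        self.right = nn.Linear(state_dim, action_dim)  # linear leaf controller
        self.temperature = temperature

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Soft routing probability; annealing the temperature toward zero
        # "crispifies" the gate into a hard left/right decision.
        p = torch.sigmoid(self.split(x) / self.temperature)
        return p * self.left(x) + (1.0 - p) * self.right(x)

    def sparsity_penalty(self) -> torch.Tensor:
        # L1 penalty on the split weights pushes each split to use few
        # features, one ingredient of the sparsity regularization above.
        return self.split.weight.abs().sum()


# Example: a two-leaf policy for Lunar Lander (8-D state, 2-D continuous action).
policy = SoftLinearTree(state_dim=8, action_dim=2)
action = policy(torch.randn(1, 8))
loss = action.pow(2).mean() + 1e-3 * policy.sparsity_penalty()
loss.backward()  # the whole policy is end-to-end differentiable
```

Because every operation above is differentiable, the same gradient-based optimizers used for neural policies apply directly, and a gate that saturates at 0 or 1 marks a branch that can be pruned away, which is one way compact trees can emerge.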
Experiments on standard continuous control benchmarks show that SPLIT-PO matches neural-network performance (e.g., an average reward of 285 vs. 287 on Lunar Lander) while producing trees with 100–1000× fewer parameters and as few as 1–3 leaf nodes. We further prove that SPLIT-PO is a universal function approximator, offering neural-level expressivity in an interpretable form. Although it requires more samples to converge, SPLIT-PO provides a promising foundation for transparent and verifiable reinforcement learning.
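Read in its standard form, the universal-approximation claim says that these tree policies can uniformly approximate any continuous function on a compact state set. The statement below is a hedged paraphrase; the precise theorem and proof are in the thesis.

```latex
% Paraphrase of the universal-approximation property (details in the thesis):
% for every continuous target f on a compact set K and every tolerance eps,
% some SPLIT-PO tree policy pi is uniformly eps-close to f.
\forall f \in C(K, \mathbb{R}^m),\ \forall \varepsilon > 0,\
\exists\, \pi \in \Pi_{\text{SPLIT-PO}} :\quad
\sup_{x \in K} \lVert f(x) - \pi(x) \rVert < \varepsilon
```

Intuitively, this matches the classical fact that piecewise-linear functions can uniformly approximate any continuous function on a compact set, and a tree with hyperplane splits and linear leaves computes exactly a piecewise-linear function.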