E.M.L. Hellouin de Menibus

An Interpretable and Differentiable Framework for Sparse-Tree Policy Optimization

Deep reinforcement learning has shown strong performance in continuous control tasks, but its reliance on deep neural networks (DNNs) hinders interpretability, limiting deployment in safety-critical domains. While recent approaches using differentiable decision trees improve transparency, they often rely on fixed structures that limit flexibility and lead to unnecessarily complex policies.

We propose SPLIT-PO (Sparse Piecewise-Linear Interpretable Tree Policy Optimization), a novel framework that learns sparse, interpretable decision trees with linear leaf controllers and a dynamically adaptive structure. SPLIT-PO introduces learnable gating and regularization to prune uninformative branches during training, so that compact tree policies emerge automatically. It remains end-to-end differentiable and integrates crispification (converting soft splits into hard, single-feature decisions) into the training loop, building on prior interpretable methods such as ICCT.
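To make these ingredients concrete, the following is a minimal NumPy sketch of a soft decision tree with linear leaf controllers, per-node gates, and a crisp evaluation mode. It is not the SPLIT-PO implementation: the class name `SoftTreePolicy`, the fixed depth-2 layout, the gate rule (a closed gate routes all probability mass to the left child), and the penalty form are all assumptions chosen for illustration. It covers only the forward pass; an actual training setup would use an automatic-differentiation framework.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class SoftTreePolicy:
    """Hypothetical depth-2 tree: 3 internal nodes, 4 linear leaves.

    Node 0 is the root; node 1 splits its left branch, node 2 its right.
    """

    def __init__(self, obs_dim, act_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(3, obs_dim))   # split weights per node
        self.b = np.zeros(3)                     # split biases per node
        self.gate_logit = np.full(3, 2.0)        # gates start mostly open
        self.A = 0.1 * rng.normal(size=(4, act_dim, obs_dim))  # leaf gains
        self.c = np.zeros((4, act_dim))          # leaf offsets

    def gates(self):
        return sigmoid(self.gate_logit)

    def leaf_probs(self, x, crisp=False):
        if crisp:
            # Assumed crispification: keep the largest-magnitude feature
            # of each node and apply a hard threshold.
            k = np.argmax(np.abs(self.w), axis=1)
            score = self.w[np.arange(3), k] * x[k] + self.b
            left = (score > 0).astype(float)
            g = (self.gates() > 0.5).astype(float)
        else:
            left = sigmoid(self.w @ x + self.b)
            g = self.gates()
        # Assumed gate rule: a closed gate (g = 0) sends everything left,
        # which makes the right subtree unreachable (pruned).
        left = g * left + (1.0 - g)
        right = 1.0 - left
        return np.array([
            left[0] * left[1], left[0] * right[1],
            right[0] * left[2], right[0] * right[2],
        ])

    def act(self, x, crisp=False):
        leaf_actions = self.A @ x + self.c       # shape (4, act_dim)
        return self.leaf_probs(x, crisp) @ leaf_actions

    def gate_penalty(self, lam=1e-2):
        # Assumed sparsity regularizer: penalize open gates.
        return lam * np.sum(self.gates())


policy = SoftTreePolicy(obs_dim=8, act_dim=2)
x = np.linspace(-1.0, 1.0, 8)
soft_action = policy.act(x)
crisp_action = policy.act(x, crisp=True)
```

In this sketch, pushing a gate logit strongly negative collapses its node to a single branch, so a penalty on open gates gives the optimizer a differentiable way to shrink the tree. In crisp mode exactly one leaf is active, and the policy reduces to a small set of if-then rules over single features, each followed by a linear controller.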

Experiments on standard continuous control benchmarks show that SPLIT-PO matches neural network performance (e.g., 285 vs. 287 average reward on Lunar Lander) while producing trees with 100–1000× fewer parameters and as few as 1–3 leaf nodes. Additionally, we prove SPLIT-PO is a universal function approximator, offering neural-level expressivity in an interpretable form. Although it requires more samples to converge, SPLIT-PO provides a promising foundation for transparent and verifiable reinforcement learning. ...
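As a rough illustration of where a parameter gap of this order can come from, the sketch below counts parameters for a small multilayer perceptron and for a crisp tree with linear leaves. Every size here is an assumption rather than a figure from the paper: an 8-dimensional observation and 2-dimensional action (as in the continuous Lunar Lander environment), two hidden layers of 128 units, and a 3-leaf tree whose internal nodes each store one feature index and one threshold.

```python
def mlp_params(sizes):
    """Weights plus biases of a fully connected network with these layer sizes."""
    return sum(i * o + o for i, o in zip(sizes[:-1], sizes[1:]))


def tree_params(n_leaves, obs_dim, act_dim):
    """Crisp binary tree: each internal node keeps a feature index and a
    threshold; each leaf is a linear controller with a bias."""
    internal = (n_leaves - 1) * 2
    leaves = n_leaves * (act_dim * obs_dim + act_dim)
    return internal + leaves


mlp = mlp_params([8, 128, 128, 2])                # 17922
tree = tree_params(n_leaves=3, obs_dim=8, act_dim=2)  # 58
ratio = mlp / tree
```

Under these assumed sizes the network has roughly 300 times as many parameters as the tree; wider networks or smaller trees push the ratio higher.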