SPLIT-PO: Sparse Piecewise-Linear Interpretable Tree Policy Optimization

An Interpretable and Differentiable Framework for Sparse-Tree Policy Optimization

Bachelor Thesis (2025)
Author(s)

E.M.L. Hellouin de Menibus (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A. Lukina – Mentor (TU Delft - Algorithmics)

Daniël Vos – Mentor (TU Delft - Algorithmics)

L. Siebert – Graduation committee member (TU Delft - Interactive Intelligence)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
24-06-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Deep reinforcement learning achieves strong performance on continuous control tasks, but its reliance on deep neural networks (DNNs) hinders interpretability and limits deployment in safety-critical domains. Recent approaches based on differentiable decision trees improve transparency, but they often rely on fixed tree structures that restrict flexibility and lead to unnecessarily complex policies.

We propose SPLIT-PO (Sparse Piecewise-Linear Interpretable Tree Policy Optimization), a novel framework that learns sparse, interpretable decision trees with linear leaf controllers and dynamically adaptive structure. SPLIT-PO introduces learnable gating and regularization to prune uninformative branches during training, enabling compact tree policies to emerge automatically. It maintains end-to-end differentiability and integrates crispification within the training loop, building on prior interpretable methods like ICCT.
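To make the idea concrete, the sketch below shows a depth-one differentiable tree policy with linear leaf controllers and a learnable gate, in the spirit of the description above. The module names, hyperparameters, and PyTorch framing are illustrative assumptions for exposition only, not the thesis implementation.

```python
import torch
import torch.nn as nn

class SoftDecisionTreePolicy(nn.Module):
    """Minimal sketch (assumed, not the SPLIT-PO code): a depth-1 soft decision
    tree whose two leaves are linear controllers, with a learnable gate that a
    sparsity regularizer can drive toward zero to prune the split."""

    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        # Soft decision node: sigmoid(w·s + b) routes the state to each leaf.
        self.split_w = nn.Parameter(torch.randn(obs_dim) * 0.1)
        self.split_b = nn.Parameter(torch.zeros(1))
        # Linear controllers at the leaves: a = W s + b.
        self.leaf_left = nn.Linear(obs_dim, act_dim)
        self.leaf_right = nn.Linear(obs_dim, act_dim)
        # Learnable gate; pushing it toward 0 (e.g. via an L1 penalty)
        # effectively removes the split so a single leaf controller remains.
        self.gate = nn.Parameter(torch.ones(1))

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        # Soft routing probability for the right branch.
        p_right = torch.sigmoid(self.gate * (s @ self.split_w + self.split_b))
        p_right = p_right.unsqueeze(-1)
        # Convex mixture of the two linear leaf controllers.
        return (1.0 - p_right) * self.leaf_left(s) + p_right * self.leaf_right(s)

policy = SoftDecisionTreePolicy(obs_dim=8, act_dim=2)   # e.g. Lunar Lander sizes
action = policy(torch.randn(4, 8))                       # batch of 4 observations
sparsity_penalty = policy.gate.abs().sum()               # regularizer on the gate
```

Crispifying such a policy amounts to replacing the sigmoid routing with a hard threshold, so that at inference time each state follows exactly one path to a single linear controller.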

Experiments on standard continuous control benchmarks show that SPLIT-PO matches neural network performance (e.g., 285 vs. 287 average reward on Lunar Lander) while producing trees with 100–1000× fewer parameters and as few as 1–3 leaf nodes. Additionally, we prove that SPLIT-PO is a universal function approximator, offering neural-level expressivity in an interpretable form. Although it requires more samples to converge than the neural baseline, SPLIT-PO provides a promising foundation for transparent and verifiable reinforcement learning.
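In notation of our own (the thesis states the result formally), a crisp tree with linear leaf controllers realizes a piecewise-linear policy

\[
\pi(s) \;=\; \sum_{\ell \in \mathrm{leaves}} \mathbf{1}\!\left[s \in R_\ell\right]\left(W_\ell\, s + b_\ell\right),
\]

where each region \(R_\ell\) is the axis-aligned cell selected by the decision path to leaf \(\ell\). This is the sense in which such policies can approximate continuous control laws arbitrarily well on compact state sets as the number of leaves grows.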

Files

RP_Final_Paper-38.pdf
(PDF | 3.3 MB)