Decision Trees vs. Ensembles in Regression-Based Offline RL

Interpretability–Performance Trade-offs and Return-to-Go Effects

Bachelor Thesis (2025)
Author(s)

R. Polenciuc (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Anna Lukina – Mentor (TU Delft - Algorithmics)

Daniël Vos – Graduation committee member (TU Delft - Algorithmics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
25-06-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Offline reinforcement learning (RL) trains policies from pre-collected data, which is valuable when real-world interaction is costly or risky. This paper systematically investigates the interpretability-performance trade-off of decision tree policies in a framework that recasts offline RL as supervised regression. Through extensive empirical evaluation of single and decomposed decision trees against an XGBoost ensemble on diverse D4RL environments, we show that compact trees, though inherently interpretable, suffer significant performance loss, while larger trees that achieve competitive returns are no longer practically auditable by humans. Critically, return-to-go (RTG) conditioning introduces significant behavioral fragility: even structurally transparent policies respond unpredictably to RTG shifts, which complicates their practical interpretability in dynamic environments. Our findings demonstrate that structural simplicity alone is insufficient for practical transparency in goal-conditioned RL, underscoring the need for further research on robustly interpretable sequential decision-making systems.
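
As a rough illustration of what recasting offline RL as supervised regression with RTG conditioning can look like, the sketch below computes the return-to-go for each step of a logged trajectory and flattens the data into feature/target pairs. It is not taken from the thesis: the function names, the feature layout (state concatenated with RTG), and the `gamma` discount parameter are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# One logged trajectory: (states, actions, rewards), all of equal length.
Trajectory = Tuple[Sequence[Sequence[float]], Sequence[Sequence[float]], Sequence[float]]


def returns_to_go(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """Return-to-go at step t: the (optionally discounted) sum of rewards from t onward."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Accumulate from the last step backwards so each step is visited once.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def build_regression_dataset(
    trajectories: Sequence[Trajectory], gamma: float = 1.0
) -> Tuple[List[List[float]], List[List[float]]]:
    """Flatten trajectories into (X, y) for a supervised regressor.

    Assumed layout (hypothetical): X row = state features followed by RTG,
    y row = the action logged at that step.
    """
    X: List[List[float]] = []
    y: List[List[float]] = []
    for states, actions, rewards in trajectories:
        rtg = returns_to_go(rewards, gamma)
        for t, (state, action) in enumerate(zip(states, actions)):
            X.append(list(state) + [rtg[t]])
            y.append(list(action))
    return X, y
```

Under these assumptions, any regressor (a single decision tree or a boosted ensemble) could be fit on `X` and `y`. At evaluation time the RTG column would be set to a chosen target return rather than an observed one, which is the input whose shifts the abstract identifies as a source of behavioral fragility.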
