Decision Trees vs. Ensembles in Regression-Based Offline RL

Interpretability–Performance Trade-offs and Return-to-Go Effects

Bachelor Thesis (2025)

Author(s)

R. Polenciuc (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Anna Lukina – Mentor (TU Delft - Algorithmics)

Daniël Vos – Graduation committee member (TU Delft - Algorithmics)

Faculty

Electrical Engineering, Mathematics and Computer Science

Decision Trees, Interpretability, Offline Reinforcement Learning, Return-to-Go

To reference this document use:

https://resolver.tudelft.nl/uuid:0bed6eaa-f98c-4b0e-92ed-296c076d7500

Publication Year

2025

Language

English

Graduation Date

25-06-2025

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Offline reinforcement learning (RL) trains policies from pre-collected data, which is valuable when real-world interaction is costly or risky. This paper systematically investigates the interpretability-performance trade-off of decision tree policies in a framework that reframes offline RL as supervised regression. Through extensive empirical evaluation of single and decomposed decision trees against an XGBoost ensemble on diverse D4RL environments, we show that compact trees, though inherently interpretable, suffer significant performance loss. Conversely, larger trees achieve competitive returns only at the cost of practical human auditability. Critically, return-to-go (RTG) conditioning introduces significant behavioral fragility: despite their structural transparency, policies respond unpredictably to RTG shifts, which complicates their practical interpretability in dynamic environments. Our findings demonstrate that structural simplicity alone is insufficient for practical transparency in goal-conditioned RL, underscoring the need for further research into robustly interpretable sequential decision-making systems.
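
To illustrate what "reframing offline RL as supervised regression" with return-to-go conditioning can look like, the following is a minimal sketch and not the thesis's actual pipeline. It assumes a trajectory stored as (state, action, reward) tuples; the function names `returns_to_go` and `build_regression_dataset`, the discount parameter `gamma`, and the toy data are all hypothetical and chosen only for illustration.

```python
def returns_to_go(rewards, gamma=1.0):
    """Return, for each timestep, the (discounted) sum of rewards from that step onward."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the sum already computed for the next step.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def build_regression_dataset(trajectory, gamma=1.0):
    """Turn one trajectory of (state, action, reward) tuples into regression data.

    Features are the state with the return-to-go appended; targets are the actions.
    (Assumed layout; the thesis may encode its inputs differently.)
    """
    rewards = [reward for _, _, reward in trajectory]
    rtg = returns_to_go(rewards, gamma)
    features = [list(state) + [g] for (state, _, _), g in zip(trajectory, rtg)]
    targets = [action for _, action, _ in trajectory]
    return features, targets


# Toy trajectory with 2-dimensional states and scalar actions (made-up values).
toy_trajectory = [
    ((0.0, 1.0), 0.5, 1.0),
    ((0.1, 0.9), -0.2, 0.0),
    ((0.2, 0.8), 0.3, 2.0),
]
features, targets = build_regression_dataset(toy_trajectory)
```

Any regressor, such as a single decision tree or a boosted ensemble, could then be fit on `features` and `targets`. At evaluation time the RTG entry of the feature vector is set by the user as a target return, which is why a policy's sensitivity to shifts in that one input matters for interpretability.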

Files

Rares_polenciuc_final_thesis.p... (pdf)

(pdf | 1.97 Mb)

License info not available