R. Polenciuc
Decision Trees vs. Ensembles in Regression-Based Offline RL
Interpretability–Performance Trade-offs and Return-to-Go Effects
Offline reinforcement learning (RL) trains policies from pre-collected data, which is valuable where real-world interaction is costly or risky. This paper systematically investigates the interpretability-performance trade-off of decision tree policies in a framework that reframes offline RL as supervised regression. Through extensive empirical evaluation of single and decomposed decision trees against an XGBoost ensemble on diverse D4RL environments, we show that compact trees, though inherently interpretable, suffer significant performance loss. Conversely, larger trees achieve competitive returns only at the cost of practical human auditability. Critically, return-to-go (RTG) conditioning introduces significant behavioral fragility: even structurally transparent policies respond unpredictably to RTG shifts, which complicates their practical interpretability in dynamic environments. Our findings demonstrate that structural simplicity alone is insufficient for practical transparency in goal-conditioned RL, underscoring the need for further research on robustly interpretable sequential decision-making systems.
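To make the "offline RL as supervised regression" framing concrete, the following is a minimal sketch of how a logged trajectory might be turned into a regression dataset with return-to-go (RTG) conditioning. It is an illustration under stated assumptions, not the paper's pipeline: the record does not define RTG, so the sketch assumes the common convention that RTG at step t is the (optionally discounted) sum of rewards from t onward, and the names `returns_to_go` and `build_regression_dataset` are hypothetical.

```python
from typing import List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """Assumed convention: rtg[t] = r[t] + gamma * rtg[t + 1], with rtg[T] = 0."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk the trajectory backwards so each step reuses the suffix sum.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def build_regression_dataset(
    states: Sequence[Sequence[float]],
    actions: Sequence[Sequence[float]],
    rewards: Sequence[float],
    gamma: float = 1.0,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Hypothetical helper: inputs are (state, RTG) rows, targets are logged actions."""
    rtg = returns_to_go(rewards, gamma)
    features = [list(s) + [g] for s, g in zip(states, rtg)]
    targets = [list(a) for a in actions]
    return features, targets


# Toy three-step trajectory (made-up numbers, not D4RL data).
states = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
actions = [[0.1], [-0.2], [0.3]]
rewards = [1.0, 2.0, 3.0]

X, y = build_regression_dataset(states, actions, rewards)
for row, target in zip(X, y):
    print(row, "->", target)
```

Any regressor, such as a depth-limited decision tree or a gradient-boosted ensemble, could then be fit on `X` and `y`. At evaluation time the RTG column is set to a target return, and it is the policy's sensitivity to that column that the abstract describes as behavioral fragility.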