Use of sample-splitting and cross-fitting techniques to mitigate the risks of double-dipping in behaviour-agnostic reinforcement learning

Comparative Analysis

Bachelor Thesis (2024)
Author(s)

Y. Aslan (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

S.R. Bongers – Mentor (TU Delft - Sequential Decision Making)

FA Oliehoek – Mentor (TU Delft - Sequential Decision Making)

C.M. Jonker – Graduation committee member (TU Delft - Interactive Intelligence)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
expand_more
Publication Year
2024
Language
English
Graduation Date
27-06-2024
Awarding Institution
Delft University of Technology
Project
['CSE3000 Research Project']
Programme
['Computer Science and Engineering']
Faculty
Electrical Engineering, Mathematics and Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper addresses the issue of double-dipping in off-policy evaluation (OPE) in behaviour-agnostic reinforcement learning, where the same dataset is used for both training and estimation, leading to overfitting and inflated performance metrics especially for variance. We introduce SplitDICE, which incorporates sample-splitting and cross-fitting techniques to mitigate double-dipping effects in the DICE family of estimators. Focusing specifically on 2-fold and 5-fold cross-fitting strategies, the original off-policy dataset is partitioned with random-split to get separate training and evaluation datasets. Experimental results demonstrate that SplitDICE, particularly with 5-fold cross-fitting, significantly reduces error, bias, and variance compared to naive DICE implementations, providing a more doubly-robust solution for behavior-agnostic OPE.

Files

Final_Research_Paper.pdf
(pdf | 0.731 Mb)
License info not available