Rémi Tachet des Combes

Conference paper (1)

1 record found

Safe Policy Improvement with an Estimated Baseline Policy

Conference paper (2020) - Thiago D. Simão (author), Romain Laroche (author), Rémi Tachet des Combes (author)

Previous work has shown the unreliability of existing algorithms in the batch Reinforcement Learning setting, and proposed the theoretically grounded Safe Policy Improvement with Baseline Bootstrapping (SPIBB) fix: reproduce the baseline policy in the uncertain state-action pairs ...