Trust-Region Twisted Policy Improvement

Journal Article (2025)
Author(s)

Joery A. de Vries (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Jinke He (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Yaniv Oren (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Matthijs T.J. Spaan (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Research Group
Sequential Decision Making
Publication Year
2025
Language
English
Journal title
Proceedings of Machine Learning Research
Volume number
267
Pages (from-to)
12901-12923
Event
42nd International Conference on Machine Learning, ICML 2025 (2025-07-13 - 2025-07-19), Vancouver, Canada
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Monte-Carlo tree search (MCTS) has driven many recent breakthroughs in deep reinforcement learning (RL). However, scaling MCTS to parallel compute has proven challenging in practice, which has motivated alternative planners like sequential Monte-Carlo (SMC). Many of these SMC methods adopt particle filters for smoothing through a reformulation of RL as a policy inference problem. Yet, persisting design choices of these particle filters often conflict with the aim of online planning in RL, which is to obtain a policy improvement at the start of planning. Drawing inspiration from MCTS, we tailor SMC planners specifically to RL by improving data generation within the planner through constrained action sampling and explicit terminal state handling, as well as improving policy and value target estimation. This leads to our Trust-Region Twisted SMC (TRT-SMC), which shows improved runtime and sample-efficiency over baseline MCTS and SMC methods in both discrete and continuous domains.

Files

De-vries25a.pdf
(pdf | 1.06 MB)
License info not available