To the Max

Reinventing Reward in Reinforcement Learning

Journal Article (2024)
Author(s)

G. Veviurko (TU Delft - Algorithmics)

Wendelin Böhmer (TU Delft - Sequential Decision Making)

MM De Weerdt (TU Delft - Algorithmics)

Research Group
Algorithmics
Publication Year
2024
Language
English
Volume number
235
Pages (from-to)
49455-49470
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In reinforcement learning (RL), different reward functions can define the same optimal policy yet result in drastically different learning performance. With some, the agent gets stuck in suboptimal behavior; with others, it solves the task efficiently. Choosing a good reward function is hence an extremely important yet challenging problem. In this paper, we explore an alternative approach to using rewards for learning. We introduce max-reward RL, where an agent optimizes the maximum rather than the cumulative reward. Unlike earlier works, our approach works for both deterministic and stochastic environments and can easily be combined with state-of-the-art RL algorithms. In the experiments, we study the performance of max-reward RL algorithms in two goal-reaching environments from Gymnasium-Robotics and demonstrate their benefits over standard RL. The code is available at https://github.com/veviurko/To-the-Max.
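To make the change of objective concrete, the sketch below contrasts the standard discounted cumulative return with a max-reward return on a fixed reward trajectory. This is an illustration only, not the authors' implementation; in particular, the discounting convention for the max-reward case (taking the maximum of the current reward and the discounted future value) is an assumption chosen for the sketch.

```python
def cumulative_return(rewards, gamma=0.99):
    """Standard discounted cumulative return: sum_t gamma^t * r_t."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def max_reward_return(rewards, gamma=0.99):
    """Illustrative discounted max-reward return: the agent cares about
    the best reward along the trajectory rather than the sum.
    Recursion max(r_t, gamma * future) is an assumed convention here."""
    g = float("-inf")
    for r in reversed(rewards):
        g = max(r, gamma * g)
    return g


# A sparse goal-reaching style trajectory: one reward upon success.
rewards = [0.0, 0.0, 1.0, 0.0]
print(cumulative_return(rewards, gamma=0.5))  # discounted sum
print(max_reward_return(rewards, gamma=0.5))  # discounted maximum
```

On the sparse trajectory above the two objectives coincide; with repeated rewards (e.g. `[1.0, 1.0]`) the cumulative return keeps growing while the max-reward return does not, which is the behavioral difference the paper exploits.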
