Eligibility traces and forgetting factor in recursive least-squares-based temporal difference


Abstract

We propose a new reinforcement learning method in the framework of Recursive Least Squares-Temporal Difference (RLS-TD). Instead of using the standard mechanism of eligibility traces (resulting in RLS-TD(λ)), we propose to use the forgetting factor commonly employed in gradient-based or least-squares estimation, and we show that it plays a role similar to that of eligibility traces. An instrumental variable perspective is adopted to formulate the new algorithm, referred to as RLS-TD with forgetting factor (RLS-TD-f). An interesting aspect of the proposed algorithm is that it can be interpreted as the minimizer of an appropriate cost function. We test the effectiveness of the algorithm in a Policy Iteration setting, meaning that we aim to improve the performance of an initially stabilizing control policy (stabilizing over a large portion of the state space). We take a cart-pole benchmark and an adaptive cruise control benchmark as experimental platforms.
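The abstract describes the algorithm only at a high level. As an illustration of the general idea, the following is a minimal sketch of a recursive least-squares TD update with linear value-function approximation in which a forgetting factor discounts past samples instead of accumulating eligibility traces. It is not the paper's exact formulation; all names and parameters (RLSTDForgetting, phi, mu, delta) are illustrative assumptions.

```python
import numpy as np

class RLSTDForgetting:
    """Sketch: recursive least-squares TD(0) update with a forgetting factor.

    Linear value function V(s) ~= theta^T phi(s). Following the instrumental
    variable view, the instrument is phi(s) and the regressor is
    phi(s) - gamma * phi(s'). The forgetting factor mu in (0, 1] discounts
    old samples: mu = 1 recovers plain RLS-TD(0), smaller mu weights recent
    transitions more heavily (a role analogous to eligibility traces).
    """

    def __init__(self, n_features, gamma=0.99, mu=0.98, delta=1.0):
        self.gamma = gamma
        self.mu = mu
        self.theta = np.zeros(n_features)      # value-function weights
        self.P = np.eye(n_features) / delta    # inverse correlation matrix

    def update(self, phi_s, reward, phi_next):
        # Regressor combining current and successor features.
        d = phi_s - self.gamma * phi_next
        # Temporal-difference error under the current weights.
        td_error = reward - d @ self.theta
        # Gain vector; the forgetting factor mu appears in the denominator.
        P_phi = self.P @ phi_s
        k = P_phi / (self.mu + d @ P_phi)
        # Recursive updates; dividing P by mu discounts past information.
        self.theta = self.theta + k * td_error
        self.P = (self.P - np.outer(k, d @ self.P)) / self.mu
        return td_error
```

In a Policy Iteration setting such as the one described above, an update of this kind would be applied to transitions collected under the current policy to evaluate it, after which the policy is improved with respect to the estimated value function.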