Eligibility traces and forgetting factor in recursive least-squares-based temporal difference

Journal Article (2022)
Author(s)

S. Baldi (TU Delft - Team Bart De Schutter, Southeast University)

Z. Zhang (Southeast University)

Di Liu (Southeast University, University Medical Center Groningen)

Research Group
Team Bart De Schutter
Copyright
© 2022 S. Baldi, Z. Zhang, Di Liu
DOI related publication
https://doi.org/10.1002/acs.3282
Publication Year
2022
Language
English
Issue number
2
Volume number
36
Pages (from-to)
334-353
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

We propose a new reinforcement learning method in the framework of Recursive Least Squares-Temporal Difference (RLS-TD). Instead of using the standard mechanism of eligibility traces (resulting in RLS-TD(λ)), we propose to use the forgetting factor commonly used in gradient-based or least-squares estimation, and we show that it plays a role similar to that of eligibility traces. An instrumental variable perspective is adopted to formulate the new algorithm, referred to as RLS-TD with forgetting factor (RLS-TD-f). An interesting aspect of the proposed algorithm is that it admits an interpretation as the minimizer of an appropriate cost function. We test the effectiveness of the algorithm in a Policy Iteration setting, meaning that we aim to improve the performance of an initially stabilizing control policy (over a large portion of the state space). We take a cart-pole benchmark and an adaptive cruise control benchmark as experimental platforms.
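To illustrate the general idea behind the abstract, the following is a minimal sketch of recursive least-squares temporal-difference value estimation with a forgetting factor, using the standard instrumental-variable RLS update. All function names, defaults, and the linear-features setup are illustrative assumptions, not the paper's actual algorithm or notation.

```python
import numpy as np

def rls_td_forgetting(trajectory, phi, n_features, gamma=0.9, beta=0.99):
    """Illustrative RLS-TD update with forgetting factor `beta`.

    Estimates theta such that V(s) ~= phi(s) @ theta from a list of
    (state, reward, next_state) transitions. This is a generic sketch,
    not the RLS-TD-f algorithm from the paper.
    """
    theta = np.zeros(n_features)
    P = np.eye(n_features) * 1e3              # large initial covariance
    for s, r, s_next in trajectory:
        f = phi(s)                            # instrumental variable
        d = phi(s) - gamma * phi(s_next)      # TD regressor
        K = P @ f / (beta + d @ P @ f)        # gain; beta discounts old data
        theta = theta + K * (r - d @ theta)   # innovation-driven update
        P = (P - np.outer(K, d @ P)) / beta   # covariance update with forgetting
    return theta

# Toy check: a single-state chain with reward 1 and gamma = 0.9
# has true value 1 / (1 - 0.9) = 10.
traj = [(0, 1.0, 0)] * 500
theta = rls_td_forgetting(traj, lambda s: np.array([1.0]), n_features=1)
```

With beta = 1 this reduces to plain RLS-TD; beta < 1 exponentially discounts older transitions, which is the mechanism the abstract relates to eligibility traces.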

Files

Adaptive_Control_Signal_2021_B... (pdf | 0.991 MB)
- Embargo expired on 01-07-2023
License info not available