Symbolic method for deriving policy in reinforcement learning

Conference Paper (2016)

Author(s)

Eduard Alibekov (Czech Technical University)

Jiří Kubalík (Czech Technical University)

R Babuška (Czech Technical University, TU Delft - OLD Intelligent Control & Robotics)

Research Group

OLD Intelligent Control & Robotics

DOI related publication

https://doi.org/10.1109/CDC.2016.7798684

Statistics; Genetic programming; Cybernetics; Trajectory; Standards; Sociology; Learning (artificial intelligence)

To reference this document use:

https://resolver.tudelft.nl/uuid:086f4ffd-09e9-4033-a6f3-5c7358705052

Publication Year

2016

Language

English

Pages (from-to)

2789-2795

ISBN (print)

978-1-5090-1837-6

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper addresses the problem of deriving a policy from the value function in the context of reinforcement learning in continuous state and input spaces. We propose a novel method based on genetic programming to construct a symbolic function, which serves as a proxy to the value function and from which a continuous policy is derived. The symbolic proxy function is constructed such that it maximizes the number of correct choices of the control input for a set of selected states. Maximization methods can then be used to derive a control policy that performs better than the policy derived from the original approximate value function. The method was experimentally evaluated on two control problems with continuous spaces, pendulum swing-up and magnetic manipulation, and compared to a standard policy derivation method using the value function approximation. The results show that the proposed method and its variants outperform the standard method.
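
The selection criterion described in the abstract, counting how often a candidate proxy function leads to the correct control input over a set of selected states, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the one-dimensional dynamics `step`, the three-element action set, the hand-written candidate proxies (the paper evolves candidates with genetic programming), and the choice to score an action by the proxy value of the next state alone, without a reward term.

```python
from typing import Callable, Sequence


def count_correct_choices(
    proxy: Callable[[float], float],
    step: Callable[[float, float], float],
    states: Sequence[float],
    actions: Sequence[float],
    correct_actions: Sequence[float],
) -> int:
    """Count the states in which maximizing the proxy picks the reference action."""
    hits = 0
    for x, u_ref in zip(states, correct_actions):
        # Greedy choice: the action whose successor state scores highest under the proxy.
        u_best = max(actions, key=lambda u: proxy(step(x, u)))
        if u_best == u_ref:
            hits += 1
    return hits


# Hypothetical toy problem: drive a scalar state toward zero.
def step(x: float, u: float) -> float:
    return x + u  # assumed dynamics, for illustration only


actions = [-1.0, 0.0, 1.0]
states = [-2.0, -1.0, 0.0, 1.0, 2.0]
correct_actions = [1.0, 1.0, 0.0, -1.0, -1.0]  # reference choices, assumed known

# Two hand-written candidate proxies standing in for evolved symbolic expressions.
proxy_good = lambda x: -abs(x)  # prefers successor states close to zero
proxy_bad = lambda x: x         # always prefers the largest successor state

good_score = count_correct_choices(proxy_good, step, states, actions, correct_actions)
bad_score = count_correct_choices(proxy_bad, step, states, actions, correct_actions)
```

In a search over candidate expressions, a count of this kind could serve as the fitness value, so that a candidate such as `proxy_good` would be preferred over `proxy_bad`.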
