Adaptive Optimal Control of Systems with Input Saturation
Introducing the Safe-GPI Algorithm
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
The main goal of this thesis is to develop and implement an algorithm that finds and applies the optimal policy for a class of continuous-time linear time-invariant systems subject to input saturation. The proposed algorithm does this on-line (i.e. in real time, while controlling the system) without any prior knowledge of the system dynamics. It is named Safe-Generalized Policy Iteration (Safe-GPI) and uses Approximate Dynamic Programming techniques to converge towards the optimal control law. The main contribution of this work is a novel Safe Policy Improvement step that uses a Sum-of-Squares programming routine to guarantee that the updated policy actually stabilizes the system. The unknown system dynamics are identified with on-line parameter estimation techniques. Subsequently, a novel value function approximator can be proven to converge through the certainty equivalence principle. For both the system identifier and the value function approximator, experience replay update laws are presented that use current and recorded data, which allows the persistence of excitation condition to be checked on-line. The Safe-GPI algorithm is implemented in MATLAB, and its convergence and performance properties are studied in two case studies. The first case study concerns the position control of a mass-spring-damper cart system; here, convergence of the Safe-GPI algorithm towards an optimal policy is demonstrated, and the algorithm is shown to outperform a current state-of-the-art algorithm. In the second case study, the algorithm is implemented as a router-based Internet congestion controller, where it performs Active Queue Management (AQM).
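To illustrate how recorded data can make an excitation condition checkable on-line, the following is a minimal sketch and not the thesis implementation (which is in MATLAB). It assumes a linear-in-parameters estimator whose recorded regressor vectors are stacked row by row, and it uses a common richness test from the experience replay literature: the smallest eigenvalue of the summed outer products of the recorded regressors must exceed a threshold. The function names and the threshold value are hypothetical.

```python
import numpy as np


def excitation_level(regressors):
    """Smallest eigenvalue of sum_k phi_k phi_k^T over the recorded regressors.

    `regressors` is an (N, p) array-like: N recorded regressor vectors of
    dimension p (an assumed data layout, not taken from the thesis).
    """
    phi = np.asarray(regressors, dtype=float)
    gram = phi.T @ phi  # equals the sum of outer products phi_k phi_k^T
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    return float(np.linalg.eigvalsh(gram)[0])


def is_sufficiently_rich(regressors, threshold=1e-6):
    """True if the recorded data excites every parameter direction.

    The threshold is an illustrative tuning value.
    """
    return excitation_level(regressors) > threshold


# Recorded data confined to one direction cannot identify two parameters.
poor_data = [[1.0, 0.0], [2.0, 0.0]]
# Data spanning both directions passes the check.
rich_data = [[1.0, 0.0], [0.0, 1.0]]

print(is_sufficiently_rich(poor_data))
print(is_sufficiently_rich(rich_data))
```

Because the check depends only on data that have already been stored, it can be evaluated at any time during operation, unlike a classical persistence of excitation condition that refers to future signal behaviour.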