Application of Continuous Reinforcement Learning on Innovative Control Effector Aircraft

Online Actor-Critic-Based Adaptive Control for a Tailless Aircraft with Innovative Control Effectors

Abstract

Higher levels of autonomy in aerospace systems are an urgent requirement, given the increasing difficulty of control tasks and the need for adaptability in complex systems. Reinforcement learning (RL) control is a promising approach for adaptive control of air vehicles designed for automation. Conventional discrete reinforcement learning methods fail to provide satisfactory performance for flight control systems (FCSs), especially for a complex configuration such as a tailless, over-actuated aircraft. The inefficiency of a discrete controller in exploring for the optimal policy, the so-called 'curse of dimensionality', makes the approach unsuitable for online implementation. Moreover, the resulting discrete, non-smooth control policy usually cannot be applied to real-world control surfaces. This paper presents experiments with Heuristic Dynamic Programming (HDP), a method from the family of adaptive critic designs (ACDs), as a continuous reinforcement learning approach. ACD methods can capture the nonlinearities in the complex dynamics of the aircraft while solving the control problem in a computationally efficient manner by using continuous state and action spaces. These qualities make ACDs suitable for online FCS design for unstable systems such as tailless aircraft. In this paper, an ACD-based controller is developed and implemented for the Innovative Control Effector (ICE) aircraft, a highly maneuverable aircraft with a redundant control effector suite. The coupled control effector configuration exhibits strong interactions and therefore requires proper control allocation. The online simulation results demonstrate the accuracy of the designed continuous RL controller in the longitudinal control of the aircraft using different sets of control effectors. The proposed approach also shows significant improvements in tracking performance and control policy smoothness compared to discrete methods.
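At the core of this class of methods is the HDP actor-critic update: a critic network learns the cost-to-go J(x) from temporal-difference errors, and the actor is improved by backpropagating the critic's gradient through a model of the plant. The following NumPy sketch illustrates one such update step under generic assumptions; the network sizes, learning rates, and the helper functions f, cost, and df_du are illustrative placeholders, not taken from the paper.

```python
import numpy as np

# Minimal sketch of one Heuristic Dynamic Programming (HDP) update step,
# assuming single-hidden-layer networks and a known (or identified) plant
# model f(x, u). All names and hyperparameters are illustrative.

rng = np.random.default_rng(0)

n_x, n_u, n_h = 4, 1, 8      # state, action, hidden sizes (assumed)
gamma = 0.95                 # discount factor
lr_actor, lr_critic = 1e-3, 1e-2

# Critic approximates the cost-to-go J(x); actor gives the control u(x).
Wc1, Wc2 = rng.normal(0, 0.1, (n_h, n_x)), rng.normal(0, 0.1, (1, n_h))
Wa1, Wa2 = rng.normal(0, 0.1, (n_h, n_x)), rng.normal(0, 0.1, (n_u, n_h))

def critic(x):
    h = np.tanh(Wc1 @ x)
    return (Wc2 @ h).item(), h

def actor(x):
    h = np.tanh(Wa1 @ x)
    return Wa2 @ h, h

def hdp_step(x, f, cost, df_du):
    """One HDP update: TD training of the critic, then an actor update
    that backpropagates dJ/du through the plant model."""
    global Wc1, Wc2, Wa1, Wa2
    u, ha = actor(x)
    x_next = f(x, u)                      # plant (or model) transition
    r = cost(x, u)                        # one-step utility / cost

    # --- Critic: minimize the TD error e = J(x) - (r + gamma * J(x_next)),
    # treating the target as fixed ---
    J, hc = critic(x)
    J_next, _ = critic(x_next)
    e = J - (r + gamma * J_next)
    Wc2 -= lr_critic * e * hc[None, :]
    Wc1 -= lr_critic * e * ((Wc2.T * (1 - hc[:, None] ** 2)) @ x[None, :])

    # --- Actor: descend d(r + gamma * J(x_next))/du through the plant
    # Jacobian df_du; the direct cost term dr/du is omitted for brevity ---
    _, hc_next = critic(x_next)
    dJ_dxnext = (Wc2 * (1 - hc_next[None, :] ** 2)) @ Wc1   # (1, n_x)
    dJ_du = gamma * (dJ_dxnext @ df_du(x, u))               # (1, n_u)
    Wa2 -= lr_actor * dJ_du.T @ ha[None, :]
    Wa1 -= lr_actor * ((Wa2.T @ dJ_du.T) * (1 - ha[:, None] ** 2)) @ x[None, :]
    return x_next, e
```

Note that the actor update needs the model Jacobian with respect to the control input (df_du here), which is why HDP relies on a plant model; action-dependent ACD variants avoid this requirement by feeding the action directly into the critic.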