Reinforcement Learning for Flight Control

Hybrid Offline-Online Learning for Robust and Adaptive Fault-Tolerance

Abstract

Recent advances in fault-tolerant flight control have employed model-free offline and online Reinforcement Learning algorithms to provide robust and adaptive control for autonomous systems. Inspired by recent work on Incremental Dual Heuristic Programming (IDHP) and Soft Actor-Critic (SAC), this research proposes a hybrid SAC-IDHP framework that combines the adaptive online learning of IDHP with the generalization power of SAC on high-complexity tasks in a fully coupled system. Using principles from transfer learning, the hybrid SAC-IDHP policy is designed as alternating pre-trained SAC layers and identity-initialized IDHP layers that learn online, with the SAC layers frozen during online learning. This hybrid framework is implemented in the inner loop of a cascaded altitude controller for a high-fidelity, six-degree-of-freedom model of the Cessna Citation II PH-LAB research aircraft. Multiple altitude-tracking tasks with coordinated turns are simulated to compare tracking performance against an SAC-only controller in several failure modes. Compared to SAC-only, the SAC-IDHP hybrid demonstrates a tracking-performance improvement of 0.74%, 5.46%, and 0.82% in normalized Mean Absolute Error for the nominal, longitudinal-failure, and lateral-failure cases, respectively. Additionally, identity initialization of the hybrid policy eliminates random online policy initialization, providing an argument for increased safety.
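
The layer-alternation idea described above can be sketched in a few lines. The following is a minimal PyTorch illustration, not the authors' implementation: the names HybridSACIDHPPolicy and identity_linear, the tanh activations, and the layer sizes are assumptions for demonstration only. It shows the key property of the design: at initialization, the identity-initialized IDHP layers pass activations through unchanged, so the hybrid output equals the frozen SAC output and online learning starts from the pre-trained behaviour rather than from a random policy.

```python
import torch
import torch.nn as nn

def identity_linear(width: int) -> nn.Linear:
    """Trainable square linear layer initialized to the identity map."""
    layer = nn.Linear(width, width)
    nn.init.eye_(layer.weight)   # weight = identity matrix
    nn.init.zeros_(layer.bias)   # bias = zero vector
    return layer

class HybridSACIDHPPolicy(nn.Module):
    """Alternates frozen pre-trained SAC layers with identity-initialized
    IDHP layers, so the hybrid initially reproduces the SAC policy exactly.
    Names and structure are illustrative assumptions, not the paper's code."""

    def __init__(self, sac_layers: list[nn.Linear]):
        super().__init__()
        self.sac_layers = nn.ModuleList(sac_layers)
        for p in self.sac_layers.parameters():
            p.requires_grad = False          # SAC layers stay frozen online
        # One identity-initialized IDHP layer after each SAC layer.
        self.idhp_layers = nn.ModuleList(
            identity_linear(layer.out_features) for layer in sac_layers
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for sac, idhp in zip(self.sac_layers, self.idhp_layers):
            x = torch.tanh(sac(x))   # assumed tanh activation in the SAC actor
            x = idhp(x)              # identity at t = 0; adapted online by IDHP
        return x

# Toy example: a two-layer SAC actor mapping a 6-D state to a 2-D action.
policy = HybridSACIDHPPolicy([nn.Linear(6, 64), nn.Linear(64, 2)])
action = policy(torch.zeros(1, 6))
```

Because only the IDHP layers carry trainable parameters, online IDHP updates adapt the policy without disturbing the offline-learned SAC weights, which is what allows the frozen SAC layers to keep providing their generalization while the IDHP layers supply fault-induced adaptation.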