Reinforcement Learning for Flight Control

Hybrid Offline-Online Learning for Robust and Adaptive Fault-Tolerance

Abstract

Recent advances in fault-tolerant flight control have employed model-free offline and online Reinforcement Learning algorithms to provide robust and adaptive control for autonomous systems. Inspired by recent work on Incremental Dual Heuristic Programming (IDHP) and Soft Actor-Critic (SAC), this research proposes a hybrid SAC-IDHP framework that combines the adaptive online learning of IDHP with the generalization power of SAC on high-complexity tasks in a fully coupled system. Using principles from transfer learning, the hybrid SAC-IDHP policy is designed as alternating pre-trained SAC layers and identity-initialized IDHP layers that learn online, with the SAC layers frozen during online learning. This hybrid framework is implemented in the inner loop of a cascaded altitude controller for a high-fidelity, six-degree-of-freedom model of the Cessna Citation II PH-LAB research aircraft. Multiple altitude-tracking tasks with coordinated turns are simulated to compare tracking performance against an SAC-only controller in several failure modes. Compared to SAC-only, the SAC-IDHP hybrid demonstrates a tracking-performance improvement of 0.74%, 5.46%, and 0.82% in normalized Mean Absolute Error for the nominal, longitudinal-failure, and lateral-failure cases, respectively. Additionally, identity initialization of the hybrid policy eliminates random online policy initialization, providing an argument for increased safety.
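
The layer-alternation idea described above can be sketched in a few lines. The following is a minimal PyTorch illustration, not the authors' implementation: the names HybridSACIDHPPolicy and identity_linear, the tanh activations, and the layer sizes are assumptions for demonstration only. It shows the key property of the design: at initialization, the identity-initialized IDHP layers pass activations through unchanged, so the hybrid output equals the frozen SAC output and online learning starts from the pre-trained behaviour rather than from a random policy.

```python
import torch
import torch.nn as nn

def identity_linear(width: int) -> nn.Linear:
    """Trainable square linear layer initialized to the identity map."""
    layer = nn.Linear(width, width)
    nn.init.eye_(layer.weight)   # weight = identity matrix
    nn.init.zeros_(layer.bias)   # bias = zero vector
    return layer

class HybridSACIDHPPolicy(nn.Module):
    """Alternates frozen pre-trained SAC layers with identity-initialized
    IDHP layers, so the hybrid initially reproduces the SAC policy exactly.
    Names and structure are illustrative assumptions, not the paper's code."""

    def __init__(self, sac_layers: list[nn.Linear]):
        super().__init__()
        self.sac_layers = nn.ModuleList(sac_layers)
        for p in self.sac_layers.parameters():
            p.requires_grad = False          # SAC layers stay frozen online
        # One identity-initialized IDHP layer after each SAC layer.
        self.idhp_layers = nn.ModuleList(
            identity_linear(layer.out_features) for layer in sac_layers
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for sac, idhp in zip(self.sac_layers, self.idhp_layers):
            x = torch.tanh(sac(x))   # assumed tanh activation in the SAC actor
            x = idhp(x)              # identity at t = 0; adapted online by IDHP
        return x

# Toy example: a two-layer SAC actor mapping a 6-D state to a 2-D action.
policy = HybridSACIDHPPolicy([nn.Linear(6, 64), nn.Linear(64, 2)])
action = policy(torch.zeros(1, 6))
```

Because only the IDHP layers carry trainable parameters, online IDHP updates adapt the policy without disturbing the offline-learned SAC weights, which is what allows the frozen SAC layers to keep providing their generalization while the IDHP layers supply fault-induced adaptation.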