End-to-End Hierarchical Reinforcement Learning for Adaptive Flight Control

A method for model-independent control through Proximal Policy Optimization with learned Options

Master Thesis (2021)
Author(s)

Z.X. Ge (TU Delft - Aerospace Engineering)

Contributor(s)

E. van Kampen – Mentor (Control & Simulation)

G.C.H.E. de Croon – Graduation committee member (Control & Simulation)

M.A. Mitici – Graduation committee member (TU Delft - Aerospace Engineering)

Faculty
Aerospace Engineering
Publication Year
2021
Language
English
Graduation Date
27-08-2021
Awarding Institution
Delft University of Technology
Programme
Aerospace Engineering
Collections
thesis
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Aircraft with disruptive designs lack accurate, high-fidelity flight models. At the same time, developing models of stochastic phenomena for traditional aircraft configurations is costly, and classical control methods cannot operate beyond their predefined operating points or adapt to unexpected changes to the aircraft. The Proximal Policy Option Critic (PPOC) is an end-to-end hierarchical reinforcement learning method that alleviates the need for a high-fidelity flight model and allows for adaptive flight control. This research contributes to the development and analysis of online adaptive flight control by comparing PPOC against Proximal Policy Optimization (PPO), a non-hierarchical method, and against PPOC with a single Option (PPOC-1). The methods are tested on an extendable mass-spring-damper system and an aircraft model. The agents are then evaluated on their sample efficiency, reference tracking capability, and adaptivity. The results show, unexpectedly, that PPO and PPOC-1 are more sample efficient than PPOC. Furthermore, both PPOC agents successfully track the height profile, though the policies they learn result in noisy actuator inputs. Finally, PPOC with multiple learned Options has the best adaptivity: it adapts to structural failure of the horizontal tailplane and to a sign change of pitch damping, and it generalizes to different aircraft.
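
For readers unfamiliar with PPO, the non-hierarchical baseline named above, the following is a minimal sketch of the standard clipped surrogate objective that PPO maximizes. It is not taken from the thesis: the function name `ppo_clip_objective` is hypothetical, the clipping range `epsilon=0.2` is an assumed, commonly used default, and the inputs are plain Python lists of probability ratios (new policy over old policy) and advantage estimates.

```python
def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Mean PPO clipped surrogate objective over a batch of samples.

    ratios:     pi_new(a|s) / pi_old(a|s) for each sample
    advantages: advantage estimate for each sample
    epsilon:    clipping range (0.2 is an assumed default, not from the thesis)
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("batch must not be empty")

    total = 0.0
    for ratio, advantage in zip(ratios, advantages):
        # Restrict the probability ratio to [1 - epsilon, 1 + epsilon].
        clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * advantage, clipped_ratio * advantage)
    return total / len(ratios)


if __name__ == "__main__":
    # A large ratio with a positive advantage is capped at (1 + epsilon) * A.
    print(ppo_clip_objective([1.5], [1.0]))
    # A small ratio with a negative advantage is capped at (1 - epsilon) * A.
    print(ppo_clip_objective([0.5], [-1.0]))
```

The clipping removes the incentive to move the new policy far from the old one within a single update. In an option-critic variant such as PPOC, the same kind of objective would presumably be applied to the policies inside each Option, but that detail is an assumption here and is not stated in the abstract.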
