End-to-End Hierarchical Reinforcement Learning for Adaptive Flight Control

A method for model-independent control through Proximal Policy Optimization with learned Options

Master Thesis (2021)

Author(s)

Z.X. Ge (TU Delft - Aerospace Engineering)

Contributor(s)

EJ van Kampen – Mentor (TU Delft - Control & Simulation)

Guido Cornelis Henricus Eugene de Croon – Graduation committee member (TU Delft - Control & Simulation)

M. Mitici – Graduation committee member (TU Delft - Air Transport & Operations)

Faculty

Aerospace Engineering

Copyright

Reinforcement Learning, Proximal Policy Optimization, Hierarchical Reinforcement Learning, Flight Control Systems, Policy Gradient, Option-Critic architecture

To reference this document use:

https://resolver.tudelft.nl/uuid:d3baec43-71d4-4f7f-ae27-2fdfdae7fea3

Publication Year

2021

Language

English

Graduation Date

27-08-2021

Awarding Institution

Delft University of Technology

Programme

Aerospace Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Aircraft with disruptive designs lack accurate, high-fidelity flight models. At the same time, developing models of stochastic phenomena for traditional aircraft configurations is costly, and classical control methods cannot operate beyond their predefined operating points or adapt to unexpected changes to the aircraft. The Proximal Policy Option Critic (PPOC) is an end-to-end hierarchical reinforcement learning method that alleviates the need for a high-fidelity flight model and allows for adaptive flight control. This research contributes to the development and analysis of online adaptive flight control by comparing PPOC against a non-hierarchical method, Proximal Policy Optimization (PPO), and against PPOC with a single Option (PPOC-1). The methods are tested on an extendable mass-spring-damper system and an aircraft model. The agents are then evaluated on their sample efficiency, reference tracking capability, and adaptivity. The results show, unexpectedly, that PPO and PPOC-1 are more sample efficient than PPOC. Furthermore, both PPOC agents successfully track the height profile, although they learn policies that result in noisy actuator inputs. Finally, PPOC with multiple learned Options has the best adaptivity: it adapts to structural failure of the horizontal tailplane and to a sign change of pitch damping, and it generalizes to different aircraft.
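
For readers unfamiliar with PPO, the following is a minimal sketch of the standard clipped surrogate objective that PPO maximizes. It is not taken from the thesis: the function name `ppo_clipped_objective`, its argument names, and the default clip range of 0.2 are assumptions chosen for illustration, and the thesis's own implementation and hyperparameters may differ.

```python
import math


def ppo_clipped_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    new_logp, old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same samples.
    clip_eps: clip range; 0.2 is an assumed, commonly used default.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, the objective stops rewarding the policy once the ratio exceeds `1 + clip_eps`, which limits how far a single update can move the policy away from the one that collected the data.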

Files

Master_thesis_Zhouxin_Ge_2021.... (pdf)

(pdf | 46.6 Mb)

License info not available