Explainable Artificial Intelligence Techniques for the Analysis of Reinforcement Learning in Non-Linear Flight Regimes

Abstract

Reinforcement Learning is increasingly applied to flight control tasks, with the objective of developing truly autonomous flying vehicles able to traverse highly variable environments and adapt to unknown situations or possible failures. However, the development of these increasingly complex models and algorithms further reduces our understanding of their inner workings. This can affect the safety and reliability of the algorithms, as it is difficult or even impossible to determine their failure characteristics and to predict how they will react in situations never tested before. This lack of understanding can be remedied through the development of eXplainable Artificial Intelligence and eXplainable Reinforcement Learning methods such as SHapley Additive exPlanations (SHAP). In this thesis, this tool is used to analyze the strategy learnt by an Actor-Critic Incremental Dual Heuristic Programming (IDHP) controller architecture when presented with a series of pitch rate and roll rate tracking tasks in a variety of non-linear flight conditions, including flight close to the stall regime, non-linear reference signals, saturation of the control surface deflection limits, and flight with large sideslip angles. The same controller architecture had previously been examined with the same analysis tool, but only in the nominal linear flight regime, where it was observed that the controller learnt linear control laws even though its Artificial Neural Networks should be able to approximate any function. This thesis shows that, even in the non-linear flight regime, it is still optimal for this controller architecture to learn quasi-linear control laws, although it appears to continuously adjust the linear slopes, as if performing an extreme case of gain scheduling. Additionally, a more complex Reinforcement Learning architecture based on the Soft Actor-Critic (SAC) algorithm is explored with the same analysis tool, both to demonstrate the tool's usability in the presence of non-linear control laws and to improve our understanding of this state-of-the-art off-policy algorithm.
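
As a concrete illustration of the kind of analysis described above, the sketch below applies SHAP's model-agnostic KernelExplainer to a policy function mapping flight states to a control surface command. The `actor` function, its gains, the chosen state variables, and the background data are hypothetical placeholders used only to make the example runnable; they are not the thesis' actual IDHP or SAC networks or data.

```python
import numpy as np
import shap

# Hypothetical stand-in for a trained actor network: maps flight states
# (here: pitch rate tracking error, pitch rate, angle of attack) to an
# elevator deflection command. A quasi-linear law is assumed purely for
# illustration, echoing the abstract's finding of quasi-linear strategies.
def actor(states: np.ndarray) -> np.ndarray:
    gains = np.array([-0.8, -0.15, 0.05])  # placeholder feedback gains
    return np.tanh(states @ gains)         # saturating actuator command

# Background set of states, used as the baseline distribution for the
# Shapley value estimation. In practice this would come from flight data.
background = np.random.default_rng(0).normal(size=(100, 3))

# KernelExplainer is model-agnostic: it only needs a callable mapping a
# batch of inputs to outputs, so it applies to any policy architecture.
explainer = shap.KernelExplainer(actor, background)

# Explain the actions taken in a few flight conditions. shap_values[i, j]
# is the contribution of state variable j to the action at sample i,
# which is what reveals the (quasi-)linear structure of the control law.
samples = background[:5]
shap_values = explainer.shap_values(samples)
print(shap_values)
```

Plotting these attributions against each state variable (e.g. with `shap.summary_plot`) is one way to see whether the learnt contribution is a straight line, a sign of a linear control law, or whether its slope shifts across flight conditions.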