Explainable Artificial Intelligence Techniques for the Analysis of Reinforcement Learning in Non-Linear Flight Regimes

Abstract

Reinforcement Learning is increasingly applied to flight control tasks, with the objective of developing truly autonomous flying vehicles able to traverse highly variable environments and adapt to unknown situations or possible failures. However, the development of these increasingly complex models and algorithms further reduces our understanding of their inner workings. This can affect the safety and reliability of the algorithms, as it is difficult or even impossible to determine their failure characteristics and to predict how they will react in situations never tested before. This lack of understanding can be remedied through the development of eXplainable Artificial Intelligence and eXplainable Reinforcement Learning methods such as SHapley Additive exPlanations (SHAP). In this thesis, this tool is used to analyze the strategy learnt by an Actor-Critic Incremental Dual Heuristic Programming (IDHP) controller architecture when presented with a series of pitch rate and roll rate tracking tasks in a variety of non-linear flight conditions, including flight close to the stall regime, non-linear reference signals, saturation of the control surface deflection limits, and flight with large sideslip angles. The same controller architecture had previously been examined with the same analysis tool, but only in the nominal linear flight regime, where it was observed that the controller learnt linear control laws even though its Artificial Neural Networks should be able to approximate any function. This thesis shows that, even in the non-linear flight regime, it is still optimal for this controller architecture to learn quasi-linear control laws, although it appears to continuously adjust the linear slopes, as if performing an extreme case of gain scheduling. Additionally, a more complex Reinforcement Learning architecture based on the Soft Actor-Critic (SAC) algorithm is explored with the same analysis tool, both to demonstrate the tool's usability in the presence of non-linear control laws and to improve our understanding of this state-of-the-art off-policy algorithm.
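
As a concrete illustration of the kind of analysis described above, the sketch below applies SHAP's model-agnostic KernelExplainer to a policy function mapping flight states to a control surface command. The `actor` function, its gains, the chosen state variables, and the background data are hypothetical placeholders used only to make the example runnable; they are not the thesis' actual IDHP or SAC networks or data.

```python
import numpy as np
import shap

# Hypothetical stand-in for a trained actor network: maps flight states
# (here: pitch rate tracking error, pitch rate, angle of attack) to an
# elevator deflection command. A quasi-linear law is assumed purely for
# illustration, echoing the abstract's finding of quasi-linear strategies.
def actor(states: np.ndarray) -> np.ndarray:
    gains = np.array([-0.8, -0.15, 0.05])  # placeholder feedback gains
    return np.tanh(states @ gains)         # saturating actuator command

# Background set of states, used as the baseline distribution for the
# Shapley value estimation. In practice this would come from flight data.
background = np.random.default_rng(0).normal(size=(100, 3))

# KernelExplainer is model-agnostic: it only needs a callable mapping a
# batch of inputs to outputs, so it applies to any policy architecture.
explainer = shap.KernelExplainer(actor, background)

# Explain the actions taken in a few flight conditions. shap_values[i, j]
# is the contribution of state variable j to the action at sample i,
# which is what reveals the (quasi-)linear structure of the control law.
samples = background[:5]
shap_values = explainer.shap_values(samples)
print(shap_values)
```

Plotting these attributions against each state variable (e.g. with `shap.summary_plot`) is one way to see whether the learnt contribution is a straight line, a sign of a linear control law, or whether its slope shifts across flight conditions.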