Explainable Reinforcement Learning in Flight Control through Reward Decomposition

Master Thesis (2022)

Author(s)

André Lemos (TU Delft - Aerospace Engineering)

Contributor(s)

E. van Kampen – Mentor (TU Delft - Control & Simulation)

C.C. de Visser – Graduation committee member (TU Delft - Control & Simulation)

M.C. Naeije – Graduation committee member (TU Delft - Astrodynamics & Space Missions)

Faculty

Aerospace Engineering

Keywords

Explainable AI, Deep Reinforcement Learning, Explainable Reinforcement Learning, Flight Control Systems, Cessna Citation, Reward Decomposition

To reference this document use:

https://resolver.tudelft.nl/uuid:dd7325fa-c615-4df5-8ce3-e3d6ef9cb33c

Publication Year

2022

Language

English

Graduation Date

23-09-2022

Awarding Institution

Delft University of Technology

Programme

Aerospace Engineering | Control & Simulation

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Even though Deep Reinforcement Learning (DRL) techniques have proven able to solve highly complex control tasks, the opaqueness and lack of explainability of these solutions often prevent them from being applied to real flight control applications. In this research, reward decomposition explanations are used to tackle this issue and improve DRL end-user explainability. A reward decomposition-based DRL controller is deployed on a longitudinal state-space model of the Cessna Citation 500 aircraft and assessed on two attitude flight control tasks. Furthermore, a new explanation type called Dominant Reward eXplanations (DRX) is presented, which allows users to obtain more global insights than those generated by Reward Difference eXplanations (RDX). Results show that the resulting explanations provide straightforward and intuitive insights into the controller’s behaviour and can improve end-user explainability. Moreover, a small analysis suggests that the decomposed method performs similarly to its counterpart without reward decomposition, although training time increases considerably. To the best of the author’s knowledge, this is the first application of reward decomposition explanations to the flight control domain.
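To make the RDX idea concrete, the sketch below follows the general reward-decomposition formulation from the literature, not necessarily the exact implementation used in this thesis: the agent learns one action-value function Q_c(s, a) per reward component c, acts greedily on their sum Q(s, a) = Σ_c Q_c(s, a), and an RDX comparing actions a1 and a2 is the per-component difference Q_c(s, a1) - Q_c(s, a2). It assumes a discrete set of candidate actions. The component names, the three elevator commands and all Q-values are hypothetical and purely illustrative, not results from this work. DRX is not sketched because the abstract does not define it.

```python
from typing import Dict, List

# Per-component action values for ONE state: component name -> [Q_c(s, a) for each action a].
QComponents = Dict[str, List[float]]


def total_q(q: QComponents, action: int) -> float:
    """Scalar Q(s, a), recovered as the sum of the per-component values Q_c(s, a)."""
    return sum(q_c[action] for q_c in q.values())


def greedy_action(q: QComponents) -> int:
    """Action with the highest summed Q-value (the action the agent would take)."""
    n_actions = len(next(iter(q.values())))
    return max(range(n_actions), key=lambda a: total_q(q, a))


def rdx(q: QComponents, a1: int, a2: int) -> Dict[str, float]:
    """Reward Difference eXplanation: Q_c(s, a1) - Q_c(s, a2) for every component c."""
    return {name: q_c[a1] - q_c[a2] for name, q_c in q.items()}


# Hypothetical example: one state, three candidate elevator commands
# (0 = nose down, 1 = hold, 2 = nose up). Values are made up for illustration.
q_example: QComponents = {
    "pitch_tracking": [-4.0, -1.5, -0.5],
    "control_smoothness": [-1.0, -0.25, -0.75],
}

chosen = greedy_action(q_example)          # summed values: -5.0, -1.75, -1.25
explanation = rdx(q_example, chosen, 1)    # why "nose up" rather than "hold"?
print(chosen)        # 2
print(explanation)   # {'pitch_tracking': 1.0, 'control_smoothness': -0.5}
```

Read this way, the agent prefers nose up over hold because the gain in the pitch-tracking component (+1.0) outweighs the loss in the smoothness component (-0.5). The components of an RDX always sum to the difference in total Q-values between the two actions (here 0.5), which is what makes the per-component breakdown a faithful account of the preference.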

Files

MSc_Thesis_Andre_Lemos.pdf

(PDF, 7.89 MB)

License info not available