Explainable Reinforcement Learning in Flight Control through Reward Decomposition

Master Thesis (2022)
Author(s)

André Lemos (TU Delft - Aerospace Engineering)

Contributor(s)

E. van Kampen – Mentor (TU Delft - Control & Simulation)

C.C. de Visser – Graduation committee member (TU Delft - Control & Simulation)

M.C. Naeije – Graduation committee member (TU Delft - Astrodynamics & Space Missions)

Faculty
Aerospace Engineering
Copyright
© 2022 André Ferreira Lemos
Publication Year
2022
Language
English
Graduation Date
23-09-2022
Awarding Institution
Delft University of Technology
Programme
Aerospace Engineering | Control & Simulation
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Even though Deep Reinforcement Learning (DRL) techniques have proven able to solve highly complex control tasks, the opaqueness and lack of explainability of the resulting solutions often prevent them from being applied to real flight control applications. In this research, reward decomposition explanations are used to tackle this issue and improve the explainability of DRL for end users. A reward decomposition-based DRL controller is deployed on a longitudinal state-space model of the Cessna Citation 500 aircraft and assessed on two attitude flight control tasks. Furthermore, a new explanation type called Dominant Reward eXplanations (DRX) is presented, which allows users to obtain more global insights than those generated by Reward Difference eXplanations (RDX). Results show that the explanations produced lead to straightforward and intuitive insights into the controller's behaviour and are capable of improving end-user explainability. Moreover, a small analysis suggests that the decomposed method achieves performance similar to that obtained without reward decomposition; however, training time increases considerably. To the author's best knowledge, this is the first application of reward decomposition explanations to the flight control domain.
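The abstract mentions Reward Difference eXplanations (RDX) without spelling them out. In the general reward decomposition idea, the scalar reward is split into components c, the agent learns one Q-value per component, and the ordinary value is their sum, Q(s, a) = sum over c of Q_c(s, a). An RDX comparing actions a1 and a2 is then the per-component difference Q_c(s, a1) - Q_c(s, a2), showing which reward terms favour which action. The snippet below is a minimal sketch under stated assumptions: a discrete action set, hypothetical component names ("pitch_error", "control_effort") and made-up Q-values, none of which are taken from the thesis. DRX is not sketched, since its exact definition is given in the thesis itself.

```python
from typing import Dict, Sequence

# Hypothetical decomposed Q-values for a single state: one list per reward
# component, indexed by (assumed discrete) action. A real decomposed agent
# would produce these from separate critic heads.
QComponents = Dict[str, Sequence[float]]


def total_q(q: QComponents, action: int) -> float:
    """Sum the per-component Q-values to recover the ordinary Q(s, a)."""
    return sum(values[action] for values in q.values())


def greedy_action(q: QComponents) -> int:
    """Return the action with the highest summed Q-value."""
    n_actions = len(next(iter(q.values())))
    return max(range(n_actions), key=lambda a: total_q(q, a))


def rdx(q: QComponents, a1: int, a2: int) -> Dict[str, float]:
    """Reward Difference eXplanation: per-component Q_c(s, a1) - Q_c(s, a2)."""
    return {name: values[a1] - values[a2] for name, values in q.items()}


# Illustrative values only (hypothetical actions 0, 1, 2 for elevator deflections).
q = {
    "pitch_error": [-1.0, -2.5, -4.0],
    "control_effort": [-0.75, -0.25, -1.0],
}
best = greedy_action(q)           # action 0 has the highest total Q
explanation = rdx(q, best, 1)     # why action 0 is preferred over action 1
print(best, explanation)
```

Here the explanation shows that the pitch-tracking component favours action 0 by 1.5 while the control-effort component favours action 1 by 0.5; the components always sum to the difference in total Q-values, which is what makes the decomposition a faithful account of the greedy choice.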

Files

MSc_Thesis_Andre_Lemos.pdf
(pdf | 7.89 Mb)
License info not available