Towards Sample-Efficient Offline Reinforcement Learning in Flight Control

A Study on Sample-Efficient Model-Free Algorithms for Flight Control Tasks

Master Thesis (2025)
Author(s)

G.A. Alayón Blanco (TU Delft - Aerospace Engineering)

Contributor(s)

Erik-Jan van Kampen – Mentor (TU Delft - Control & Simulation)

I.Z. El-Hajj – Mentor (TU Delft - Control & Simulation)

Coen C. de Visser – Graduation committee member (TU Delft - Control & Simulation)

A. Sharpanskykh – Graduation committee member (TU Delft - Operations & Environment)

Faculty
Aerospace Engineering
Publication Year
2025
Language
English
Graduation Date
09-07-2025
Awarding Institution
Delft University of Technology
Programme
Aerospace Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Sample efficiency is a critical metric in intelligent control
systems, as it directly influences the feasibility and effectiveness of
learning-based approaches. This paper studies how Randomized Ensemble
Double Q-Learning (REDQ), a sample-efficient model-free algorithm, can be
used in flight control applications. Three controllers were developed, for
pitch, roll, and combined biaxial attitude tracking tasks, and tested on a
high-fidelity Cessna Citation 550 model. For each control task, three agents
were trained offline: two using REDQ-enhanced Soft Actor-Critic (SAC)
architectures and one using a standard SAC architecture for comparison. REDQ
agents showed statistically significant improvements in sample efficiency
during initial learning. On average, convergence to accurate tracking
(error < 1°) occurred within 5,500 training steps for pitch, 6,400 for roll,
and 11,500 for biaxial control. These gains in sample efficiency came with
drawbacks in learning stability and in robustness when operating conditions
deviated too far from nominal.
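
As a rough illustration of what a REDQ-enhanced SAC critic update involves, the sketch below computes the bootstrapped target in the way REDQ is generally described: an ensemble of N critics is maintained, and for each update the target takes the minimum over a randomly drawn subset of M of them, combined with the SAC entropy term. This is a minimal sketch under stated assumptions, not the thesis implementation: the function name `redq_target`, the array layout, and the default values for the discount factor, entropy temperature, and subset size are all illustrative and are not taken from the abstract.

```python
import numpy as np


def redq_target(reward, done, next_q_values, next_log_prob,
                gamma=0.99, alpha=0.2, subset_size=2, rng=None):
    """Sketch of a REDQ-style soft Bellman target (hypothetical helper).

    reward, done, next_log_prob: arrays of shape (batch,)
    next_q_values: array of shape (n_critics, batch) holding each target
        critic's estimate Q_i(s', a') for an action a' sampled from the policy.
    gamma, alpha, subset_size: illustrative defaults, not the thesis values.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_critics = next_q_values.shape[0]

    # Draw a random subset of critics without replacement.
    idx = rng.choice(n_critics, size=subset_size, replace=False)

    # Take the element-wise minimum over the subset to curb overestimation.
    min_q = next_q_values[idx].min(axis=0)

    # SAC entropy-regularised value of the next state.
    soft_value = min_q - alpha * next_log_prob

    # No bootstrapping from terminal transitions.
    return reward + gamma * (1.0 - done) * soft_value


# Usage with a toy batch of two transitions and an ensemble of ten critics.
q_next = np.tile(np.array([1.0, 2.0]), (10, 1))
target = redq_target(
    reward=np.array([0.5, 0.5]),
    done=np.array([0.0, 1.0]),
    next_q_values=q_next,
    next_log_prob=np.array([-1.0, -1.0]),
    rng=np.random.default_rng(0),
)
```

Every critic in the ensemble would then be regressed toward the same target. In the REDQ formulation this is paired with several gradient updates per environment step, which is where the sample-efficiency gain is usually attributed.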


