Towards Sample-Efficient Offline Reinforcement Learning in Flight Control

A Study on Sample-Efficient Model-Free Algorithms for Flight Control Tasks

Master Thesis (2025)

Author(s)

G.A. Alayón Blanco (TU Delft - Aerospace Engineering)

Contributor(s)

Erik-Jan van Kampen – Mentor (TU Delft - Control & Simulation)

I.Z. El-Hajj – Mentor (TU Delft - Control & Simulation)

Coen C. de Visser – Graduation committee member (TU Delft - Control & Simulation)

A. Sharpanskykh – Graduation committee member (TU Delft - Operations & Environment)

Faculty

Aerospace Engineering

Reinforcement Learning (RL), Fault Tolerance, Intelligent Flight Control, Sample Efficient, Machine Learning, Actor Critic Designs (ACD)

To reference this document use:

https://resolver.tudelft.nl/uuid:272ae912-ddb2-41ef-a273-ba513c7687fc

Publication Year

2025

Language

English

Graduation Date

09-07-2025

Awarding Institution

Delft University of Technology

Programme

Aerospace Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Sample efficiency is a critical metric in intelligent control
systems, as it directly influences the feasibility and effectiveness of
learning-based approaches. This paper studies how Randomized
Ensemble Double Q-Learning (REDQ), a sample-efficient model-free algorithm, can
be used in flight control applications. Three controllers were developed, for
pitch, roll, and combined biaxial attitude tracking tasks, and tested on a
high-fidelity Cessna Citation 550 model. For each control task, three agents were
trained offline: two using REDQ-enhanced Soft Actor-Critic (SAC) architectures
and one using a standard SAC architecture for comparison. The REDQ agents showed
statistically significant improvements in sample efficiency during initial
learning. On average, accurate tracking (error < 1°) was reached within
5,500 training steps for pitch, 6,400 for roll, and 11,500 for biaxial control.
These gains in sample efficiency came with drawbacks: reduced learning
stability, and reduced robustness when conditions deviated too far from nominal.
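
For readers unfamiliar with REDQ, the sketch below illustrates the idea that distinguishes it from standard SAC: the bootstrap target takes the minimum over a small random subset of a larger critic ensemble, which is what allows many gradient updates per environment step. It follows the general published description of REDQ and is not taken from the thesis. The function name, argument names, and default values (gamma, alpha, subset_size) are all hypothetical choices made for illustration.

```python
import random


def redq_target(reward, done, next_q_values, next_log_prob,
                gamma=0.99, alpha=0.2, subset_size=2, rng=random):
    """Illustrative REDQ-style soft Bellman target for one transition.

    next_q_values: Q-estimates of the next state-action pair, one per
                   critic in the ensemble (hypothetical input layout).
    next_log_prob: log-probability of the sampled next action (SAC entropy term).
    """
    # Pick `subset_size` distinct critics at random from the ensemble.
    subset = rng.sample(range(len(next_q_values)), subset_size)
    # In-target minimisation over the sampled subset only, not the full ensemble.
    min_q = min(next_q_values[i] for i in subset)
    # Soft value of the next state, as in SAC.
    soft_value = min_q - alpha * next_log_prob
    # No bootstrapping past a terminal transition (done == 1.0).
    return reward + gamma * (1.0 - done) * soft_value


# Example call with a hypothetical ensemble of four critics.
target = redq_target(reward=1.0, done=0.0,
                     next_q_values=[1.0, 1.2, 0.9, 1.5],
                     next_log_prob=-0.5)
```

Setting subset_size to the full ensemble size reduces this to a plain minimum over all critics. A smaller subset makes the target less pessimistic on average.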

Files

Thesis_Gabriel_Alayon_Blanco.p... (pdf)

(pdf | 8.08 Mb)

License info not available