Sample efficiency is a critical metric for intelligent control systems, as it directly influences the feasibility and effectiveness of learning-based approaches. This paper presents a study of how Randomized Ensemble Double Q-Learning (REDQ), a sample-efficient, model-free algorithm, can be applied to flight control. Three controllers were developed for
pitch, roll, and combined biaxial attitude tracking tasks, and tested on a high-fidelity Cessna Citation 550 model. For each control task, three agents were trained offline: two using REDQ-enhanced Soft Actor-Critic (SAC) architectures and one using a standard SAC architecture for comparison. REDQ agents showed
statistically significant improvements in sample efficiency during initial
learning. Average accurate tracking convergence (error < 1°) occurred within 5,500 training steps for pitch, 6,400 for roll, and 11,500 for biaxial control.
The gains in sample efficiency were shown to come with drawbacks in learning stability and in robustness once conditions deviated too far from nominal.