Autonomous UAV Landing on Stochastic Maritime Targets
A reinforcement learning approach for maritime UAV applications
H.S. Hennecken (TU Delft - Aerospace Engineering)
M.J. Ribeiro – Mentor (TU Delft - Aerospace Engineering)
O. Pfeifle – Mentor (Royal Netherlands Aerospace Centre)
E. van Kampen – Graduation committee member (TU Delft - Aerospace Engineering)
J.S. Sun – Mentor (TU Delft - Technology, Policy and Management)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Reliable autonomous recovery of Unmanned Aerial Vehicles (UAVs) on moving maritime platforms remains a critical challenge, primarily because of complex, stochastic deck motion (particularly vertical heave) and unpredictable environmental disturbances. Existing Reinforcement Learning (RL) approaches often simplify this environment, which limits their real-world applicability. This thesis investigates the robustness trade-offs of RL-based guidance controllers under realistic, highly dynamic maritime conditions. We benchmarked a classical Proportional-Integral-Derivative (PID) controller against two RL architectures trained with Soft Actor-Critic (SAC) in a high-fidelity PyBullet simulation: a Full RL 3D controller and a novel Hybrid RL 1D controller, which applies RL only to the critical, stochastic vertical (heave) axis. The results show that the Hybrid RL 1D architecture achieved the best overall robustness and efficiency, with an 86.6% success rate. The RL controllers also sharply reduced average landing time (RL_1D: 3.31 s vs. Baseline: 11.51 s), although the classical PID baseline maintained higher horizontal precision (Err_XY of 0.17 ± 0.17 m). The Hybrid RL 1D controller maintained a success rate of up to 89% in high sea states (SS7) and was more resilient to sensor noise. However, a critical limitation was identified: both RL-based policies suffered a pronounced performance collapse under strong wind disturbances not seen during training, a regime in which the non-adaptive classical PID baseline proved unexpectedly stable. These findings confirm the benefits of hybrid control for maximizing robustness and show that wind disturbance rejection remains a significant, unresolved shortcoming of current RL guidance systems.
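To make the hybrid idea concrete, the following is a minimal sketch of how a guidance command could be split per axis, with RL confined to the heave axis. It is not the thesis implementation. The abstract does not say how the Hybrid RL 1D controller handles the horizontal axes, so the use of PID there is an assumption, as are all names (`PID`, `hybrid_command`, `placeholder_heave_policy`), the gains, and the velocity-command interface. The heave policy is a hand-written stand-in for where a trained SAC actor would be called.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class PID:
    """Textbook single-axis PID controller (gains are illustrative placeholders)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float, dt: float) -> float:
        # Accumulate the integral term and estimate the derivative by finite difference.
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def placeholder_heave_policy(ez: float) -> float:
    """Hypothetical stand-in for a trained SAC actor on the vertical axis.

    A real policy would map an observation vector to an action; here a
    clipped proportional law keeps the sketch self-contained.
    """
    return max(-1.0, min(1.0, 0.8 * ez))


def hybrid_command(
    rel_pos: Tuple[float, float, float],
    pid_x: PID,
    pid_y: PID,
    heave_policy: Callable[[float], float],
    dt: float,
) -> Tuple[float, float, float]:
    """Return an (x, y, z) velocity command from a relative deck position.

    rel_pos is assumed to be deck position minus UAV position, in metres.
    Horizontal axes use PID (an assumption); the vertical axis is delegated
    to the learned policy.
    """
    ex, ey, ez = rel_pos
    vx = pid_x.step(ex, dt)
    vy = pid_y.step(ey, dt)
    vz = heave_policy(ez)
    return vx, vy, vz


if __name__ == "__main__":
    pid_x = PID(kp=1.0, ki=0.0, kd=0.0)
    pid_y = PID(kp=1.0, ki=0.0, kd=0.0)
    print(hybrid_command((2.0, -1.0, -5.0), pid_x, pid_y, placeholder_heave_policy, dt=0.1))
```

Keeping the horizontal loops classical in this sketch mirrors the abstract's observation that the PID baseline retained the higher horizontal precision, while the learned component is limited to the axis dominated by stochastic deck motion.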