Reinforcement Learning-Based Guidance and Control for Aerial-to-Aerial Pest Interception

Master Thesis (2025)

Author(s)

M.S. Broekers (TU Delft - Aerospace Engineering)

Contributor(s)

G.C.H.E. de Croon – Mentor (TU Delft - Control & Simulation)

R.W. Vos – Mentor (TU Delft - Control & Simulation)

M. Yedutenko – Mentor (TU Delft - Control & Simulation)

C. de Wagter – Graduation committee member (TU Delft - Control & Simulation)

A. Bombelli – Graduation committee member (TU Delft - Operations & Environment)

Faculty

Aerospace Engineering

Reinforcement Learning, Domain Randomization, Micro Air Vehicle, Reality Gap, Interception, Guidance and Control, Reward Shaping

To reference this document use:

https://resolver.tudelft.nl/uuid:28823cd3-e24b-401e-b328-a10fefb7af78

Publication Year

2025

Language

English

Graduation Date

11-12-2025

Awarding Institution

Delft University of Technology

Programme

Aerospace Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

PATS-X is a greenhouse pest suppression system that combines a depth camera with an autonomous micro air vehicle (MAV) to detect, track, and physically intercept flying insects. This study addresses guidance and control for reliable aerial-to-aerial interception, using reinforcement learning (RL) to train policies on insect flight recordings. We evaluate control policies at increasing levels of abstraction: direct motor commands, collective thrust and body rates (CTBR), and acceleration. In simulation, lower abstraction levels yield better interception performance; moving from acceleration commands to motor commands reduces the median time to first interception by about 41%. A systematic variation of the observation space shows that the most effective observations are body-frame relative position and velocity, and that short temporal histories add no benefit beyond noise filtering. Against a state-of-the-art classical benchmark, Fast Response Proportional Navigation (FRPN), the best motor-level RL policy achieves in simulation a median time to first interception of 0.85 [0.76–1.07] s with a 99.1% interception rate, compared with 1.90 [1.04–2.80] s and 95.6% for FRPN. To address the reality gap, we compare how well the different control abstractions transfer to hardware. CTBR policies deploy on hardware with the least performance loss relative to simulation. Motor-level policies also transfer when trained with modest domain randomization (DR) plus an action-difference penalty that limits command jitter and thermal load. Acceleration-level policies did not transfer. In a PATS-X proof of concept, an RL controller deployed on the actual system reached a 95.6% interception rate on virtual moths, versus 80.0% for the existing controller. The RL controller also shortened the time to first interception by 0.70 s, indicating the potential of RL-based guidance for the PATS-X system.
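
Two ideas from the abstract can be illustrated with a minimal sketch: the body-frame relative position and velocity observation, and the action-difference penalty. The code below is not taken from the thesis. The function names, the rotation-matrix representation of attitude, the quadratic form of the penalty, and the weight value are all assumptions made for illustration.

```python
def world_to_body(R_bw, v_world):
    """Express a world-frame vector in the body frame.

    R_bw is assumed to be a 3x3 rotation matrix (nested lists) mapping
    body-frame vectors to the world frame, so its transpose inverts it.
    """
    return [sum(R_bw[r][c] * v_world[r] for r in range(3)) for c in range(3)]


def relative_observation(R_bw, p_mav, v_mav, p_target, v_target):
    """Hypothetical 6-element observation: target position and velocity
    relative to the MAV, both expressed in the MAV body frame."""
    rel_p = [t - m for t, m in zip(p_target, p_mav)]
    rel_v = [t - m for t, m in zip(v_target, v_mav)]
    return world_to_body(R_bw, rel_p) + world_to_body(R_bw, rel_v)


def action_difference_penalty(action, prev_action, weight=0.1):
    """Hypothetical reward term: negative weighted squared change between
    consecutive actions, discouraging jittery commands.
    The weight of 0.1 is an arbitrary illustrative value."""
    return -weight * sum((a - b) ** 2 for a, b in zip(action, prev_action))


# Example: MAV at the origin, yawed 90 degrees so its body x-axis
# points along world y; the target sits 1 m along world y.
R_yaw90 = [[0, -1, 0],
           [1, 0, 0],
           [0, 0, 1]]
obs = relative_observation(R_yaw90, [0, 0, 0], [0, 0, 0], [0, 1, 0], [1, 0, 0])
penalty = action_difference_penalty([1, 0, 0, 0], [0, 0, 0, 0], weight=0.5)
```

In this example the target appears directly ahead of the MAV in the body frame, regardless of how the MAV is oriented in the world, which is one plausible reason a body-frame observation could be easier for a policy to use. The penalty term would be added to the task reward at each step; how the thesis actually formulates and weights it is not stated in the abstract.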

Files

Merlijn_Broekers_Thesis_Submis... (pdf)

(pdf | 5.15 Mb)

License info not available