In recent years, flapping-wing micro aerial vehicles (FWMAVs) have garnered significant attention due to their agility in cluttered environments and the safety advantages offered by their soft wings during close-proximity operations. These vehicles are subject to numerous constraints: limited power supply, restricted payload capacity, high drag and vibration from the flapping mechanism, and susceptibility to environmental disturbances such as wind gusts, which makes indoor operation preferable. However, these vehicles have not yet achieved autonomous navigation. The platform used in this study, the Flapper Nimble+, is attitude-stable: it has no access to egomotion or position information and can command only its thrust, pitch, roll, and yaw. Given the strict SWaP (size, weight, and power) constraints, lightweight Time-of-Flight (ToF) sensors are among the most practical sensing solutions. This thesis investigates the question: How can obstacle avoidance be achieved in an attitude-stable flapping-wing air vehicle equipped with Time-of-Flight sensors? Two control approaches were implemented and tested in the IsaacGym simulator: a simple PID controller with confidence-based yaw adjustment, and a reinforcement learning (RL) policy trained using Proximal Policy Optimization. Two sensor configurations were considered: a minimal two-sensor setup (front and downward) and a richer five-sensor array for broader perception. The PID controller avoided collisions in all environments, demonstrating high reliability but limited exploration, averaging 35% coverage. In contrast, the RL policy with five sensors achieved greater spatial exploration, averaging 49.5% coverage, at the cost of an average of 3.5 collisions per episode. When reduced to two sensors, the RL agent's performance declined significantly, with average coverage dropping to 29% and the average collision count rising to 6.6 (excluding failed runs).
These results show that autonomous navigation is achievable for attitude-stable flapping-wing air vehicles using only ToF sensors. While PID control offers reliable, conservative navigation under minimal sensing, RL enables more exploratory behavior and adaptability in dynamic environments.