R. Ferede | TU Delft Repository

One Net to Rule Them All

Domain Randomization in Quadcopter Racing Across Different Platforms

Conference paper (2025) - Robin Ferede, Till Blaha, Erin Lucassen, Christophe De Wagter, Guido C.H.E. De Croon

In high-speed quadcopter racing, finding a single controller that works well across different platforms remains challenging. This work presents the first neural network controller for drone racing that generalizes across physically distinct quadcopters. We demonstrate that a single network, trained with domain randomization, can robustly control various types of quadcopters. The network relies solely on the current state to directly compute motor commands. The effectiveness of this generalized controller is validated through real-world tests on two substantially different craft (3-inch and 5-inch race quadcopters). We further compare the performance of this generalized controller with controllers specifically trained for the 3-inch and 5-inch drones, using their identified model parameters with varying levels of domain randomization (0%, 10%, 20%, 30%). While the generalized controller shows slightly slower speeds compared to the fine-tuned models, it excels in adaptability across different platforms. Our results show that training without randomization fails to transfer from simulation to reality, while increasing randomization improves robustness but reduces speed. Despite this trade-off, our findings highlight the potential of domain randomization for generalizing controllers, paving the way for universal AI controllers that can adapt to any platform. ...
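As a rough illustration of the randomization levels mentioned in the abstract above, the sketch below perturbs a set of nominal model parameters by up to a given fraction before each simulated training episode. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say which parameters are randomized or which distribution is used, so the uniform sampling, the function name `randomize_params`, and the parameter names and values are all hypothetical.

```python
import random


def randomize_params(nominal, level, rng=None):
    """Scale each nominal parameter by a factor drawn uniformly from [1 - level, 1 + level].

    Assumption: uniform, independent perturbation per parameter. The abstract
    only states the randomization levels (0%, 10%, 20%, 30%).
    """
    rng = rng or random.Random()
    return {
        name: value * (1.0 + rng.uniform(-level, level))
        for name, value in nominal.items()
    }


# Hypothetical nominal parameters for illustration only (not identified values).
nominal = {"mass_kg": 0.4, "thrust_coeff": 1.0e-6, "drag_coeff": 0.3}

# One sample per randomization level named in the abstract.
samples = {
    level: randomize_params(nominal, level, random.Random(0))
    for level in (0.0, 0.1, 0.2, 0.3)
}
```

At a level of 0.0 the sampled parameters equal the nominal ones, which corresponds to the no-randomization baseline; larger levels widen the range of simulated vehicles the controller is exposed to during training.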

Optimality principles in spacecraft neural guidance and control

Review (2024) - Dario Izzo, Emmanuel Blazquez, Robin Ferede, Sebastien Origer, Christophe De Wagter, Guido C.H.E. de Croon

This Review discusses the main results obtained in training end-to-end neural architectures for guidance and control of interplanetary transfers, planetary landings, and close-proximity operations, highlighting the successful learning of optimality principles by the underlying neural models. Spacecraft and drones aimed at exploring our solar system are designed to operate in conditions where the smart use of onboard resources is vital to the success or failure of the mission. Sensorimotor actions are thus often derived from high-level, quantifiable, optimality principles assigned to each task, using consolidated tools in optimal control theory. The planned actions are derived on the ground and transferred on board, where controllers have the task of tracking the uploaded guidance profile. Here, we review recent trends based on the use of end-to-end networks, called guidance and control networks (G&CNets), which allow spacecraft to depart from such an architecture and to embrace the onboard computation of optimal actions. In this way, the sensor information is transformed in real time into optimal plans, thus increasing mission autonomy and robustness. We then analyze drone racing as an ideal gym environment to test these architectures on real robotic platforms and thus increase confidence in their use in future space exploration missions. Drone racing not only shares with spacecraft missions both limited onboard computational capabilities and similar control structures induced from the optimality principle sought but also entails different levels of uncertainties and unmodeled effects and a very different dynamical timescale. ...

End-to-end neural network based optimal quadcopter control

Journal article (2024) - Robin Ferede, Guido de Croon, Christophe De Wagter, Dario Izzo

Developing optimal controllers for aggressive high-speed quadcopter flight poses significant challenges in robotics. Recent trends in the field involve utilizing neural network controllers trained through supervised or reinforcement learning. However, the sim-to-real transfer introduces a reality gap, requiring the use of robust inner-loop controllers during real flights, which limits the network's control authority and flight performance. In this paper, we investigate, for the first time, an end-to-end neural network controller that addresses the reality gap without being restricted by an inner-loop controller. The networks, referred to as G&CNets, are trained to learn an energy-optimal policy mapping the quadcopter's state to rpm commands using an optimal trajectory dataset. In hover-to-hover flights, we identified the unmodeled moments as a significant contributor to the reality gap. To mitigate this, we propose an adaptive control strategy that works by learning from optimal trajectories of a system affected by constant external pitch, roll and yaw moments. In real test flights, this model mismatch is estimated onboard and fed to the network to obtain the optimal rpm command. We demonstrate the effectiveness of our method by performing energy-optimal hover-to-hover flights with and without moment feedback. Finally, we compare the adaptive controller to a state-of-the-art differential-flatness-based controller in a consecutive waypoint flight and demonstrate the advantages of our method in terms of energy optimality and robustness. ...

End-to-end Reinforcement Learning for Time-Optimal Quadcopter Flight

Conference paper (2024) - Robin Ferede, Christophe De Wagter, Dario Izzo, Guido C.H.E. De Croon

Aggressive time-optimal control of quadcopters poses a significant challenge in the field of robotics. The state-of-the-art approach leverages reinforcement learning (RL) to train optimal neural policies. However, a critical hurdle is the sim-to-real gap, often addressed by employing a robust inner-loop controller, an abstraction that, in theory, constrains the optimality of the trained controller, necessitating margins to counter potential disturbances. In contrast, our novel approach introduces high-speed quadcopter control using end-to-end RL (E2E) that gives direct motor commands. To bridge the reality gap, we incorporate a learned residual model and an adaptive method that can compensate for modeling errors in thrust and moments. We compare our E2E approach against a state-of-the-art network that commands thrust and body rates to an INDI inner-loop controller, both in simulated and real-world flight. E2E showcases a significant 1.39-second advantage in simulation and a 0.17-second edge in real-world testing, highlighting end-to-end reinforcement learning's potential. The performance drop observed from simulation to reality shows potential for further improvement, including refining strategies to address the reality gap or exploring offline reinforcement learning with real flight data. ...