J.C. van Dijk | TU Delft Repository

Visual Navigation for Tiny Drones

Doctoral thesis (2024) - Tom van Dijk, G.C.H.E. de Croon, C. de Wagter

In recent years, the use of drones in practical applications has seen a rapid increase, for instance in inspection, agriculture or environmental research. Most of these drones have a span in the order of tens of centimeters and a weight of half a kilogram or more. Smaller drones offer advantages in terms of safety and cost. However, their reduced payload capacity makes it difficult to carry the sensors and computers required for autonomous operation.

One of the most essential tasks an autonomous drone needs to perform is navigation. Here, navigation is defined as the ability to move towards a specified location while avoiding obstacles along the way. Ideally, the drone should also remember traveled routes, to make the return journey more efficient. However, on tiny drones (palm-size or smaller) the on-board processing power is often limited to a single microcontroller and the choice of sensors is limited. Cameras are popular sensors for tiny drones, because they're small, lightweight and passive, although they do require some processing power to produce useful results. The goal of this dissertation is to find a new, visual navigation strategy that fits within the constraints of these tiny drones.

First, existing work in terms of visual perception and avoidance is reviewed. Multiple options exist for visual perception: stereo vision, optical flow and monocular vision. All of these options are discussed and compared, leading to the conclusion that stereo vision performs best at shorter distances albeit at the cost of an additional camera, while monocular vision performs better at longer distances. Optical flow is ruled out for avoidance, as it has excessively large errors precisely in the direction of movement.
For avoidance, the options in terms of motion planning, map types and odometry are discussed. Perhaps unsurprisingly, the optimal choice is found to be dependent on the application. For computational efficiency on tiny drones, the most important choice is whether multiple measurements should be fused into a single map, or if individual percepts are good enough for avoidance. The latter is significantly less computationally demanding. For visual odometry, the depth information should be used if available, and the IMU can provide efficiency benefits in feature tracking. Preliminary results are shown for monocular vision, visual odometry and obstacle avoidance.

Secondly, the dissertation takes a deeper dive into monocular depth estimation. Monocular depth estimation has the advantage that it only needs a single camera, which saves valuable weight on tiny drones, but its processing is more complex. The goal of this chapter is to analyze the learned behavior of neural networks for monocular depth perception, to see if this can be distilled into simple, lightweight algorithms. Using experiments based on data augmentation, it is shown that all four of the analyzed networks rely on the vertical position of objects in the image to estimate their depth. While this cue would be simple to replicate, it does depend on a known pose of the camera. Further investigation shows that the networks have a strong prior 'assumption' about this pose, which may make transfer to drones more difficult. Finally, the networks need to have some sense of an 'object'. In this case, it is shown that various shapes are recognized as an object provided that they have contrasting outlines and a dark shadow at the bottom. While this last feature is clearly present in the car-based KITTI dataset, it may not transfer directly to other environments. However, the vertical position cue can likely be used to provide monocular depth estimates to resource-limited systems such as tiny drones.
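
The vertical position cue can be illustrated with a minimal geometric sketch. It is not the method analyzed in the dissertation; it assumes a pinhole camera above a flat ground plane, with a known height and pitch, and all names and values (ground_distance, cam_height, the 100-pixel focal length) are hypothetical. It also shows why the cue depends on a known camera pose: a small pitch error changes the estimated distance.

```python
import math


def ground_distance(v, cy, f, cam_height, pitch=0.0):
    """Distance along the ground to the point seen at image row v.

    Assumptions (not from the source): pinhole camera, flat ground,
    rows increase downward, pitch is positive when the camera looks down.
    v, cy and f are in pixels; cam_height and the result are in meters.
    """
    # Angle of the viewing ray below the horizon.
    angle = pitch + math.atan2(v - cy, f)
    if angle <= 0.0:
        # The ray points at or above the horizon and never hits the ground.
        return math.inf
    return cam_height / math.tan(angle)


if __name__ == "__main__":
    # An object whose base appears lower in the image is estimated to be closer.
    for row in (260, 300, 340):
        print(row, round(ground_distance(row, cy=240, f=100, cam_height=1.0), 2))
    # The same row with an unmodeled 2-degree pitch gives a different distance.
    print(round(ground_distance(300, 240, 100, 1.0, pitch=math.radians(2.0)), 2))
```

In this sketch only the row of the object's ground contact point is needed, which is why such a cue could be attractive on a microcontroller, provided the camera pose is known.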

Thirdly, the remembering of traveled routes is investigated. Traditional mapping strategies from robotics would quickly run out of memory on microcontrollers, especially over longer trajectories. Instead, inspiration for a memory-efficient route-following strategy is found in nature. Here, insects are able to remember and follow remarkably long routes despite their tiny brains. Their strategy is often broken up into a few components, most notably path integration (odometry in robotics) and visual homing. We implement a novel strategy based on these components on a 56-gram drone. Here, the focus lies on traveling long distances using odometry, while periodically using visual homing to return to known locations to counteract odometric drift. The proposed strategy is demonstrated over multiple experiments, where the most efficient run required only 0.65 kilobytes to remember a route of 56 meters. This shows that tiny drones can retrace known paths by combining odometry with periodic homing maneuvers to counteract drift.

Finally, the avoidance of obstacles is discussed in the conclusion of this dissertation. This research has been performed by MSc students under my supervision, who have found and demonstrated that bug algorithms are an effective navigation strategy in three-dimensional, limited-field-of-view applications and provide a lightweight goal-oriented avoidance strategy that is suitable for tiny drones.

By combining all of the above results, a full navigation strategy for tiny drones can be proposed: tiny drones can visually navigate by using lightweight monocular vision algorithms to perceive obstacles, three-dimensional bug algorithms to avoid them while moving to new locations, and odometry and visual homing to retrace known paths. ...

Visual route following for tiny autonomous robots

Journal article (2024) - Tom van Dijk, Christophe De Wagter, Guido C.H.E. de Croon

Navigation is an essential capability for autonomous robots. In particular, visual navigation has been a major research topic in robotics because cameras are lightweight, power-efficient sensors that provide rich information on the environment. However, the main challenge of visual navigation is that it requires substantial computational power and memory for visual processing and storage of the results. As of yet, this has precluded its use on small, extremely resource-constrained robots such as lightweight drones. Inspired by the parsimony of natural intelligence, we propose an insect-inspired approach toward visual navigation that is specifically aimed at extremely resource-restricted robots. It is a route-following approach in which a robot's outbound trajectory is stored as a collection of highly compressed panoramic images together with their spatial relationships as measured with odometry. During the inbound journey, the robot uses a combination of odometry and visual homing to return to the stored locations, with visual homing preventing the buildup of odometric drift. A main advancement of the proposed strategy is that the number of stored compressed images is minimized by spacing them apart as far as the accuracy of odometry allows. To demonstrate the suitability for small systems, we implemented the strategy on a tiny 56-gram drone. The drone could successfully follow routes up to 100 meters with a trajectory representation that consumed less than 20 bytes per meter. The presented method forms a substantial step toward the autonomous visual navigation of tiny robots, facilitating their more widespread application. ...
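
The trade-off between snapshot spacing and memory use can be sketched as follows. This is only an illustration under stated assumptions, not the representation used in the article: the Waypoint class, the 60-byte snapshot size and the 4 bytes per odometry vector are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Waypoint:
    dx: float        # odometry displacement from the previous snapshot [m]
    dy: float
    snapshot: bytes  # compressed panoramic image (size is a hypothetical value)


def route_length(route):
    """Total traveled distance, summed over the odometry vectors."""
    return sum(math.hypot(w.dx, w.dy) for w in route)


def route_bytes(route, bytes_per_vector=4):
    """Memory needed for all snapshots plus their odometry vectors."""
    return sum(len(w.snapshot) + bytes_per_vector for w in route)


def bytes_per_meter(route):
    return route_bytes(route) / route_length(route)


def straight_route(total_m, spacing_m, snapshot_size=60):
    """Hypothetical straight route with one snapshot every spacing_m meters."""
    count = int(total_m / spacing_m)
    return [Waypoint(spacing_m, 0.0, bytes(snapshot_size)) for _ in range(count)]


if __name__ == "__main__":
    # Spacing snapshots further apart lowers the memory cost per meter.
    for spacing in (1.0, 5.0, 10.0):
        print(spacing, bytes_per_meter(straight_route(50.0, spacing)))
```

With these assumed sizes, a snapshot every 5 meters costs 64 bytes per 5 meters of route; the spacing that is actually usable is limited by how quickly odometric drift grows.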

Frustumbug: a 3D mapless stereo-vision-based bug algorithm for Micro Air Vehicles

Conference paper (2023) - R.S. Meester, Tom van Dijk, C. de Wagter, G.C.H.E. de Croon

Obstacle avoidance is an important capability for flying robots. For robots with limited resources, such as small drones, this becomes particularly challenging. Bug algorithms have been proposed to solve path planning with only minimal resources. Stereo vision provides a rich description of the world for limited weight, but it typically has a limited Field of View (FoV) and is fixed to the drone frame to further reduce weight. Based on these, a computationally light 3D path-planning algorithm is proposed. The proposed algorithm is called Frustumbug and is based on the Wedgebug algorithm, since this algorithm copes well with a limited FoV. Since Wedgebug only addresses 2D problems, the Local-ϵ-Tangent-Graph (LETG) is used to extend the path planning to 3D. Disparity images are obtained through an optimized stereo block matching algorithm. Frustumbug copes well with noisy range sensor data and includes 3D trajectories like reversing, climbing and descending maneuvers to avoid or escape local minima. The algorithm has been tested with 225 flights in two challenging simulated environments and achieved a success rate of 96%. Here, 3.6% did not reach the goal and 0.4% collided. Frustumbug has been implemented on a 20-gram stereo vision system and was tested in the real world on a MAV. This shows the potential for small drones to reach their targets fully autonomously based on very limited resources. ...
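
As background for the disparity images mentioned above, the standard relation between stereo disparity and depth is Z = f * B / d. The sketch below applies it to one image row to find the nearest obstacle. It is not part of Frustumbug; the function names, focal length, baseline and disparity values are hypothetical.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth in meters from disparity in pixels, using Z = f * B / d."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity (or no match).
        return float("inf")
    return focal_px * baseline_m / disparity_px


def nearest_obstacle(disparity_row, focal_px, baseline_m):
    """Smallest depth found in one row of a disparity image."""
    return min(disparity_to_depth(d, focal_px, baseline_m) for d in disparity_row)


if __name__ == "__main__":
    row = [0, 2, 3, 10, 4]  # hypothetical disparities in pixels
    print(nearest_obstacle(row, focal_px=200.0, baseline_m=0.06))
```

Because depth is inversely proportional to disparity, one pixel of matching noise matters far more for distant points than for nearby ones.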

Self-supervised learning for visual obstacle avoidance

Technical report

Book (2022) - Tom van Dijk

With a growing number of drones, the risk of collision with other air traffic or fixed obstacles increases. New safety measures are required to keep the operation of Unmanned Aerial Vehicles (UAVs) safe. One of these measures is the use of a Collision Avoidance System (CAS), a system that helps the drone autonomously detect and avoid obstacles. ...

Hear-and-avoid for unmanned air vehicles using convolutional neural networks

Journal article (2021) - D.C. Wijnker, Tom van Dijk, M. Snellen, G.C.H.E. de Croon, C. de Wagter

To investigate how an unmanned air vehicle can detect manned aircraft with a single microphone, an audio data set is created in which unmanned air vehicle ego-sound and recorded aircraft sound are mixed together. A convolutional neural network is used to perform air traffic detection. Due to restrictions on flying unmanned air vehicles close to aircraft, the data set has to be artificially produced, so the unmanned air vehicle sound is captured separately from the aircraft sound. They are then mixed with unmanned air vehicle recordings, during which labels are given indicating whether the mixed recording contains aircraft audio or not. The model is a convolutional neural network that uses the features Mel frequency cepstral coefficient, spectrogram or Mel spectrogram as input. For each feature, the effect of unmanned air vehicle/aircraft amplitude ratio, the type of labeling, the window length and the addition of third party aircraft sound database recordings are explored. The results show that the best performance is achieved using the Mel spectrogram feature. The performance increases when the unmanned air vehicle/aircraft amplitude ratio is decreased, when the time window is increased or when the data set is extended with aircraft audio recordings from a third party sound database. Although the currently presented approach has a number of false positives and false negatives that is still too high for real-world application, this study indicates multiple paths forward that can lead to an interesting performance. Finally, the data set is provided as open access ...
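
The mixing step can be sketched as below. This is an assumption-laden illustration, not the procedure from the article: it defines the amplitude ratio in terms of RMS amplitude, and the function names and sample values are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a list of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_ratio(uav, aircraft, ratio):
    """Mix two equally long recordings at a given UAV/aircraft RMS ratio.

    The UAV recording is rescaled so that rms(uav) / rms(aircraft) == ratio
    (an assumed definition of the amplitude ratio), then both are summed.
    """
    gain = ratio * rms(aircraft) / rms(uav)
    return [gain * u + a for u, a in zip(uav, aircraft)]


def labeled_example(uav, aircraft=None, ratio=1.0):
    """Return (samples, label); label 1 means aircraft audio is present."""
    if aircraft is None:
        return list(uav), 0
    return mix_at_ratio(uav, aircraft, ratio), 1


if __name__ == "__main__":
    uav = [1.0, -1.0, 1.0, -1.0]
    aircraft = [0.5, 0.5, -0.5, -0.5]
    print(labeled_example(uav, aircraft, ratio=2.0))
    print(labeled_example(uav))
```

A lower ratio makes the aircraft sound relatively louder in the mix, which is the direction in which the abstract reports improved detection performance.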

Self-Supervised Learning for Visual Obstacle Avoidance

Report (2020) - Tom van Dijk

With a growing number of drones, the risk of collision with other air traffic or fixed obstacles increases. New safety measures are required to keep the operation of Unmanned Aerial Vehicles (UAVs) safe. One of these measures is the use of a Collision Avoidance System (CAS), a system that helps the drone autonomously detect and avoid obstacles. The design of a Collision Avoidance System is a complex task with many smaller subproblems, as illustrated by Albaker and Rahim [1]. How should the drone sense nearby obstacles? When is there a risk of collision? What should the drone do when a conflict is detected? All of these questions need to be answered to develop a functional Collision Avoidance System. However, all of these subproblems – except the sensing of obstacles – only concern the behavior of the vehicle. They can be solved independently of the target platform as long as it can perform the required maneuvers; it does not matter whether it is a UAV or a larger vehicle. The sensing of the environment, on the other hand, is the only subproblem that places requirements on the hardware, specifically the sensors that should be carried by the UAV. It is the hardware that sets UAVs apart from other vehicles. Unlike autonomous cars, other ground-based vehicles or larger aircraft, UAVs have only a small payload capacity. It is therefore not practical to carry large or heavy sensors such as LIDAR or radar for obstacle avoidance. Instead, obstacle avoidance on UAVs requires clever use of lightweight sensors: cameras, microphones or antennae. This research will therefore focus on the sensing of the environment. Out of the sensors mentioned above – cameras, microphones and antennae – cameras are the only ones that can detect nearly all ground-based obstacles and other air traffic; microphones and antennae are limited to detection of sources of noise or radio signals. Therefore, this research will focus on the visual detection of obstacles.
The field of computer vision is well-developed; it may already be possible to find an adequate solution for visual obstacle detection using existing stereo vision methods like Semi-Global Matching (SGM) [23]. These methods, however, only use a fraction of the information present in the images to estimate depth – the disparity. Other cues such as the apparent size of known objects are completely ignored. The use of appearance cues for depth estimation is a relatively new development driven largely by the advent of Deep Learning, which allows these cues to be learned from large, labeled datasets. As long as the UAV’s operational environment is similar to this training dataset it should be possible to use appearance cues in a CAS. However, this is difficult to guarantee and may require a prohibitively large training set. Self-Supervised Learning may provide a solution to this problem. After training on an initial dataset, the UAV will continue to collect new training samples during operation. This allows it to ‘adapt’ to its operational environment and to learn new depth cues that are relevant in that environment. Self-Supervised Learning for depth map estimation is a young field; the first practical examples started to appear around 2016 (e.g. [17]). Most of the current literature is focused on automotive applications ...

A Tailless Flapping Wing MAV Performing Monocular Visual Servoing Tasks

Journal article (2020) - Diana A. Olejnik, Bardienus P. Duisterhof, Matej Karásek, Kirk Y.W. Scheper, Tom Van Dijk, Guido C.H.E. De Croon

In the field of robotics, a major challenge is achieving high levels of autonomy with small vehicles that have limited mass and power budgets. The main motivation for designing such small vehicles is that, compared to their larger counterparts, they have the potential to be safer, and hence be available and work together in large numbers. One of the key components in micro robotics is efficient software design to optimally utilize the computing power available. This paper describes the computer vision and control algorithms used to achieve autonomous flight with the ∼30 g tailless flapping wing robot, used to participate in the International Micro Air Vehicle Conference and Competition (IMAV 2018) indoor micro air vehicle competition. Several tasks are discussed: line following, circular gate detection and fly-through. The emphasis throughout this paper is on augmenting traditional techniques with the goal to make these methods work with limited computing power while obtaining robust behavior. ...

How Do Neural Networks See Depth in Single Images?

Conference paper (2019) - Tom van Dijk, Guido de Croon

Deep neural networks have led to a breakthrough in depth estimation from single images. Recent work shows that the quality of these estimations is rapidly increasing. It is clear that neural networks can see depth in single images. However, to the best of our knowledge, no work currently exists that analyzes what these networks have learned. In this work we take four previously published networks and investigate what depth cues they exploit. We find that all networks ignore the apparent size of known obstacles in favor of their vertical position in the image. The use of the vertical position requires the camera pose to be known; however, we find that these networks only partially recognize changes in camera pitch and roll angles. Small changes in camera pitch are shown to disturb the estimated distance towards obstacles. The use of the vertical image position allows the networks to estimate depth towards arbitrary obstacles - even those not appearing in the training set - but may depend on features that are not universally present. ...

A Tailless Flapping Wing MAV Performing Monocular Visual Servoing Tasks

Conference paper (2019) - Diana Olejnik, Matej Karasek, Bart Duisterhof, Kirk Scheper, Tom van Dijk, Guido de Croon

In the field of robotics, a major challenge is achieving high levels of autonomy with small vehicles that have limited mass and power budgets. The main motivation for designing such small vehicles is that, compared to their larger counterparts, they have the potential to be safer, and hence be available and work together in large numbers. One of the key components in micro robotics is efficient software design to optimally utilize the computing power available. This paper describes the computer vision and control algorithms used to achieve autonomous flight with the ∼30-gram tailless flapping wing robot, used to participate in the IMAV 2018 indoor micro air vehicle competition. Several tasks are discussed: line following, and circular gate detection and fly-through. The emphasis throughout this paper is on augmenting traditional techniques with the goal to make these methods work with limited computing power while obtaining robust behaviour. ...

Hear-and-avoid for UAVs using convolutional neural networks

Conference paper (2019) - Dirk Wijnker, Tom van Dijk, M. Snellen, Guido de Croon, Christophe de Wagter

To investigate how an Unmanned Air Vehicle (UAV) can detect manned aircraft with a single microphone, an audio data set is created in which UAV ego-sound and recorded aircraft sound are mixed together. A convolutional neural network is used to perform the air traffic detection. Due to restrictions on flying UAVs close to aircraft, the data set has to be artificially produced, so the UAV sound is captured separately from the aircraft sound. They are then mixed with UAV recordings, during which labels are given indicating whether the mixed recording contains aircraft audio or not. The model is a CNN which uses the features MFCC, spectrogram or Mel spectrogram as input. For each feature the effect of UAV/aircraft amplitude ratio, the type of labeling, the window length and the addition of third party aircraft sound database recordings is explored. The results show that the best performance is achieved using the Mel spectrogram feature. The performance increases when the UAV/aircraft amplitude ratio is decreased, when the time window is increased or when the data set is extended with aircraft audio recordings from a third party sound database. Although the currently presented approach has a number of false positives and false negatives that is still too high for real-world application, this study indicates multiple paths forward that can lead to an interesting performance. In addition, the data set is provided as open access, allowing the community to contribute to the improvement of the detection task. ...