Learning Depth from Single Monocular Images Using Stereo Supervisory Input

Abstract

Stereo vision systems are often employed in robotics as a means for obstacle avoidance and navigation. These systems have inherent depth-sensing limitations, with significant problems in occluded and untextured regions, leading to sparse depth maps. We propose using a monocular depth estimation algorithm to tackle these problems, in a Self-Supervised Learning (SSL) framework. The algorithm learns online from the sparse depth map generated by a stereo vision system, producing a dense depth map. The algorithm is designed to be computationally efficient, for implementation onboard resource-constrained mobile robots and unmanned aerial vehicles. Within that context, it can provide both robustness against a stereo camera failure and more accurate depth perception, by filling in missing depth information in occluded and low-texture regions. This in turn allows the use of more efficient sparse stereo vision algorithms. We test the algorithm offline on a new high-resolution stereo dataset of indoor scenes, processed using both sparse and dense stereo matching algorithms. We show that the algorithm's performance does not deteriorate, and in fact sometimes improves, when learning only from sparse, high-confidence regions rather than from the computationally expensive, occlusion-filled, and heavily post-processed dense depth maps. This makes the approach very promising for self-supervised learning on autonomous robots.
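The core idea of the supervision scheme described above can be illustrated with a minimal sketch: the monocular estimator is trained only on pixels where the stereo system reports a valid depth, so sparse stereo output suffices as a teaching signal. The function name and the convention that invalid pixels are encoded as 0 are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def sparse_supervision_loss(pred_depth, stereo_depth):
    """Mean absolute error computed only over pixels where the
    sparse stereo depth map has a valid estimate.

    Assumption (for this sketch): invalid pixels are encoded as 0."""
    valid = stereo_depth > 0
    if not np.any(valid):
        return 0.0
    return float(np.mean(np.abs(pred_depth[valid] - stereo_depth[valid])))

# Toy example: 2x2 depth maps; one pixel has no stereo depth (0),
# so the loss is averaged over the remaining 3 valid pixels.
pred = np.array([[1.0, 2.0], [3.0, 4.0]])
stereo = np.array([[1.5, 0.0], [2.5, 4.0]])
loss = sparse_supervision_loss(pred, stereo)  # (0.5 + 0.5 + 0.0) / 3
```

Masking the loss this way is what lets the learner consume only high-confidence stereo matches, rather than requiring a dense, post-processed depth map as supervision.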