Improving Monocular SLAM

using Depth Estimating CNN

More Info
expand_more

Abstract

To bring down the number of traffic accidents and increase people’s mobility companies, such as Robot Engineering Systems (RES) try to put automated vehicles on the road. RES is developing the WEpod, a shuttle capable of autonomously navigating through mixed traffic. This research has been done in cooperation with RES to improve the localization capabilities of the WEpod. The WEpod currently localizes using its GPS and lidar sensors. These have proven to be not accurate and reliable enough to safely navigate through traffic. Therefore, other methods of localization and mapping have been investigated. The primary method investigated in this research is monocular Simultaneous Localization and Mapping (SLAM). Based on literature and practical studies, ORB-SLAM has been chosen as the implementation of SLAM. Unfortunately, ORB-SLAM is unable to initialize the setup when applied on WEpod images. Literature has shown that this problem can be solved by adding depth information to the inputs of ORB-SLAM. Obtaining depth information for the WEpod images is not an arbitrary task. The sensors on the WEpod are not capable of creating the required dense depth-maps. A Convolutional Neural Network (CNN) could be used to create the depth-maps. This research investigates whether adding a depth-estimating CNN solves this initialization problem and increases the tracking accuracy of monocular ORB-SLAM. A well performing CNN is chosen and combined with ORB-SLAM. Images pass through the depth estimating CNN to obtain depth-maps. These depth-maps together with the original images are used in ORB-SLAM, keeping the whole setup monocular. ORB-SLAM with the CNN is first tested on the Kitti dataset. The Kitti dataset is used since monocular ORB- SLAM initializes on Kitti images and ground-truth depth-maps can be obtained for Kitti images. Monocular ORB-SLAM’s tracking accuracy has been compared to ORB-SLAM with ground-truth depth-maps and to ORB-SLAM with estimated depth-maps. This comparison shows that adding estimated depth-maps in- creases the tracking accuracy of ORB-SLAM, but not as much as the ground-truth depth images. The same setup is tested on WEpod images. The CNN is fine-tuned on 7481 Kitti images as well as on 642 WEpod images. The performance on WEpod images of both CNN versions are compared, and used in combination with ORB-SLAM. The CNN fine-tuned on the WEpod images does not perform well, missing details in the estimated depth-maps. However, this is enough to solve the initialization problem of ORB-SLAM. The combination of ORB-SLAM and the Kitti fine-tuned CNN has a better tracking accuracy than ORB-SLAM with the WEpod fine-tuned CNN. It has been shown that the initialization problem on WEpod images is solved as well as the tracking accuracy is increased. These results show that the initialization problem of monocular ORB-SLAM on WEpod images is solved by adding the CNN. This makes it applicable to improve the current localization methods on the WEpod. Using only this setup for localization on the WEpod is not possible yet, more research is necessary. Adding this setup to the current localization methods of the WEpod could increase the localization of the WEpod. This would make it safer for the WEpod to navigate through traffic. This research sets the next step into creating a fully autonomous vehicle which reduces traffic accidents and increases the mobility of people.