Deep segmentation of the drivable path of a self-driving vehicle using external data

Influence of domain shift factors and depth information


Abstract

Robot Care Systems (RCS) is involved in the development of the WEpod, an autonomous shuttle which can transfer up to six people. Based on a predefined map of the environment, the shuttle is able to navigate through mixed traffic using its perception sensors, such as camera, radar and lidar. This study was conducted in collaboration with RCS and focuses on two parts: assessing the influence of different factors on the domain shift, and assessing the importance of depth information in the transformation of scene understanding from image space to top view.

For the WEpod, or any self-driving vehicle, to travel safely on the road and through traffic, it must understand the road scenes that appear in daily life. This scene understanding is the basis for a successful and reliable future of autonomous vehicles. Deploying a Convolutional Neural Network (CNN) to perform semantic segmentation is a typical approach to attain such understanding of the surroundings. However, when a CNN is trained on a certain source domain and then deployed on a different (target) domain, the network will often perform the task poorly. This is the result of differences between the source and target domain and is referred to as domain shift. Although this is a common problem, the factors that cause these differences have not yet been fully explored. We fill this research gap by investigating ten such factors.
To explore these factors, a base network was generated by a two-step fine-tuning procedure on an existing convolutional neural network (SegNet) pretrained on the CityScapes semantic segmentation dataset. The network is first fine-tuned on part of the RobotCar dataset (road scenes recorded in Oxford, UK) and then on part of the KITTI dataset (road scenes recorded throughout Germany). Experiments are conducted to determine the influence of each factor on successful domain adaptation (i.e. negligible domain shift): the result for each factor is compared to the result of the base network. Results consist of the F1-measure and Jaccard index for drivable path segmentation and occupancy segmentation, although the emphasis lies on drivable path segmentation.
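The two-step fine-tuning procedure described above can be sketched as follows. This is an illustrative outline, not the thesis implementation: the model loader and the two dataset loaders are hypothetical placeholders, and the optimizer settings are assumptions.

```python
import torch
from torch import nn, optim

def fine_tune(model, loader, epochs=1, lr=1e-4, device="cpu"):
    """One fine-tuning stage: continue training all weights at a low
    learning rate on a new domain's (image, label) pairs."""
    model.to(device).train()
    criterion = nn.CrossEntropyLoss()  # per-pixel classification loss
    opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss = criterion(model(images.to(device)), labels.to(device))
            loss.backward()
            opt.step()
    return model

# Two-step schedule (all names below are hypothetical placeholders):
# model = load_segnet_pretrained_on_cityscapes()
# model = fine_tune(model, robotcar_loader)  # step 1: RobotCar subset
# model = fine_tune(model, kitti_loader)     # step 2: KITTI subset
```

Reversing the order of the two stages is precisely the "training order" factor examined later: whichever domain is fine-tuned on last tends to dominate the adapted network.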
A significant positive influence on the estimation of the drivable path for the WEpod domain was obtained when the ground truth consisted of only two classes (drivable path and non-drivable path) instead of three. This performance gain is reflected by an increase of 8 percentage points for both the IoU and F1 metrics. Making all images intrinsically consistent, and thus removing all geometric differences between the camera sensors, resulted in a larger increase: compared to the baseline, both the Jaccard index and the F1 metric increased by 10 percentage points. The training order is a main contributor to domain adaptation, with an increase of 18 percentage points in IoU and 20 percentage points in F1. This shows that the target domain (WEpod) is more closely related to RobotCar than to KITTI.
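For reference, the Jaccard index (IoU) and F1-measure reported above can both be computed from the per-pixel true positives, false positives and false negatives of a binary segmentation mask. A minimal NumPy sketch (not the thesis evaluation code):

```python
import numpy as np

def jaccard_and_f1(pred, gt):
    """Jaccard index (IoU) and F1 (Dice) for binary masks of equal shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()    # predicted and true
    fp = np.logical_and(pred, ~gt).sum()   # predicted but false
    fn = np.logical_and(~pred, gt).sum()   # missed
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return iou, f1

pred = np.array([[1, 1, 0], [0, 1, 0]])
gt   = np.array([[1, 0, 0], [0, 1, 1]])
iou, f1 = jaccard_and_f1(pred, gt)  # tp=2, fp=1, fn=1 -> iou=0.5, f1=2/3
```

Note that F1 is always at least as large as IoU for the same mask, which is why the two metrics move together in the results above.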

Although investigating these factors can potentially improve performance on the WEpod domain, understanding the environment in image space is not enough, because path planning operates in top view. Hence, a transformation from image space to top view is needed to use the segmentation-based scene understanding to the full extent. For this transformation, three setups are used, each with a different form of depth information: no depth information, ground truth depth information and estimated depth information. This approach gives insight into the relevance of depth information for the usability of scene understanding. The accuracy of the top view transformations is measured with four metrics: raw lateral error, scaled lateral error, overlap length and a count metric. The first is the raw difference between the estimated and ground truth trajectory; the second is a scaled form of the first. The overlap length measures to what extent the trajectory is estimated, while the count metric accounts for whether the estimated trajectory is present in top view at all.
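As an illustration of the first metric, a raw lateral error could be computed as the mean absolute lateral offset between the estimated and ground truth trajectories. The sketch below is an assumption about the metric's form, not the thesis definition: it assumes both trajectories are sampled at matched longitudinal positions, with the first column longitudinal and the second lateral.

```python
import numpy as np

def raw_lateral_error(est_xy, gt_xy):
    """Mean absolute lateral offset between two trajectories.

    Assumes rows are (longitudinal, lateral) samples at matched
    longitudinal positions (hypothetical layout).
    """
    n = min(len(est_xy), len(gt_xy))  # compare only the overlapping part
    return float(np.mean(np.abs(est_xy[:n, 1] - gt_xy[:n, 1])))

gt  = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, 0.2]])
est = np.array([[0.0, 0.2], [1.0, 0.2], [2.0, 0.2]])
# lateral offsets: 0.2, 0.1, 0.0 -> mean = 0.1
```

Under this reading, the overlap length would correspond to the shared longitudinal extent `n`, and the count metric to whether any estimated trajectory exists at all.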
From the experiments it is concluded that depth information plays an important role in the lateral accuracy of the trajectories, while it is less important for the longitudinal length of the trajectory. It was also observed that depth information does not necessarily need to be dense; moreover, sparse but ground truth depth information leads to a better trajectory.