Deep end-to-end 3D person detection from Camera and Lidar

Conference Paper (2019)
Author(s)

Markus Roth (TU Delft - Mechanical Engineering, Daimler AG)

Dominik Jargot (Student TU Delft)

Dariu Gavrila (TU Delft - Mechanical Engineering)

Research Group
Intelligent Vehicles
DOI related publication
https://doi.org/10.1109/ITSC.2019.8917366 Final published version
Publication Year
2019
Language
English
Pages (from-to)
521-527
ISBN (print)
978-1-5386-7024-8
Event
IEEE Intelligent Transportation Systems Conference (2019-10-27 - 2019-10-30), Auckland, New Zealand
Collections
Institutional Repository
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

We present a method for 3D person detection from camera images and lidar point clouds in automotive scenes. The method comprises a deep neural network that estimates the 3D location and extent of persons present in the scene. 3D anchor proposals are refined in two stages: a region proposal network and a subsequent detection network. For both input modalities, high-level feature representations are learned from raw sensor data instead of being manually designed. To that end, we use Voxel Feature Encoders [1] to obtain point cloud features instead of the widely used projection-based point cloud representations, which allows the network to learn to predict the location and extent of persons in an end-to-end manner. Experiments on the validation set of the KITTI 3D object detection benchmark [2] show that the proposed method outperforms state-of-the-art methods, with an average precision (AP) of 47.06% on moderate difficulty.
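To illustrate what "learning point cloud features from raw sensor data" with Voxel Feature Encoders [1] involves, here is a minimal sketch of a stacked VFE encoder for a single voxel, in the style introduced with VoxelNet. It is not the authors' implementation. Assumptions: each lidar point is (x, y, z, reflectance) and is augmented with its offset from the voxel centroid; batch normalization is omitted; the weights are random; and all function names and layer sizes are hypothetical. Each VFE layer applies a shared fully connected layer with ReLU to every point, max-pools across points, and concatenates the pooled vector back onto each point. A final max-pool then yields one feature vector per voxel.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]


def augment_points(points: List[Vector]) -> List[Vector]:
    """Append each point's (x, y, z) offset from the voxel centroid (assumed input layout)."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    return [p + [p[0] - cx, p[1] - cy, p[2] - cz] for p in points]


def linear_relu(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Shared fully connected layer followed by ReLU (batch norm omitted in this sketch)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def max_pool(features: List[Vector]) -> Vector:
    """Element-wise maximum over all points; the result does not depend on point order."""
    return [max(column) for column in zip(*features)]


def vfe_layer(features: List[Vector], weights: Matrix, bias: Vector) -> List[Vector]:
    """One VFE layer: point-wise features concatenated with the voxel-wide max-pooled feature."""
    pointwise = [linear_relu(f, weights, bias) for f in features]
    aggregated = max_pool(pointwise)
    return [f + aggregated for f in pointwise]


def random_layer(in_dim: int, out_dim: int, rng: random.Random) -> Layer:
    """Hypothetical randomly initialized weights; a trained network would learn these."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def encode_voxel(points: List[Vector], layers: List[Layer]) -> Vector:
    """Stacked VFE layers followed by a final max-pool to one feature vector per voxel."""
    features = augment_points(points)
    for weights, bias in layers:
        features = vfe_layer(features, weights, bias)
    return max_pool(features)


rng = random.Random(0)
# 7 input channels (x, y, z, r plus 3 centroid offsets) -> 16 -> 32 channels.
layers = [random_layer(7, 8, rng), random_layer(16, 16, rng)]
voxel = [[1.0, 2.0, 0.5, 0.3], [1.1, 2.1, 0.4, 0.7], [0.9, 1.9, 0.6, 0.1]]
feature = encode_voxel(voxel, layers)
print(len(feature))  # 32
```

Max-pooling over points makes the voxel feature invariant to point order, which is what lets such an encoder consume raw, unordered lidar points. A projection-based representation instead fixes the feature design in advance.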
