Deep end-to-end 3D person detection from Camera and Lidar

Conference Paper (2019)
Author(s)

M. Roth (TU Delft - Intelligent Vehicles, Daimler AG)

Dominik Jargot (Student TU Delft)

Dariu Gavrila (TU Delft - Intelligent Vehicles)

Research Group
Intelligent Vehicles
Copyright
© 2019 M. Roth, Dominik Jargot, D. Gavrila
DOI
https://doi.org/10.1109/ITSC.2019.8917366
Publication Year
2019
Language
English
Pages (from-to)
521-527
ISBN (print)
978-1-5386-7024-8
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

We present a method for 3D person detection from camera images and lidar point clouds in automotive scenes. The method comprises a deep neural network that estimates the 3D location and extent of persons present in the scene. 3D anchor proposals are refined in two stages: a region proposal network and a subsequent detection network. For both input modalities, high-level feature representations are learned from raw sensor data instead of being manually designed. To that end, we use Voxel Feature Encoders [1] to obtain point cloud features instead of the widely used projection-based point cloud representations, thus allowing the network to learn to predict the location and extent of persons in an end-to-end manner. Experiments on the validation set of the KITTI 3D object detection benchmark [2] show that the proposed method outperforms state-of-the-art methods with an average precision (AP) of 47.06% on moderate difficulty.
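The Voxel Feature Encoder cited in the abstract [1] groups raw lidar points into voxels and, within each voxel, concatenates each point's feature with a voxel-wise max-pooled aggregate. The toy sketch below illustrates that grouping-and-pooling pattern only; it is an assumption for illustration, not the paper's implementation (a real VFE applies a learned linear layer with batch norm and ReLU before pooling, and stacks several such layers).

```python
import math

def voxelize(points, voxel_size=0.4):
    """Group 3D points (x, y, z) into voxels by integer grid index.

    Toy stand-in for the voxel partitioning step; voxel_size is an
    illustrative value, not taken from the paper.
    """
    voxels = {}
    for p in points:
        idx = tuple(int(math.floor(c / voxel_size)) for c in p)
        voxels.setdefault(idx, []).append(p)
    return voxels

def vfe_layer(voxel_points):
    """One VFE-style step on the points of a single voxel.

    Here the per-point 'feature' is just the raw coordinates (a real
    VFE would first pass them through a learned MLP). Each point's
    feature is concatenated with the element-wise max over all points
    in the voxel, giving a locally aggregated representation.
    """
    pooled = [max(p[d] for p in voxel_points) for d in range(3)]
    return [list(p) + pooled for p in voxel_points]

# Usage: two nearby points share a voxel; the third lands elsewhere.
pts = [(0.1, 0.1, 0.1), (0.2, 0.3, 0.1), (1.0, 1.0, 1.0)]
voxels = voxelize(pts)
features = vfe_layer(voxels[(0, 0, 0)])
```

In the full pipeline described by the abstract, such voxel features would feed the region proposal network that refines the 3D anchor proposals.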
