F.B. Flohr
Despite the success of deep learning, human pose estimation remains a challenging problem, particularly in dense urban traffic scenarios. Its robustness is important for follow-up tasks like trajectory prediction and gesture recognition. We are interested in human pose estimation in crowded scenes with overlapping pedestrians, in particular pairwise constellations. We propose a new top-down method that relies on pairwise detections as input and jointly estimates the two poses of such pairs in a single forward pass within a deep convolutional neural network. As the availability of automotive datasets providing poses and a fair amount of crowded scenes is limited, we extend the EuroCity Persons dataset with additional images and pose annotations. With 46,975 images and poses of 279,329 persons, our new EuroCity Persons Dense Pose dataset is the largest pose dataset recorded from a moving vehicle. In our experiments using this dataset, we show improved performance for poses of pedestrian pairs in comparison with a state-of-the-art method for human pose estimation in crowds.
Anticipating future situations from streaming sensor data is a key perception challenge for mobile robotics and automated vehicles. We address the problem of predicting the path of objects with multiple dynamic modes. The dynamics of such targets can be described by a Switching Linear Dynamical System (SLDS). However, predictions from this probabilistic model cannot anticipate when a change in dynamic mode will occur. We propose to extract various types of cues with computer vision to provide context on the target’s behavior, and incorporate these in a Dynamic Bayesian Network (DBN). The DBN extends the SLDS by conditioning the mode transition probabilities on additional context states. We describe efficient online inference in this DBN for probabilistic path prediction, accounting for uncertainty in both measurements and target behavior. Our approach is illustrated on two scenarios in the Intelligent Vehicles domain concerning pedestrians and cyclists, so-called Vulnerable Road Users (VRUs). Here, context cues include the static environment of the VRU, its dynamic environment, and its observed actions. Experiments using stereo vision data from a moving vehicle demonstrate that the proposed approach results in more accurate path prediction than SLDS at the relevant short time horizon (1 s). It slightly outperforms a computationally more demanding state-of-the-art method.
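The idea of conditioning mode transition probabilities on context can be illustrated with a minimal sketch. It is not the inference procedure described in the abstract: it assumes a 1-D position, two hypothetical modes (`walk`, `stand`), a single binary context cue, and made-up transition tables and velocities, and it propagates only the expected position without measurement or process noise.

```python
from typing import Dict, Tuple

# Hypothetical per-mode velocities (m/s) along one axis; values are illustrative only.
MODE_VELOCITY: Dict[str, float] = {"walk": 1.4, "stand": 0.0}

# Hypothetical tables P(next mode | current mode, context).
# context=True stands for "cues suggest the target is about to stop".
TRANSITIONS: Dict[bool, Dict[str, Dict[str, float]]] = {
    False: {"walk": {"walk": 0.95, "stand": 0.05},
            "stand": {"walk": 0.10, "stand": 0.90}},
    True: {"walk": {"walk": 0.60, "stand": 0.40},
           "stand": {"walk": 0.05, "stand": 0.95}},
}


def predict_position(
    x0: float,
    mode_probs: Dict[str, float],
    context: bool,
    horizon_s: float = 1.0,
    dt: float = 0.1,
) -> Tuple[float, Dict[str, float]]:
    """Propagate mode probabilities and the expected position over the horizon."""
    probs = dict(mode_probs)
    x = x0
    table = TRANSITIONS[context]
    for _ in range(round(horizon_s / dt)):
        # Mode prediction step: the context selects the transition table.
        probs = {
            nxt: sum(probs[cur] * table[cur][nxt] for cur in probs)
            for nxt in MODE_VELOCITY
        }
        # Expected displacement under the current mode distribution.
        x += dt * sum(probs[m] * MODE_VELOCITY[m] for m in probs)
    return x, probs


if __name__ == "__main__":
    start = {"walk": 1.0, "stand": 0.0}
    print(predict_position(0.0, start, context=False))
    print(predict_position(0.0, start, context=True))
```

With the stopping cue active, probability mass shifts to `stand` faster, so the predicted position after 1 s falls short of the context-free prediction. A plain SLDS would correspond to using one fixed table regardless of context.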
EuroCity Persons
A novel benchmark for person detection in traffic scenes
Big data has had a great share in the success of deep learning in computer vision. Recent works suggest that there is significant further potential to increase object detection performance by utilizing even bigger datasets. In this paper, we introduce the EuroCity Persons dataset, which provides a large number of highly diverse, accurate and detailed annotations of pedestrians, cyclists and other riders in urban traffic scenes. The images for this dataset were collected on-board a moving vehicle in 31 cities of 12 European countries. With over 238,200 person instances manually labeled in over 47,300 images, EuroCity Persons is nearly one order of magnitude larger than datasets used previously for person detection in traffic scenes. The dataset furthermore contains a large number of person orientation annotations (over 211,200). We optimize four state-of-the-art deep learning approaches (Faster R-CNN, R-FCN, SSD and YOLOv3) to serve as baselines for the new object detection benchmark. In experiments with previous datasets we analyze the generalization capabilities of these detectors when trained with the new dataset. We furthermore study the effect of the training set size, the dataset diversity (day- versus night-time, geographical region), the dataset detail (i.e., availability of object orientation information) and the annotation quality on the detector performance. Finally, we analyze error sources and discuss the road ahead.
We present a probabilistic framework for the joint estimation of pedestrian head and body orientation from a mobile stereo vision platform. For both head and body parts, we convert the responses of a set of orientation-specific detectors into a (continuous) probability density function. The parts are localized by means of a pictorial structure approach, which balances part-based detector responses with spatial constraints. Head and body orientations are estimated jointly to account for anatomical constraints. The joint single-frame orientation estimates are integrated over time by particle filtering. The experiments involved data from a vehicle-mounted stereo vision camera in a realistic traffic setting; 65 pedestrian tracks were supplied by a state-of-the-art pedestrian tracker. We show that the proposed joint probabilistic orientation estimation framework reduces the mean absolute head and body orientation error by up to 15° compared with simpler methods. This results in a mean absolute head/body orientation error of about 21°/19°, which remains fairly constant up to a distance of 25 m. Our system currently runs in near real time (8-9 Hz).
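One way to picture the step from discrete, orientation-specific detector responses to a continuous density is a weighted mixture of circular kernels. The sketch below assumes von Mises kernels centered on each detector's nominal orientation, weights proportional to the (non-negative) responses, and a hypothetical concentration `kappa`; the abstract does not state which construction is actually used.

```python
import math
from typing import List


def bessel_i0(kappa: float, terms: int = 40) -> float:
    """Modified Bessel function I0 via its power series (adequate for moderate kappa)."""
    total = 0.0
    term = 1.0  # m = 0 term
    for m in range(terms):
        if m > 0:
            term *= (kappa / 2.0) ** 2 / (m * m)
        total += term
    return total


def orientation_density(
    theta: float,
    centers: List[float],
    responses: List[float],
    kappa: float = 4.0,  # hypothetical kernel concentration
) -> float:
    """Evaluate a von Mises mixture density at angle theta (radians).

    Assumes responses are non-negative and not all zero.
    """
    total = sum(responses)
    norm = 2.0 * math.pi * bessel_i0(kappa)
    return sum(
        (r / total) * math.exp(kappa * math.cos(theta - c)) / norm
        for r, c in zip(responses, centers)
    )


if __name__ == "__main__":
    # Eight hypothetical detectors spaced 45 degrees apart.
    centers = [i * math.pi / 4.0 for i in range(8)]
    responses = [0.1, 0.7, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0]
    for deg in (0, 45, 90, 225):
        print(deg, orientation_density(math.radians(deg), centers, responses))
```

A density of this kind can be evaluated at any angle, which is what a particle filter needs when it weights orientation hypotheses that fall between the detectors' nominal orientations.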
We present an approach for the joint probabilistic estimation of pedestrian head and body orientation in the context of intelligent vehicles. For both head and body, we convert the output of a set of orientation-specific detectors into a full (continuous) probability density function. The parts are localized with a pictorial structure approach, which balances part-based detector output with spatial constraints. Head and body orientation estimates are furthermore coupled probabilistically to account for anatomical constraints. Finally, the coupled single-frame orientation estimates are integrated over time by particle filtering. The experiments involve 37 pedestrian tracks obtained from an external stereo vision-based pedestrian detector in realistic traffic settings. We show that the proposed joint probabilistic orientation estimation approach reduces the mean head and body orientation error by 10 degrees and more.
We present a novel Dynamic Bayesian Network for pedestrian path prediction in the intelligent vehicle domain. The model incorporates the pedestrian's situational awareness, the situation criticality and the spatial layout of the environment as latent states on top of a Switching Linear Dynamical System (SLDS) to anticipate changes in the pedestrian dynamics. Using computer vision, situational awareness is assessed by the pedestrian head orientation, situation criticality by the distance between vehicle and pedestrian at the expected point of closest approach, and spatial layout by the distance of the pedestrian to the curbside. Our particular scenario is that of a crossing pedestrian, who might stop or continue walking at the curb. In experiments using stereo vision data obtained from a vehicle, we demonstrate that the proposed approach results in more accurate path prediction than SLDS alone at the relevant short time horizon (1 s), and slightly outperforms a computationally more demanding state-of-the-art method.
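The criticality cue, the distance between vehicle and pedestrian at the expected point of closest approach, can be sketched under a simple assumption: both move with constant velocity in a 2-D ground plane. The function name and the constant-velocity model are illustrative choices, not details taken from the abstract.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]


def closest_approach(
    p_ped: Vec2, v_ped: Vec2, p_veh: Vec2, v_veh: Vec2
) -> Tuple[float, float]:
    """Return (time, distance) of closest approach for two constant-velocity points.

    Positions are in metres, velocities in m/s. The time is clamped to >= 0,
    so targets that are already moving apart yield their current distance.
    """
    # Relative position and velocity of the pedestrian with respect to the vehicle.
    rx, ry = p_ped[0] - p_veh[0], p_ped[1] - p_veh[1]
    vx, vy = v_ped[0] - v_veh[0], v_ped[1] - v_veh[1]
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        t = 0.0  # no relative motion: the distance never changes
    else:
        # Minimizer of |r + v t|^2, restricted to the future.
        t = max(0.0, -(rx * vx + ry * vy) / speed_sq)
    return t, math.hypot(rx + vx * t, ry + vy * t)


if __name__ == "__main__":
    # Vehicle driving along x at 10 m/s; pedestrian 20 m ahead, 3 m to the side,
    # walking toward the vehicle's lane at 1 m/s.
    print(closest_approach((20.0, 3.0), (0.0, -1.0), (0.0, 0.0), (10.0, 0.0)))
```

A small distance at a near-future time indicates a critical situation; a large one suggests the pedestrian can keep walking without conflict.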