Extrinsic Camera Calibration using Human-pose Estimations and Automatic Re-identification

Master Thesis (2022)
Author(s)

W.J. Tempelaar (TU Delft - Mechanical Engineering)

Contributor(s)

Julian Kooij – Mentor (TU Delft - Intelligent Vehicles)

Marco Hennipman – Mentor (Siemens Mobility)

Faculty
Mechanical Engineering
Copyright
© 2022 Willem Jan Tempelaar
Publication Year
2022
Language
English
Graduation Date
29-08-2022
Awarding Institution
Delft University of Technology
Programme
Mechanical Engineering
Sponsors
Siemens Mobility
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Multi-pedestrian tracking in camera networks has gained enormous interest in the industry because of its applicability in travel-flow analysis, autonomous driving, and surveillance. Essential to tracking in camera networks is camera calibration and, in particular, extrinsic camera calibration. Extrinsic camera calibration concerns the 3D position and orientation of the cameras in the camera network. Current methods for extrinsic calibration require trained operators to place calibration objects in view of all cameras in the network, which is impractical and hinders tracking in camera networks.
In this thesis, an automatic calibration model is proposed that estimates the extrinsic camera parameters from the image data of all cameras in a network. The proposed method uses deep learning algorithms for feature extraction and matching. The feature-extraction step is a human-pose estimator that extracts human key points such as joints, eyes, and feet. The matching step is a person re-identification (re-ID) algorithm that uses an affinity-based feature extractor.
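The matching step can be sketched as follows. This is a minimal illustration, assuming re-ID embeddings are compared by cosine similarity and matched greedily with an illustrative threshold; the thesis's exact matching procedure may differ.

```python
import numpy as np

def match_identities(emb_a, emb_b, threshold=0.7):
    """Greedily match re-ID embeddings between two camera views.

    emb_a: (N, D) embeddings of detections in camera A
    emb_b: (M, D) embeddings of detections in camera B
    Returns (i, j) index pairs whose cosine similarity exceeds the
    threshold (an illustrative value, not taken from the thesis).
    """
    # Normalise rows so a plain dot product equals cosine similarity.
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sim = a @ b.T

    pairs, used_b = [], set()
    # Visit rows in order of their best similarity, most confident first.
    for i in np.argsort(-sim.max(axis=1)):
        j = int(np.argmax(sim[i]))
        if sim[i, j] >= threshold and j not in used_b:
            pairs.append((int(i), j))
            used_b.add(j)
    return pairs
```

Matched identity pairs like these provide the cross-view correspondences from which relative camera poses can then be estimated.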
Two camera-network datasets have been used to evaluate the proposed model: a four-camera, fully overlapping dataset (SALSA) and a more challenging seven-camera, partially overlapping dataset (WildTrack). The calibration accuracy on both datasets is obtained by comparing the calibrated extrinsic camera parameters with the ground-truth values. On WildTrack, the model could calibrate three of the seven cameras, whereas it achieved a near-perfect calibration on SALSA: a root-mean-squared error of 0.02 meters in translation and 0.0 radians in orientation, compared to the ground-truth values.
The final product of this research is an automated extrinsic camera calibration model that eases camera calibration in camera networks. The proposed model could not calibrate all cameras in both datasets, but it provides a baseline upon which future research can build.
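The reported error metric can be computed along the following lines, assuming each camera's extrinsics are given as a rotation matrix R and a translation vector t; this is a sketch of one common convention, not necessarily the thesis's exact definition.

```python
import numpy as np

def extrinsic_errors(R_est, t_est, R_gt, t_gt):
    """Translation RMSE (same units as t) and orientation error in
    radians between estimated and ground-truth camera extrinsics."""
    t_est, t_gt = np.asarray(t_est, float), np.asarray(t_gt, float)
    t_rmse = np.sqrt(np.mean((t_est - t_gt) ** 2))

    # The relative rotation R_est^T @ R_gt maps the estimated frame onto
    # the ground-truth frame; its rotation angle is the orientation error.
    R_rel = np.asarray(R_est).T @ np.asarray(R_gt)
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    angle = float(np.arccos(cos_angle))
    return t_rmse, angle
```

With a perfectly estimated orientation and a uniform 0.02 m offset per axis, this metric would report 0.02 m translation RMSE and 0.0 rad orientation error.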
