Extrinsic Camera Calibration using Human-pose Estimations and Automatic Re-identification

Master thesis (2022)

Authors

W.J. Tempelaar Mechanical Engineering

Contributors

J.F.P. Kooij Intelligent Vehicles - Mechanical, Maritime and Materials Engineering (supervisor 1)

Marco Hennipman (supervisor 1)

Faculty

Mechanical Engineering

Camera calibration, Human-pose estimation, Re-identification

To reference this document use:

http://resolver.tudelft.nl/uuid:726a6d3e-d656-4d9e-9c2f-ab22cbc7ca2a

Published Date

29-08-2022

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Multi-pedestrian tracking in camera networks has gained enormous interest in industry because of its applicability to travel-flow analysis, autonomous driving, and surveillance. Essential to tracking in camera networks is camera calibration, in particular extrinsic camera calibration, which determines the 3D position and orientation of each camera in the network. Current methods for extrinsic calibration require dedicated operators to place calibration objects in view of all cameras in the network, which is impractical and limits the ease of tracking in camera networks.
In this thesis, an automatic extrinsic calibration model is proposed to calibrate the extrinsic camera parameters from the image data of all cameras in a network. The proposed method utilizes deep learning algorithms for feature extraction and matching. The feature extraction step is a human-pose estimator, extracting the key points of humans such as joints, eyes, and feet. The matching algorithm is a re-ID algorithm using an affinity-based feature extractor.
Two camera network datasets were used to evaluate the proposed model: SALSA, a four-camera dataset with fully overlapping views, and WildTrack, a more challenging seven-camera dataset with partially overlapping views. Calibration accuracy on both datasets is calculated by comparing the calibrated extrinsic camera parameters with the ground-truth values. On WildTrack, the model could calibrate three of the seven cameras, whereas on SALSA it achieved a near-perfect calibration, with a root mean squared error of 0.02 meters in translation and 0.0 radians in orientation relative to the ground truth.
The final product of this research is an automated extrinsic camera calibration model that simplifies calibration in camera networks. The proposed model could not fully calibrate every dataset, but it provides a baseline upon which future research can build.
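
The evaluation described in the abstract compares calibrated extrinsic parameters with ground-truth values and reports a root mean squared error in translation (meters) and orientation (radians). The sketch below shows one common way such errors could be computed. It is not taken from the thesis: the function names are hypothetical, and the choice of the geodesic angle between rotation matrices as the orientation error is an assumption, since the abstract does not state how the orientation error is defined.

```python
import numpy as np


def translation_rmse(t_est, t_gt):
    """RMSE over cameras of the Euclidean distance between estimated
    and ground-truth camera positions (arrays of shape N x 3, meters)."""
    t_est = np.asarray(t_est, dtype=float)
    t_gt = np.asarray(t_gt, dtype=float)
    dists = np.linalg.norm(t_est - t_gt, axis=1)
    return float(np.sqrt(np.mean(dists ** 2)))


def rotation_angle(R_est, R_gt):
    """Geodesic angle in radians between two 3x3 rotation matrices.
    Assumed orientation-error definition, not confirmed by the source."""
    R_rel = np.asarray(R_est, dtype=float).T @ np.asarray(R_gt, dtype=float)
    cos_theta = (np.trace(R_rel) - 1.0) / 2.0
    # Clip to guard against round-off pushing the cosine outside [-1, 1].
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))


def rotation_rmse(R_est_list, R_gt_list):
    """RMSE over cameras of the per-camera geodesic rotation angle."""
    angles = np.array(
        [rotation_angle(a, b) for a, b in zip(R_est_list, R_gt_list)]
    )
    return float(np.sqrt(np.mean(angles ** 2)))


def rot_z(theta):
    """Rotation by theta radians about the z-axis (helper for the demo)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


if __name__ == "__main__":
    # Made-up illustration values, not results from the thesis.
    t_gt = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
    t_est = [[0.0, 0.0, 0.02], [1.0, 0.02, 0.0]]
    print("translation RMSE [m]:", translation_rmse(t_est, t_gt))
    print("rotation RMSE [rad]:",
          rotation_rmse([rot_z(0.1), rot_z(0.0)], [rot_z(0.3), rot_z(0.0)]))
```

Both metrics require the estimated and ground-truth extrinsics to be expressed in the same world frame and at the same scale. A calibration recovered from image data alone would typically need to be aligned to the ground-truth frame first, a step this sketch leaves out.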

Files

Thesis_Final_Willem_Jan_Tempel... (.pdf)

(.pdf | 11.7 Mb)