3D Human Pose Estimation

Using a Top-View Depth Camera

Master Thesis (2020)
Author(s)

P.P. Mody (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

K. Hildebrandt – Mentor (TU Delft - Computer Graphics and Visualisation)

Fei Zuo – Mentor (Philips Research)

Esther van der Heide – Mentor (Philips Research)

Hayley Hung – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

E. Eisemann – Graduation committee member (TU Delft - Computer Graphics and Visualisation)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2020 Prerak Mody
Publication Year
2020
Language
English
Graduation Date
19-05-2020
Awarding Institution
Delft University of Technology
Programme
Computer Science
Sponsors
Philips Research
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The onset of delirium, a disturbance in a patient's mental activity, can potentially be detected by understanding the activities within an Intensive Care Unit (ICU) room. Such activities can be extracted by estimating human pose from a visual capture of the scene. This work uses a top-view depth camera in an ICU room to estimate the pose of the non-patient stakeholders. The top view leads to self-occlusion of body joints and thus poses a challenge for estimating the complete human pose. In addition, the presence of multiple persons in the room poses a second challenge, as detected body joints need to be parsed into individual poses. To address these challenges, a 3D point cloud is extracted from the top-view depth image and passed through a 3D Convolutional Neural Network (CNN). This baseline method estimates both body joints and body parts, from which it outputs the human pose of multiple persons. Because human pose estimation has a highly structured output, the baseline can benefit from additional spatial context to improve the quality of its output poses. The proposed techniques either increase the receptive field, perform feature extraction at multiple scales, or change the order of data processing. An increase in F1-score for the proposed methods highlights the importance of additional spatial context for improving the performance of pose estimation models.
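
As an illustration of the first step described in the abstract (extracting a 3D point cloud from a top-view depth image before feeding it to a 3D CNN), the following is a minimal sketch and not the thesis implementation. It assumes a standard pinhole camera model; the function names, the intrinsics (fx, fy, cx, cy), the voxel size and the grid shape are all hypothetical choices made for this example.

```python
import numpy as np


def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image into an (N, 3) point cloud.

    Assumes a pinhole camera model; fx, fy, cx, cy are hypothetical
    intrinsics, and pixels with depth <= 0 are treated as invalid.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def voxelize(points, origin, voxel_size, grid_shape):
    """Turn a point cloud into a binary occupancy grid.

    A dense grid like this is one possible input format for a 3D CNN
    (an assumption of this sketch). Points outside the grid are dropped.
    """
    origin = np.asarray(origin, dtype=np.float64)
    idx = np.floor((points - origin) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    grid = np.zeros(grid_shape, dtype=np.float32)
    i = idx[inside]
    grid[i[:, 0], i[:, 1], i[:, 2]] = 1.0
    return grid


# Toy usage: a 4x4 depth image at 2 m with one invalid pixel.
depth = np.full((4, 4), 2.0)
depth[0, 0] = 0.0
points = depth_to_point_cloud(depth, fx=2.0, fy=2.0, cx=1.5, cy=1.5)
grid = voxelize(points, origin=(-2.0, -2.0, 0.0), voxel_size=1.0,
                grid_shape=(4, 4, 4))
```

In this sketch the occupancy grid discards per-voxel point counts; whether to keep them, and which grid resolution to use, are design choices that the abstract does not specify.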

Files

PrerakMody_MSc_Thesis_3D_Human... (pdf)
(pdf | 17 MB)
- Embargo expired on 19-05-2022
License info not available