3D Human Pose Estimation

Using a Top-View Depth Camera

Abstract

The onset of delirium, a disturbance in the mental activity of a patient, can potentially be detected by understanding the activities within an Intensive Care Unit (ICU) room. Such activities can be extracted by estimating human pose from a visual capture of the scene. This work uses a top-view depth camera in an ICU room to estimate the pose of the non-patient stakeholders. The top view causes self-occlusion of body joints, which makes it challenging to estimate a complete human pose. In addition, the presence of multiple persons in the room poses a second challenge, as the detected body joints must be parsed into individual poses. To address these challenges, a 3D point cloud is extracted from the top-view depth image and passed through a 3D Convolutional Neural Network (CNN). This baseline method estimates both body joints and body parts, from which it outputs the human pose of multiple persons. Because human pose estimation has a highly structured output, the baseline method can benefit from additional spatial context to improve the quality of the output poses. The proposed techniques provide this context by increasing the receptive field, by performing feature extraction at multiple scales, or by changing the order of data processing. The increase in F1-score achieved by the proposed methods highlights the importance of additional spatial context for improving the performance of pose estimation models.
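As an illustration of the first step described above, the following minimal sketch back-projects a depth image into a 3D point cloud and converts it into an occupancy grid that a 3D CNN could consume. It is not the implementation used in this work: the pinhole camera model, the intrinsics `fx`, `fy`, `cx`, `cy`, the function names, and the choice of binary voxelization are all assumptions made for the example, since the abstract does not state how the point cloud is represented.

```python
import numpy as np


def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (in metres) into an (N, 3) point cloud.

    Assumes a pinhole camera model (hypothetical intrinsics fx, fy, cx, cy).
    Pixels with zero depth are treated as invalid and dropped.
    """
    h, w = depth.shape
    # u indexes columns, v indexes rows
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]


def voxelize(points, origin, voxel_size, grid_shape):
    """Convert a point cloud into a binary occupancy grid.

    Assumed representation: a voxel is 1.0 if at least one point falls
    inside it, else 0.0. Points outside the grid are ignored.
    """
    idx = np.floor((points - np.asarray(origin)) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.asarray(grid_shape)), axis=1)
    grid = np.zeros(grid_shape, dtype=np.float32)
    grid[tuple(idx[inside].T)] = 1.0
    return grid


# Toy example: a 4x4 depth image with a single valid pixel.
depth = np.zeros((4, 4))
depth[2, 3] = 2.0  # row 2, column 3, two metres from the camera
cloud = depth_to_point_cloud(depth, fx=2.0, fy=2.0, cx=2.0, cy=2.0)
grid = voxelize(cloud, origin=[-2.0, -2.0, 0.0], voxel_size=1.0,
                grid_shape=(4, 4, 4))
```

The resulting `grid` would typically be given a batch and a channel dimension before being passed to a 3D convolutional network. In practice the grid extent and voxel size would be chosen to cover the room volume seen by the camera.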