One Pose Fits All

A novel kinematic approach to 3D human pose estimation


Abstract

3D human pose estimation is a widely researched computer vision task with applications in areas such as virtual reality and human-robot interaction. Owing to the lack of depth information, 3D estimation from monocular images is an inherently ambiguous problem. Moreover, the majority of existing work focuses solely on joint detection and overlooks unrealistic human poses. Our work consists of two parts: an end-to-end 2D-to-3D lifting pipeline and a novel approach that integrates a kinematic human model. We start with Pose Estimation using TRansformer (PETR), an approach that does not require temporal information and uses the attention mechanism to model inter-joint relationships from RGB images. In the model-based approach, we emphasize pose similarity rather than joint detection alone. We propose a new metric, Mean Per Bone Vector Error (MPBVE), which evaluates poses independently of a human body’s gender, weight, or age. We introduce Pose Estimation on Bone Rotation using Transformer (PEBRT), a novel approach that regresses rotation matrices for 16 human bones, taking labeled 2D poses as input. Our human model encapsulates joint-angle and bone-length constraints. Existing methods treat such constraints as an additional loss term, which does not guarantee realistic final outputs. Our method requires neither temporal information nor receptive fields to generate kinematically realistic human poses. We demonstrate that PEBRT delivers results on Human3.6M comparable to existing methods.
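To make the MPBVE idea concrete, here is a minimal sketch (not the reference implementation), assuming the metric is the mean Euclidean distance between unit-length bone vectors computed from a fixed parent-child skeleton; the 17-joint topology in `BONES` is an assumption for illustration:

```python
import numpy as np

# Hypothetical (parent, child) joint index pairs defining 16 bones of a
# 17-joint, Human3.6M-style skeleton; the exact topology is an assumption.
BONES = [
    (0, 1), (1, 2), (2, 3),           # right leg
    (0, 4), (4, 5), (5, 6),           # left leg
    (0, 7), (7, 8), (8, 9), (9, 10),  # spine and head
    (8, 11), (11, 12), (12, 13),      # left arm
    (8, 14), (14, 15), (15, 16),      # right arm
]

def bone_vectors(joints: np.ndarray) -> np.ndarray:
    """Turn (17, 3) joint positions into (16, 3) unit bone vectors.

    Normalizing each bone to unit length discards bone length, so the
    comparison is insensitive to body size (gender, weight, or age).
    """
    vecs = np.stack([joints[c] - joints[p] for p, c in BONES])
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.clip(norms, 1e-8, None)

def mpbve(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Euclidean distance between predicted and ground-truth bone vectors."""
    diff = bone_vectors(pred) - bone_vectors(gt)
    return float(np.mean(np.linalg.norm(diff, axis=1)))
```

Because every bone is reduced to a direction, two skeletons with different limb lengths but identical joint angles score an MPBVE of zero, which matches the stated goal of evaluating pose similarity rather than joint locations.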