Improving temporal interpolation of head and body pose using Gaussian process regression in a matrix completion setting

Conference Paper (2018)

Author(s)

S. Tan (TU Delft - Pattern Recognition and Bioinformatics)

D.M.J. Tax (TU Delft - Pattern Recognition and Bioinformatics)

H.S. Hung (TU Delft - Pattern Recognition and Bioinformatics)

Research Group

Pattern Recognition and Bioinformatics

DOI related publication

https://doi.org/10.1145/3279981.3279982

Matrix completion; Head and body pose estimation

To reference this document use:

https://resolver.tudelft.nl/uuid:556dfc45-251f-4cae-881e-be61d9a5af88

Publication Year

2018

Language

English

ISBN (electronic)

978-1-4503-6077-7

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper presents a model for head and body pose estimation (HBPE) when labelled samples are highly sparse. The current state-of-the-art multimodal approach to HBPE uses matrix completion in a transductive setting to predict pose labels for unobserved samples. Building on this approach, the proposed method tackles HBPE when manually annotated ground truth labels are temporally sparse. We posit that the current state-of-the-art approach oversimplifies the temporal sparsity assumption by using Laplacian smoothing. Our final solution uses: i) Gaussian process regression in place of Laplacian smoothing, ii) head and body coupling, and iii) nuclear norm minimization in the matrix completion setting. The model is applied to the challenging SALSA dataset to benchmark it against the state-of-the-art method. Our formulation significantly outperforms the state of the art in this particular setting; e.g., with 5% of ground truth labels as training data, head pose accuracy and body pose accuracy are approximately 62% and 70%, respectively. As well as fitting a more flexible model to missing labels in time, we posit that our approach also loosens the head and body coupling constraint, allowing for a more expressive model of the head and body pose typically seen during conversational interaction in groups. This provides a new baseline to improve upon for future integration of multimodal sensor data for the purpose of HBPE.
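
As an illustration of the first ingredient named in the abstract, the sketch below shows how Gaussian process regression can interpolate pose angles between temporally sparse labels. It is a minimal, generic example and not the authors' formulation: the squared-exponential kernel, the (cos, sin) encoding of angles, the function names (rbf_kernel, gp_interpolate_angles), the hyperparameter values, and the synthetic data are all assumptions made for illustration. The head and body coupling and the nuclear norm matrix completion steps are not shown.

```python
import numpy as np

def rbf_kernel(t_a, t_b, length_scale=20.0, variance=1.0):
    """Squared-exponential kernel between two 1-D arrays of frame indices."""
    diff = t_a[:, None] - t_b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_interpolate_angles(t_obs, theta_obs, t_query, length_scale=20.0, noise=1e-4):
    """Return the GP posterior mean of pose angles (radians) at t_query.

    Angles are regressed as (cos, sin) pairs so that the wrap-around at
    +/- pi does not appear as a jump. This encoding is an assumption of
    the sketch, not something stated in the abstract.
    """
    t_obs = np.asarray(t_obs, dtype=float)
    t_query = np.asarray(t_query, dtype=float)
    targets = np.column_stack([np.cos(theta_obs), np.sin(theta_obs)])

    # Gram matrix of the labelled frames plus observation noise.
    K = rbf_kernel(t_obs, t_obs, length_scale) + noise * np.eye(len(t_obs))
    K_star = rbf_kernel(t_query, t_obs, length_scale)

    # Posterior mean K_* (K + noise I)^{-1} y, without forming the inverse.
    mean = K_star @ np.linalg.solve(K, targets)
    return np.arctan2(mean[:, 1], mean[:, 0])

# Synthetic example: a slowly varying pose angle labelled at 5% of 200 frames.
frames = np.arange(200)
true_theta = 0.8 * np.sin(2 * np.pi * frames / 200)
labelled = frames[::20]  # every 20th frame carries a label (10 of 200)

pred_theta = gp_interpolate_angles(labelled, true_theta[labelled], frames)
print("max abs error (rad):", np.max(np.abs(pred_theta - true_theta)))
```

The length scale controls how far each label's influence extends in time; in practice it would be chosen by maximizing the marginal likelihood or by validation rather than fixed by hand as it is here.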

Files

3279981.3279982.pdf

(pdf | 8.53 MB)

- Embargo expired on 08-04-2022

License info not available