Hand gesture classification in crowded environments
Classification of gesture phases in a crowded social setting recorded from a top-view angle
A. Grigore (TU Delft - Electrical Engineering, Mathematics and Computer Science)
H.S. Hung – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
I. Kondyurin – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Z. Li – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
M.A. Neerincx – Graduation committee member (TU Delft - Interactive Intelligence)
Abstract
Hand gestures play a crucial role in communication, especially in social interactions. This research investigates the viability of using coding schemes to describe hand gestures and how accurately those gestures can be classified in crowded environments with fine-tuned video transformers such as VideoMAE. The training data are drawn from the ConfLab dataset and consist of top-view video recordings of social interactions in a crowded social setting, manually annotated for gesture phases (preparation, hold, stroke, recovery) and gesture units. After fine-tuning, the two classifiers reach high accuracies: 95% overall for gesture-phase classification and 93% for deciding whether a clip contains a gesture unit. These findings suggest that the proposed approach is effective in crowded environments and can be adapted for real-time applications.
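To make the classification setup concrete, the sketch below shows one way to instantiate a VideoMAE-based gesture-phase classifier with the HuggingFace transformers library. It is a minimal illustration, not the authors' exact configuration: the MCG-NJU/videomae-base checkpoint, the 16-frame 224x224 clip shape, and the use of a randomly initialized classification head are all assumptions; only the four phase labels come from the abstract.

```python
# Minimal sketch: a 4-way gesture-phase classifier built on VideoMAE.
# Assumptions (not from the thesis): the MCG-NJU/videomae-base checkpoint
# and a dummy random clip standing in for a preprocessed ConfLab segment.
import torch
from transformers import VideoMAEForVideoClassification

# Gesture phases as annotated in the dataset (from the abstract).
PHASES = ["preparation", "hold", "stroke", "recovery"]
label2id = {name: i for i, name in enumerate(PHASES)}
id2label = {i: name for i, name in enumerate(PHASES)}

# Load the pretrained video transformer backbone and attach a fresh
# classification head sized for the four gesture phases.
model = VideoMAEForVideoClassification.from_pretrained(
    "MCG-NJU/videomae-base",
    num_labels=len(PHASES),
    label2id=label2id,
    id2label=id2label,
)
model.eval()

# VideoMAE consumes clips shaped (batch, frames, channels, height, width);
# the base checkpoint expects 16 frames at 224x224 resolution.
clip = torch.randn(1, 16, 3, 224, 224)
with torch.no_grad():
    logits = model(pixel_values=clip).logits  # shape: (1, 4)

print("predicted phase:", id2label[int(logits.argmax(-1))])
```

The binary gesture-unit classifier described in the abstract would follow the same pattern with num_labels=2 (gesture unit vs. not), before fine-tuning both heads on the annotated top-view clips.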