This research addresses the challenge of deploying real-time drinking gesture detection in messy, "in-the-wild" environments. We propose and evaluate two computationally inexpensive systems: one using a Random Forest classifier, the other a 1-Dimensional Convolutional Neural Network (1D-CNN) classifier. Both are trained on 2D skeleton data, or on features derived solely from that data. Evaluated on the ConfLab social interaction dataset, our method is designed to handle sparse labels and significant occlusion. This study reports the performance of this lightweight, video-based approach, providing a benchmark for its applicability in real-world health and human-computer interaction applications where privacy and computational efficiency are important factors. Although we were unable to produce a robust and reliable classifier (F1 scores of 0.07 and 0.03, respectively), this work shows that future efforts have the potential to succeed (ROC-AUC scores of 0.63 and 0.55, respectively) and provides critical insights into pitfalls to avoid when designing similar systems.