The Data Barrier to Lightweight Drinking Detection

An Analysis of the Viability of Skeleton-Only Models on In-the-Wild Social Data

Bachelor Thesis (2025)

Author(s)

J.D. Tijssens (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

L. Li – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

S. Tan – Graduation committee member (TU Delft - Interactive Intelligence)

H.S. Hung – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

J. Urbano Merino – Graduation committee member (TU Delft - Multimedia Computing)

Faculty

Electrical Engineering, Mathematics and Computer Science

Gesture Recognition, Action Recognition, Pose Estimation, Drinking Detection, Skeleton Data, In-the-Wild, Lightweight Models, Conflab Dataset, Intake Gestures

To reference this document use:

https://resolver.tudelft.nl/uuid:4a54ebd8-003b-4e2a-bc13-2b81e31ef4d7

More Info

Publication Year

2025

Language

English

Graduation Date

01-07-2025

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project; Multimodal Machine Learning Techniques for Analyzing Laughter and Drinking in Spontaneous Social Encounters; co

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This research addresses the challenge of deploying real-time drinking gesture detection in messy, "in-the-wild" environments. We propose and evaluate two computationally inexpensive systems: one using a Random Forest classifier and the other using a one-dimensional convolutional neural network (1D-CNN) classifier. Both are trained on 2D skeleton data or on features derived solely from that data. Tested on the Conflab social interaction dataset, our approach is designed to handle sparse labels and significant data occlusion. This study reports on the performance of this lightweight, video-based approach, providing a benchmark for its applicability in real-world health and human-computer interaction applications where privacy and computational efficiency are important factors. Although we were unable to create a robust and reliable classifier (F1 scores of 0.07 and 0.03, respectively), the ROC-AUC scores of 0.63 and 0.55, respectively, suggest that future work has the potential to succeed, and this work provides critical insights into pitfalls to avoid when designing similar systems.
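The abstract reports a very low F1 score alongside a ROC-AUC above chance. The following is a minimal sketch of how those two metrics can diverge when positive labels are sparse. It uses synthetic scores only, not the thesis data or models; the function names, the 2% positive rate, the score distributions, and the 0.5 threshold are all illustrative assumptions.

```python
import random


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def simulate(n_windows=5000, positive_rate=0.02, shift=0.4, seed=0):
    """Hypothetical setup: rare positives whose scores are only slightly
    higher on average than the scores of negatives."""
    rng = random.Random(seed)
    n_pos = int(n_windows * positive_rate)
    y_true = [1] * n_pos + [0] * (n_windows - n_pos)
    scores = [rng.gauss(shift if t == 1 else 0.0, 1.0) for t in y_true]
    y_pred = [1 if s > 0.5 else 0 for s in scores]  # assumed fixed threshold
    return f1_score(y_true, y_pred), roc_auc(y_true, scores)


if __name__ == "__main__":
    f1, auc = simulate()
    print(f"F1 = {f1:.3f}, ROC-AUC = {auc:.3f}")
```

In a setup like this, negatives vastly outnumber positives, so even a modest false-positive rate swamps the true positives and drags precision (and therefore F1) down, while ROC-AUC only measures how well positives are ranked above negatives and is unaffected by the class ratio.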

Files

The_Data_Barrier_To_Lightweigh... (pdf)

(pdf | 0.959 Mb)

License info not available