Laughter in Motion: Pose-Based Detection Across Annotation Modalities in Natural Social Interactions

Investigating the impact of annotation modality on detecting laughter in the wild

Bachelor Thesis (2025)
Author(s)

V. Guenov (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

H.S. Hung – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

L. Li – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

S. Tan – Mentor (TU Delft - Interactive Intelligence)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Coordinates
52.002200, 4.373600
Graduation Date
05-11-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project; Multimodal Machine Learning Techniques for Analyzing Laughter and Drinking in Spontaneous Social Encounters
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Laughter is a complex multimodal behavior and one of the most essential aspects of social interaction. Although previous research has used both auditory and facial cues for laughter detection, these approaches commonly struggle in noisy, occluded, and privacy-sensitive settings. This paper explores body posture alone, captured through 2D keypoint estimation, as a robust signal for automatic laughter detection in naturalistic settings. Using the ConfLab dataset, we build a machine learning pipeline that segments pose data, extracts motion-based features, and trains Random Forest classifiers across annotation modalities (audio-only, video-only, and audiovisual) and segmentation methods (fixed and variable length). We show that, while variable-length segmentation yields the best raw performance, it is prone to overfitting. Fixed-duration segmentation with three-second windows and audiovisual annotations, by contrast, offers a pragmatic compromise, reaching F1-scores (65%) comparable to earlier efforts in ideal environments. Feature importance analysis identifies upper-body movement, especially head and arm motion, as a salient cue to laughter. Annotation modality is also found to significantly affect both classification performance and the relative importance of pose features. These findings demonstrate the viability of pose-based laughter detection and reveal how annotation choices shape model behavior, offering insights for affective computing in the wild.
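The abstract describes the pipeline only at a high level; the sketch below illustrates what the fixed-window variant could look like, assuming per-frame 2D keypoints and binary laughter labels are already available as arrays. The frame rate, joint count, feature set, and helper names (motion_features, segment_fixed) are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

FPS = 30                  # assumed frame rate; not specified in the abstract
WINDOW = 3 * FPS          # fixed three-second windows, as described in the abstract

def motion_features(window):
    """Per-joint motion statistics: mean and std of frame-to-frame displacement."""
    # window: (frames, joints, 2) array of 2D keypoints
    disp = np.linalg.norm(np.diff(window, axis=0), axis=-1)   # (frames-1, joints)
    return np.concatenate([disp.mean(axis=0), disp.std(axis=0)])

def segment_fixed(pose, labels):
    """Slice a pose sequence into non-overlapping fixed windows with majority-vote labels."""
    X, y = [], []
    for start in range(0, len(pose) - WINDOW + 1, WINDOW):
        X.append(motion_features(pose[start:start + WINDOW]))
        y.append(int(labels[start:start + WINDOW].mean() > 0.5))
    return np.array(X), np.array(y)

# Placeholder data standing in for ConfLab poses and per-frame laughter annotations.
pose = np.random.rand(9000, 17, 2)                 # five minutes at 30 fps, 17 joints
labels = (np.random.rand(9000) > 0.5).astype(int)  # hypothetical binary labels

X, y = segment_fixed(pose, labels)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
# Feature importances support the kind of analysis the abstract reports,
# i.e. which joints' motion statistics drive the classification.
print(clf.feature_importances_.argsort()[::-1][:5])
```

In the thesis's framing, variable-length segmentation and the three annotation modalities would enter as alternative versions of the segmentation and labelling steps shown above.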

Files

Guenov_-_Thesis.pdf
(PDF | 1.5 MB)
License info not available