Impact of Annotation Modality on Label Quality and Model Performance in the Automatic Assessment of Laughter In-the-Wild

Vargas Quiros, J.D.; Cabrera Quiros, L.C.; Oertel, Catharine; Hung, H.S.

doi:10.1109/TAFFC.2023.3269003

Title

Impact of Annotation Modality on Label Quality and Model Performance in the Automatic Assessment of Laughter In-the-Wild

Author

Vargas Quiros, J.D. (TU Delft Intelligent Systems; TU Delft Pattern Recognition and Bioinformatics)
Cabrera Quiros, L.C. (TU Delft Pattern Recognition and Bioinformatics; Instituto Tecnologico de Costa Rica)
Oertel, Catharine (TU Delft Intelligent Systems; TU Delft Interactive Intelligence)
Hung, H.S. (TU Delft Intelligent Systems; TU Delft Pattern Recognition and Bioinformatics)

Department

Intelligent Systems

Date

2023

Abstract

Although laughter is known to be a multimodal signal, it is primarily annotated from audio. It is unclear how laughter labels may differ when annotated from modalities like video, which capture body movements and are relevant in in-the-wild studies. In this work, we ask whether annotations of laughter are congruent across modalities, and compare the effect that labeling modality has on machine learning model performance. We compare annotations and models for laughter detection, intensity estimation, and segmentation, using a challenging in-the-wild conversational dataset with a variety of camera angles, noise conditions and voices. Our study with 48 annotators revealed evidence for incongruity in the perception of laughter and its intensity between modalities, mainly due to lower recall in the video condition. Our machine learning experiments compared the performance of modern unimodal and multimodal models for different combinations of input modalities, training, and testing label modalities. In addition to the same input modalities rated by annotators (audio and video), we trained models with body acceleration inputs, robust to cross-contamination, occlusion and perspective differences. Our results show that performance of models with body movement inputs does not suffer when trained with video-acquired labels, despite their lower inter-rater agreement.

Subject

Action recognition
annotation
Annotations
Cameras
continuous annotation
Face recognition
Labeling
laughter
laughter detection
laughter intensity
Machine learning
mingling datasets
Physiology
Task analysis

To reference this document use:

http://resolver.tudelft.nl/uuid:80de5c16-8bd2-4a54-a823-e1032f4496e1

DOI

https://doi.org/10.1109/TAFFC.2023.3269003

Embargo date

2024-06-17

ISSN

1949-3045

Source

IEEE Transactions on Affective Computing, 15 (2), 519-534

Bibliographical note

Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Part of collection

Institutional Repository

Document type

journal article
