Laughter detection in privacy-sensitive audio
M. Fregonara (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Hayley Hung – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Jose David Vargas-Quiros – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Jasmijn A. Baaijens – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
The development of new technologies and approaches in the field of social signal processing has raised privacy concerns around recording conversations. One of the main ways to preserve the privacy of speakers in recorded conversations is to decimate the audio, that is, to reduce its sample frequency and its frequency bandwidth. In theory, this makes the verbal content of the conversation (the words themselves) unintelligible while preserving useful non-verbal social cues such as laughter, pitch modulation and frequency of speech, among others. However, this has not been experimentally verified. This research paper addresses that knowledge gap by exploring how laughter detection machine learning models perform on decimated audio. An existing pre-trained, state-of-the-art laughter detection model was employed, and its performance was evaluated on a dataset of decimated audio with sample frequencies ranging from 300 Hz to 44100 Hz.
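The abstract does not specify how the decimation was implemented, so the following is only a minimal sketch of the general technique: low-pass filter the signal to limit its bandwidth, then keep every N-th sample to lower the sample frequency. The function names (`lowpass_taps`, `decimate`), the windowed-sinc filter design, the Hamming window, and the `taps_per_factor` and cutoff choices are all assumptions made for illustration, not the method used in the thesis.

```python
import math


def lowpass_taps(cutoff, num_taps):
    """Windowed-sinc FIR low-pass filter (hypothetical helper).

    cutoff is in cycles per sample (0 < cutoff < 0.5); num_taps must be odd.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response; the x == 0 case is the sinc limit.
        if x == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        # Hamming window tapers the truncated sinc to reduce stop-band ripple.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unit gain at DC


def decimate(samples, factor, taps_per_factor=8):
    """Reduce the sample frequency by an integer factor (illustrative sketch).

    The signal is low-pass filtered below the new Nyquist frequency and then
    only every factor-th filtered sample is kept.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    if factor == 1:
        return list(samples)
    num_taps = taps_per_factor * factor + 1  # always odd
    new_nyquist = 0.5 / factor               # in cycles per original sample
    taps = lowpass_taps(0.8 * new_nyquist, num_taps)  # assumed safety margin
    half = num_taps // 2
    out = []
    # Evaluate the filter only at the samples that are kept.
    for center in range(0, len(samples), factor):
        acc = 0.0
        for k, tap in enumerate(taps):
            i = center + k - half
            if 0 <= i < len(samples):  # treat samples outside the signal as 0
                acc += tap * samples[i]
        out.append(acc)
    return out
```

Under these assumptions, going from 44100 Hz to 300 Hz corresponds to `decimate(samples, 147)`, since 44100 / 300 = 147. The resulting signal can only represent frequencies below 150 Hz. A pure-Python loop like this is slow on long recordings; in practice a signal processing library would normally be used instead.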