Survey of Interrater Agreement in Automatic Affect Prediction for Speech Emotion Recognition

A systematic review

Bachelor Thesis (2024)
Author(s)

O.A.H. Wezenaar (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

B.J.W. Dudzik – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Catharine Oertel – Graduation committee member (TU Delft - Interactive Intelligence)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2024
Language
English
Graduation Date
27-06-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Emotional datasets for automatic affect prediction usually employ raters to annotate emotions or to verify existing annotations. To ensure the reliability of these raters, some datasets use interrater agreement measures, which quantify the degree to which annotators agree with each other on what they rate. This systematic review explores which interrater agreement measures are used in emotional speech corpora. The affective states, the affect representation schemes, and the collection methods of the datasets, as well as the popularity of these measures, were investigated. Scopus, IEEE Xplore, Web of Science, and the ACM Digital Library were searched for papers that describe the creation of datasets; 45 papers were included in the review. The review concludes that the interrater agreement measures used are highly dependent on the speech collection method and the affect representation scheme, and that there is no standardized way to measure interrater agreement. Datasets that use actors to record emulated emotions mostly use recognition rate as their interrater agreement measure. Datasets that use a dimensional representation scheme often compute the mean agreement of the raters and the standard deviation of that measure to check interrater agreement. Datasets that neither use actors nor a dimensional scheme use a plethora of different measures, such as probabilistic agreement computation or majority agreement, but a large number use no measure at all.
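As an illustration of two of the measure families mentioned above, the following Python sketch shows how recognition rate for acted datasets and mean/standard-deviation agreement for dimensional ratings might be computed. The data and function names are hypothetical examples, not taken from any of the reviewed corpora, and real datasets may define these measures slightly differently.

import numpy as np

def recognition_rate(intended, perceived):
    """Fraction of rater judgments that match the actor's intended emotion.

    intended:  length-N sequence of intended emotion labels (one per clip)
    perceived: N x R array of labels assigned by R raters
    """
    intended = np.asarray(intended)
    perceived = np.asarray(perceived)
    return np.mean(perceived == intended[:, None])

def dimensional_agreement(ratings):
    """Per-clip mean and standard deviation of dimensional ratings.

    ratings: N x R array, e.g. valence scores from R raters for N clips.
    Returns (mean, std) arrays of length N; a low std indicates agreement.
    """
    ratings = np.asarray(ratings, dtype=float)
    return ratings.mean(axis=1), ratings.std(axis=1)

# Hypothetical example: 3 clips rated by 4 raters.
intended = ["anger", "joy", "sadness"]
perceived = [["anger", "anger", "fear", "anger"],
             ["joy", "joy", "joy", "joy"],
             ["sadness", "neutral", "sadness", "sadness"]]
print(recognition_rate(intended, perceived))  # 10 of 12 judgments match: ~0.83

valence = [[4, 5, 4, 4], [2, 2, 3, 2], [1, 1, 2, 1]]
means, stds = dimensional_agreement(valence)
print(means, stds)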

Files

Finalpaper_OscarWezenaar.pdf
(pdf | 0.364 MB)
License info not available