Survey of Interrater Agreement in Automatic Affect Prediction for Speech Emotion Recognition

A systematic review


Abstract

Emotional datasets for automatic affect prediction usually employ raters to annotate emotions or to verify existing annotations. To ensure the reliability of these raters, some datasets use interrater agreement measures, which quantify the degree to which annotators agree with each other on what they rate. This systematic review explores which interrater agreement measures are used in emotional speech corpora. The affective states, the affect representation schemes, and the collection methods of the datasets, as well as the popularity of these measures, were investigated. Scopus, IEEE Xplore, Web of Science, and the ACM Digital Library were searched for papers that describe the creation of datasets; 45 papers were included in the review. The review concludes that the interrater agreement measures used are highly dependent on the speech collection method and the affect representation scheme. It was found that there is no standardized way to measure interrater agreement. Datasets that use actors to record emulated emotions mostly use recognition rate as their interrater agreement measure. Datasets that use a dimensional representation scheme often compute the mean agreement of the raters and the standard deviation of that measure to check interrater agreement. Datasets that neither use actors nor a dimensional representation use a plethora of different measures, such as probabilistic computations of agreement or majority agreement, but a large number use no measures at all.
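
To make the two most common measures mentioned above concrete, the sketch below shows how they could be computed; it is a minimal illustration, not code from any of the reviewed corpora, and the function and variable names (e.g. `recognition_rate`, `rater_labels`) are assumptions for the example.

```python
# Minimal sketch of two agreement measures named in the abstract:
# recognition rate for acted/categorical labels, and per-item rater
# mean and standard deviation for dimensional ratings.
import numpy as np

def recognition_rate(intended, rater_labels):
    """Fraction of rater judgments that match the emotion the actor intended."""
    intended = np.asarray(intended)          # shape (n_items,)
    rater_labels = np.asarray(rater_labels)  # shape (n_items, n_raters)
    return float((rater_labels == intended[:, None]).mean())

def dimensional_agreement(ratings):
    """Per-item mean rating and inter-rater standard deviation (dimensional scheme)."""
    ratings = np.asarray(ratings, dtype=float)  # shape (n_items, n_raters)
    return ratings.mean(axis=1), ratings.std(axis=1, ddof=1)

# Hypothetical toy data: 3 utterances, 4 raters
intended = ["anger", "joy", "sad"]
labels = [["anger", "anger", "joy", "anger"],
          ["joy", "joy", "joy", "joy"],
          ["sad", "neutral", "sad", "sad"]]
print(recognition_rate(intended, labels))   # 10 of 12 judgments match -> ~0.83

valence = [[0.2, 0.3, 0.1, 0.25],
           [0.9, 0.8, 0.85, 0.95],
           [0.1, 0.2, 0.15, 0.1]]
means, stds = dimensional_agreement(valence)
print(means, stds)
```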