Estimating intentions to speak in social settings

Speaker intention estimation using accelerometer data and non-verbal vocal behaviour


Abstract

Computers that can estimate intentions to speak can improve human-computer interaction. While plenty of research has been done on next-speaker prediction, intentions to speak differ from that task because they depend only on the person themselves. Previous research inferred intentions to speak from accelerometer data, with some useful results. This paper expands on that research by adding non-verbal vocal behaviour as a second modality, making the model multimodal. The model is trained on successful intentions to speak and tested on both successful and unsuccessful intentions to speak. Part of the dataset was annotated for unsuccessful intentions to speak, and the signals in these annotations were analyzed. In conclusion, non-verbal vocal behaviour is a much more reliable indicator of successful intentions to speak than accelerometer data. Combining both modalities improves the score slightly, but not significantly. Training on unsuccessful intentions to speak is likely needed to estimate them reliably. Additional modalities could be investigated to improve the model further.