Estimating Intentions to Speak Using Body Postures in Social Interactions

Leveraging Different Machine Learning Techniques for Accurate Estimation of Intentions to Speak In-the-Wild

Abstract

Everyone has the intention to speak at some point. Allowing agents to estimate people's intentions to speak can increase conversational efficiency and engagement. The intention to speak can be expressed as social cues through multiple modalities. To add value to existing accelerometer-based research, this research aims to build a model based on body postures and to explore how it performs on both successful and unsuccessful intention cases. The time segments of successful intentions are generated automatically, and the segments of unsuccessful intentions are annotated within a short time period. The model uses poses extracted from the successful intention segments and is evaluated on both successful and unsuccessful cases. The results show that body posture is an effective modality for predicting the intention to speak, although problems remain, such as limited visibility depending on camera angle and a lack of context when combining data from multiple angles. More modalities are to be added to enhance the model's generalisability and reliability.