M. Tsfasman
5 records found
On the path towards training computer systems to predict such memorable moments, this dissertation first introduces a new dataset called the MeMo corpus (Chapter 2). It includes group video conversations along with direct reports from participants about which moments they remembered. The data was collected in a way that reflects real-life conversations, using repeated video calls and memory reports that are linked to specific moments in time.
The study in Chapter 3 then asks whether affective signals, such as emotional tone or energy in a conversation, could help predict what people will remember. Such signals are often used in artificial intelligence systems. However, the results show that emotional signals alone are not enough to explain what people remember from a conversation.
Next, in Chapter 4, the dissertation looks at other behavioural signals, such as where people were looking and who was speaking. These signals were found to be significantly linked with memory: for example, people tend to remember parts of a conversation where there was shared attention or dynamic speaking patterns. Using these signals, simple computer models were able to predict which parts of the conversation were more likely to be remembered. The study also looked at why people remembered certain moments and found that many of the reasons were related to personal relevance or social connection.
This work shows that it is possible to build systems that recognise which parts of a conversation are more memorable. This can be useful for improving automatic meeting tools, personal assistants, and other technologies that support communication and augment memory.
Dynamics of Collective Group Affect
Group-level Annotations and the Multimodal Modeling of Convergence and Divergence
Collaborating in a purposive group, whether face-to-face or virtually, involves continuously expressing emotions and interpreting those of other group members. As such, understanding group affect is essential to comprehending how groups interact and succeed in collaborative efforts. In this study, we move beyond individual-level affect and investigate group-level affect, a collective phenomenon that reflects the shared mood or emotions among group members at a particular moment. We are the first in the literature to gather annotations for group-level affective expressions in purposive group interactions using a fine-grained temporal approach (15 s windows) that also captures the inherent dynamics of this collective construct. To this end, we extensively train annotators and develop an annotation procedure specifically tuned to capture the entire scope of the group interaction from one interaction moment to the next. In addition, we model the ebb and flow of group affect by accounting for the underlying convergence (driven by emotional contagion) and divergence (resulting from emotional reactivity) of affective expressions among group members. To capture these interpersonal dynamics, we employ two approaches: (i) extracting synchrony-based handcrafted features from both audio and visual modalities, and (ii) introducing a novel, data-driven graph neural network to model interpersonal dynamics among group members. Our results highlight the advantages of the graph network over the handcrafted features in modeling group affect, while also emphasizing the importance of temporal modeling and of incorporating multimodal cues. Additionally, our analysis of affective convergence and divergence reveals that groups tend to diverge in their social signals during neutral collective affect, while exhibiting convergence during more emotionally intense moments. These insights are drawn from comparative results across both modeling techniques.
The world seems different in a social context
A neural network analysis of human experimental data
How human-like do conversational robots need to look to enable long-term human-robot conversation? One essential aspect of long-term interaction is a human's ability to adapt to the varying degrees of a conversational partner's engagement and emotions. Prosodically, this can be achieved through (dis)entrainment. Speech synthesis was a limiting factor for many years, but restrictions in this regard are increasingly mitigated. These advancements now emphasise the importance of studying the effect of robot embodiment on human entrainment. In this study, we conducted a between-subjects online human-robot interaction experiment in an educational use-case scenario in which a tutor was embodied through either a human or a robot face. 43 English-speaking participants took part in the study, and for each of them we analysed the degree of acoustic-prosodic entrainment to the human or robot face, respectively. We found that the degree of subjective and objective perception of anthropomorphism correlates positively with acoustic-prosodic entrainment.