C.R.M.M. Oertel Genannt Bierbach
28 records found
From human teams to hybrid intelligence teams
Identifying, characterizing, and evaluating foundational quality attributes
Hybrid Intelligence (HI) is an emerging paradigm in which artificial intelligence (AI) augments human intelligence. The current literature lacks systematic models that guide the design and evaluation of HI systems. Further, discussions around HI primarily focus on technology, neglecting the holistic human-AI ensemble. In this paper, we take the initial steps toward the development of a quality model for characterizing and evaluating HI systems from a human-AI teams perspective. We first conducted a study investigating the adequacy of properties commonly associated with effective human teams to describe HI. The study features the insights of 50 HI researchers, and shows that various human team properties, including boundedness, interdependence, competency, purposefulness, initiative, normativity, and effectiveness, are important for HI systems. Based on these results, we developed a quality model for HI teams composed of seven high-level quality attributes, further refined into 16 specific ones. To evaluate the relevance and understanding of the proposed attributes, we conducted a second empirical investigation by staging competitions in which participants used the quality model to develop and analyze HI usage scenarios. Our analysis of 48 collected scenarios, which we openly release, confirms the proposed attributes’ relevance and highlights insights that emerge when designers consider the quality model in HI system design.
We design a task to help identify how an agent can engage with information in a meaningful way through dialogue to foster collaboration. Specifically, the task involves a human and an agent sharing their memories of past events with each other, resulting in diverse information about those events. In a pilot study, we explore to what extent an LLM can be used to classify memories from the different sources as overlapping, complementary or conflicting. Knowing which of these categories a piece of information falls into will aid the agent in how to address it in dialogue, for instance to ask for further information, to adopt a shared perspective, or to agree to disagree about a conflict. We find that the LLM especially struggles with distinguishing between complementary and conflicting information, and that differing opinions about what is and is not implied by the event descriptions lead to many disagreements between the LLM and our human annotators. In future work, we will investigate to what extent conversing with the human can alleviate these issues.
Knowing Me, Knowing AU
How Should We Design Agent-Mediated Mimicry?
A lack of self-awareness of communicative behaviours can lead to disadvantages in important interactions. Video recordings as a tool for self-observation have been widely adopted to initiate behaviour change and reflection. Seeing oneself in a recording can lead to negative affect. Forcing an external perspective can lead to cognitive dissonance. Avatars and virtual agents have the advantage that they can copy a human's behaviour while potentially avoiding this dissonance. To explore the design space of mimicking agents, we set up a user study where a video baseline is compared to agent-mediated conditions ranging from idle non-verbal behaviour to complete mimicry of the voice and face. We show that participants gain increased self-awareness from seeing themselves mediated through the virtual agent. We further discuss qualitative observations for the future design of systems that aid in self-reflection, and particularly note that partial mimicry seems to be less appreciated than full mimicry.
Dynamics of Collective Group Affect
Group-level Annotations and the Multimodal Modeling of Convergence and Divergence
Collaborating in a purposive group, whether face-to-face or virtually, involves continuously expressing emotions and interpreting those of other group members. As such, understanding group affect is essential to comprehending how groups interact and succeed in collaborative efforts. In this study, we move beyond individual-level affect and investigate group-level affect - a collective phenomenon that reflects the shared mood or emotions among group members at a particular moment. As the first in the literature, we gather annotations for group-level affective expressions in purposive group interactions using a fine-grained temporal approach (15 s windows) that also captures the inherent dynamics of this collective construct. To this end, we extensively train annotators and develop an annotation procedure specifically tuned to capture the entire scope of the group interaction from one interaction moment to the next. In addition, we model the ebb and flow of group affect by accounting for the underlying convergence (driven by emotional contagion) and divergence (resulting from emotional reactivity) of affective expressions among group members. To capture these interpersonal dynamics, we employ two approaches: (i) extracting synchrony-based handcrafted features from both audio and visual modalities, and (ii) introducing a novel, data-driven graph neural network to model interpersonal dynamics among group members. Our results highlight the advantages of the graph network over the handcrafted features in modeling group affect, while also emphasizing the importance of temporal modeling and incorporating multimodal cues. Additionally, our analysis of affective convergence and divergence reveals that groups tend to diverge in their social signals during neutral collective affect, while exhibiting convergence during more emotionally intense moments. These insights are drawn from comparative results across both modeling techniques.
Transparent Conversational Agents
The Impact of Capability Communication on User Behavior and Mental Model Alignment
When a user interacts with a conversational agent for the first time, they may not be aware of the agent’s capabilities, leading to suboptimal use or interaction breakdowns. To avoid a mismatch with the actual capabilities, the agent’s capabilities have to be made transparent to the user. To investigate whether communication of an agent’s capabilities during interactions enhances transparency and improves the user’s mental model, we conducted a user study with 56 participants. Each participant had three speech-based interactions with an agent that communicated its capabilities or an agent that did not. Our results suggest that the communication led to a change in user behavior with significantly longer utterances. However, the users’ mental models of the agent’s capabilities were not significantly different between the conditions. Participants were able to significantly improve their knowledge of the agent’s capabilities by aligning their mental model over time in both conditions.
Memory with Meaning
Enabling Value-Centric Long-Term Human-Agent Dialogue
Although laughter is known to be a multimodal signal, it is primarily annotated from audio. It is unclear how laughter labels may differ when annotated from modalities like video, which capture body movements and are relevant in in-the-wild studies. In this work we ask whether annotations of laughter are congruent across modalities, and compare the effect that labeling modality has on machine learning model performance. We compare annotations and models for laughter detection, intensity estimation, and segmentation, using a challenging in-the-wild conversational dataset with a variety of camera angles, noise conditions and voices. Our study with 48 annotators revealed evidence for incongruity in the perception of laughter and its intensity between modalities, mainly due to lower recall in the video condition. Our machine learning experiments compared the performance of modern unimodal and multi-modal models for different combinations of input modalities, training, and testing label modalities. In addition to the same input modalities rated by annotators (audio and video), we trained models with body acceleration inputs, robust to cross-contamination, occlusion and perspective differences. Our results show that performance of models with body movement inputs does not suffer when trained with video-acquired labels, despite their lower inter-rater agreement.
Towards creating a conversational memory for long-term meeting support
Predicting memorable moments in multi-party conversations through eye-gaze
When working in a group, it is essential to understand each other's viewpoints to increase group cohesion and meeting productivity. This can be challenging in teams: participants might be left misunderstood and the discussion could be going around in circles. To tackle this problem, previous research on group interactions has addressed topics such as dominance detection, group engagement, and group creativity. Conversational memory, however, remains a widely unexplored area in the field of multimodal analysis of group interaction. The ability to track what each participant or a group as a whole finds memorable from each meeting would allow a system or agent to continuously optimise its strategy to help a team meet its goals. In the present paper, we therefore investigate what participants take away from each meeting and how it is reflected in group dynamics. As a first step toward such a system, we recorded a multimodal longitudinal meeting corpus (MEMO), which comprises a first-party annotation of what participants remember from a discussion and why they remember it. We investigated whether participants of group interactions encode what they remember non-verbally and whether we can use such non-verbal multimodal features to predict what groups are likely to remember automatically. We devise a coding scheme to cluster participants' memorisation reasons into higher-level constructs. We find that low-level multimodal cues, such as gaze and speaker activity, can predict conversational memorability. We also find that non-verbal signals can indicate when a memorable moment starts and ends. We could predict four levels of conversational memorability with an average accuracy of 44%. We also showed that reasons related to participants' personal feelings and experiences are the most frequently mentioned grounds for remembering meeting segments.
Listening to one another is essential to human-human interaction. In fact, we humans spend a substantial part of our day listening to other people, in private as well as in work settings. Attentive listening serves to gather information for oneself, but at the same time, it also signals to the speaker that they are being heard. To deduce whether our interlocutor is listening to us, we rely on reading their nonverbal cues, much as we use nonverbal cues to signal our own attention. Such signaling becomes more complex when we move from dyadic to multi-party interactions. Understanding how humans use nonverbal cues in a multi-party listening context not only increases our understanding of human-human communication but also aids the development of successful human-robot interactions. This paper aims to bring together previous analyses of listener behavior in human-human multi-party interaction and provide novel insights into gaze patterns between the listeners in particular. We are investigating whether the gaze patterns and feedback behavior, as observed in the human-human dialogue, are also beneficial for the perception of a robot in multi-party human-robot interaction. To answer this question, we are implementing an attentive listening system that generates multi-modal listening behavior based on our human-human analysis. We are comparing our system to a baseline system that does not differentiate between different listener types in its behavior generation. We are evaluating it in terms of the participant’s perception of the robot, their behavior, and the perception of third-party observers.
How Florist Apprentices Explore Bouquet Designs
Supporting Design Space Exploration for Vocational Students
Context: Exploring the design space is an important process in a design task. In this study, we considered design space exploration for learners in vocational education and training (VET). The goal of the study was to investigate how they explore the design space while focusing on the effect of a graph-like interface on the learner's understanding of the design space. With florists as the target profession, we investigated how the apprentices explore design variations, what they would gain from such an activity, and how we can support this process. Approach: We developed a web application called BloomGraph that allows learners to explore design variations. It provides a graph-based interface that enables the systematic variation of design. Using the BloomGraph application, we conducted an experimental study with 44 florist apprentices in Switzerland to investigate the effect of the graph-based interface, which provides a structured way of exploring the design space. The experimental group was given the graph-based interface to explore design variations while the control group had a linear-based interface. We compared them in terms of the number of bouquets explored, time of exploration, diversity of bouquets explored, and the learning gain in terms of the understanding of the design space measured using pre- and post-tests. We also analyzed the strategies adopted by the participants for the graph navigation and the visual exploration behavior using the eye gaze data. Findings: Our analysis shows that the graph-based interface fosters a better understanding of the size of the design space and more efficient navigation towards a goal design in terms of the number of intermediate designs, but with longer exploration of each intermediate design compared to the linear-based interface. Regarding the behavioral patterns in graph exploration, the participants who showed more strategic behavior in the design choices acquired a better understanding of the design space.
Additionally, we trained a model that predicts the next choice of a learner using eye tracking data. It provides a reasonable accuracy that opens new possibilities for future studies. Conclusion: The findings of this study support the feasibility of design space exploration as a digital activity for VET learners and show how the learners can benefit from it. The contribution of the paper includes the validation of the idea with florist apprentices and the demonstration of how the process can be supported using a structured interface and the learner behavior analysis. This paper shows how a design exploration activity can provide an added value in the learning of an apprentice in a design-related VET system.
How human-like do conversational robots need to look to enable long-term human-robot conversation? One essential aspect of long-term interaction is a human's ability to adapt to the varying degrees of a conversational partner's engagement and emotions. Prosodically, this can be achieved through (dis)entrainment. While speech synthesis has been a limiting factor for many years, restrictions in this regard are increasingly mitigated. These advancements now emphasise the importance of studying the effect of robot embodiment on human entrainment. In this study, we conducted a between-subjects online human-robot interaction experiment in an educational use-case scenario where a tutor was either embodied through a human or a robot face. 43 English-speaking participants took part in the study, and for each of them we analysed the degree of acoustic-prosodic entrainment to the human or robot face, respectively. We found that the degree of subjective and objective perception of anthropomorphism positively correlates with acoustic-prosodic entrainment.