C.R.M.M. Oertel Genannt Bierbach

28 records found

Journal article (2026) - Paul Raingeard de la Bletiere, Mark Neerincx, Rebecca Schaefer, Catharine Oertel
Music is widely used in human–computer interaction (HCI) to enhance engagement, sustain attention, and support cognitive stimulation. Yet its potential for deliberate mood regulation, particularly through personalized memory recall, remains largely unexplored. Music-evoked autobiographical memories (MEAMs) are often elicited by well-known, favorite songs, yielding stronger mood effects than music without personal memory associations. However, songs can also trigger distressing memories, and will never capture all positive personal memories. Since happy personal memories can enhance mood, broader methods for retrieval are needed. To address this, we introduce Constructed Music-Evoked Episodic Memories (CoMEEMs), a framework linking chosen episodic memories to music. By creating a personalized song-memory database, CoMEEMs enable autonomous mood regulation and communication in interactive systems, integrating memory cues—such as people and places—alongside mood congruence, to help choose songs with high mood regulatory impact. In an experiment with 71 Dutch and French adults, participants described 87 positive memories and received song recommendations based on associated people and places, with and without mood matching. Results showed that song familiarity and genre were the strongest predictors of perceived fit, while valence, arousal, tempo, and lyrics played smaller roles. Mood congruence, especially in valence, significantly influenced song relevance. Participants emphasized the need for user input on emotional states and memory context. Based on these findings, we propose design guidelines to improve future music recommendation systems targeting memories. ...

Identifying, characterizing, and evaluating foundational quality attributes

Journal article (2026) - Davide Dell’Anna, Pradeep K. Murukannaiah, Mireia Yurrita, Bernd Dudzik, Davide Grossi, Catholijn M. Jonker, Catharine Oertel, Pınar Yolum
Hybrid Intelligence (HI) is an emerging paradigm in which artificial intelligence (AI) augments human intelligence. The current literature lacks systematic models that guide the design and evaluation of HI systems. Further, discussions around HI primarily focus on technology, neglecting the holistic human-AI ensemble. In this paper, we take the initial steps toward the development of a quality model for characterizing and evaluating HI systems from a human-AI teams perspective. We first conducted a study investigating the adequacy of properties commonly associated with effective human teams to describe HI. The study features the insights of 50 HI researchers, and shows that various human team properties, including boundedness, interdependence, competency, purposefulness, initiative, normativity, and effectiveness, are important for HI systems. Based on these results, we developed a quality model for HI teams composed of seven high-level quality attributes, further refined into 16 specific ones. To evaluate the relevance and understanding of the proposed attributes, we conducted a second empirical investigation by staging competitions in which participants used the quality model to develop and analyze HI usage scenarios. Our analysis of 48 collected scenarios, which we openly release, confirms the proposed attributes’ relevance and highlights insights that emerge when designers consider the quality model in HI system design. ...
Conference paper (2025) - Annika Kniele, Lucia Donatelli, Catharine Oertel, Piek Vossen
We design a task to help identify how an agent can engage with information in a meaningful way through dialogue to foster collaboration. Specifically, the task involves a human and an agent sharing their memories of past events with each other, resulting in diverse information about those events. In a pilot study, we explore to what extent an LLM can be used to classify memories from the different sources as overlapping, complementary or conflicting. Knowing which of these categories a piece of information falls into will aid the agent in how to address it in dialogue, for instance to ask for further information, to adopt a shared perspective, or to agree to disagree about a conflict. We find that the LLM especially struggles with distinguishing between complementary and conflicting information, and that differing opinions about what is and is not implied by the event descriptions lead to many disagreements between the LLM and our human annotators. In future work, we will investigate to what extent conversing with the human can alleviate these issues. ...

How Should We Design Agent-Mediated Mimicry?

A lack of self-awareness of communicative behaviours can lead to disadvantages in important interactions. Video recordings as a tool for self-observation have been widely adopted to initiate behaviour change and reflection. Seeing oneself in a recording can lead to negative affect. Forcing an external perspective can lead to cognitive dissonance. Avatars and virtual agents have the advantage that they can copy a human's behaviour while potentially avoiding this dissonance. To explore the design space of mimicking agents, we set up a user study where a video baseline is compared to agent-mediated conditions ranging from idle non-verbal behaviour to complete mimicry of the voice and face. We show that participants gain increased self-awareness from seeing themselves mediated through the virtual agent. We further discuss qualitative observations for the future design of systems that aid in self-reflection, and particularly note that partial mimicry seems to be less appreciated than full mimicry. ...

Group-level Annotations and the Multimodal Modeling of Convergence and Divergence

Journal article (2025) - Navin Raj Prabhu, Maria Tsfasman, Catharine Oertel, Timo Gerkmann, Nale Lehmann-Willenbrock
Collaborating in a purposive group, whether face-to-face or virtually, involves continuously expressing emotions and interpreting those of other group members. As such, understanding group affect is essential to comprehending how groups interact and succeed in collaborative efforts. In this study, we move beyond individual-level affect and investigate group-level affect - a collective phenomenon that reflects the shared mood or emotions among group members at a particular moment. As the first in the literature, we gather annotations for group-level affective expressions in purposive group interactions using a fine-grained temporal approach (15 s windows) that also captures the inherent dynamics of this collective construct. To this end, we extensively train annotators and develop an annotation procedure specifically tuned to capture the entire scope of the group interaction from one interaction moment to the next. In addition, we model the ebb and flow of group affect by accounting for the underlying convergence (driven by emotional contagion) and divergence (resulting from emotional reactivity) of affective expressions among group members. To capture these interpersonal dynamics, we employ two approaches: (i) extracting synchrony-based handcrafted features from both audio and visual modalities, and (ii) introducing a novel, data-driven graph neural network to model interpersonal dynamics among group members. Our results highlight the advantages of the graph network over the handcrafted features in modeling group affect, while also emphasizing the importance of temporal modeling and incorporating multimodal cues. Additionally, our analysis of affective convergence and divergence reveals that groups tend to diverge in their social signals during neutral collective affect, while exhibiting convergence during more emotionally intense moments. These insights are drawn from comparative results across both modeling techniques. ...
Generative AI offers potential for educational support, but often lacks pedagogical grounding and awareness of the student’s learning context. Furthermore, researching student interactions with these tools within authentic learning environments remains challenging. To address this, we present JELAI, an open-source platform architecture designed to integrate fine-grained Learning Analytics (LA) with Large Language Model (LLM)-based tutoring directly within a Jupyter Notebook environment. JELAI employs a modular, containerized design featuring JupyterLab extensions for telemetry and chat, alongside a central middleware handling LA processing and context-aware LLM prompt enrichment. This architecture enables the capture of integrated code interaction and chat data, facilitating real-time, context-sensitive AI scaffolding and research into student behaviour. We describe the system’s design, implementation, and demonstrate its feasibility through system performance benchmarks and two proof-of-concept use cases illustrating its capabilities for logging multi-modal data, analysing help-seeking patterns, and supporting A/B testing of AI configurations. JELAI’s primary contribution is its technical framework, providing a flexible tool for researchers and educators to develop, deploy, and study LA-informed AI tutoring within the widely used Jupyter ecosystem. ...

The Impact of Capability Communication on User Behavior and Mental Model Alignment

Conference paper (2025) - Merle M. Reimann, Florian A. Kunneman, Catharine Oertel, Koen V. Hindriks
When a user interacts with a conversational agent for the first time, they may not be aware of the agent’s capabilities, leading to suboptimal use or interaction breakdowns. To avoid a mismatch with the actual capabilities, the agent’s capabilities have to be made transparent to the user. To investigate whether communication of an agent’s capabilities during interactions enhances transparency and improves the user’s mental model, we conducted a user study with 56 participants. Each participant had three speech-based interactions with an agent that communicated its capabilities or an agent that did not. Our results suggest that the communication led to a change in user behavior with significantly longer utterances. However, the users’ mental models of the agent’s capabilities were not significantly different between the conditions. Participants were able to significantly improve their knowledge of the agent’s capabilities by aligning their mental model over time in both conditions. ...

Enabling Value-Centric Long-Term Human-Agent Dialogue

When a human makes a decision, an observer may want to understand the reasons and motivations behind the decision. This understanding is important when IVAs are involved in contextual decision-making or coaching practices. To address this challenge, we propose that an agent’s understanding of its user should include knowledge of the user’s underlying values. Humans prioritise different values – sometimes contradictory – in a manner that depends on the context. We present a method where the agent and user build the required context-sensitive value model together. We use Schwartz’s value theory, which places individuals’ values into ten categories. In a between-subject experiment, with three sessions on different days, we elicit user values by presenting them with moral dilemmas in different contexts on the first day, refine the model by asking users to argue about contradictions on the second day, and let them reflect on the model that they have built together with the system on the third day. We find that users exposed to a value-aware condition are more likely to agree with the robot’s representations of their values post-reflection than those in a baseline. Participants also prioritise different values depending on the context, agreeing with previous findings. ...
Journal article (2024) - Merle M. Reimann, Florian A. Kunneman, Catharine Oertel, Koen V. Hindriks
As social robots see increasing deployment within the general public, improving the interaction with those robots is essential. Spoken language offers an intuitive interface for the human–robot interaction (HRI), with dialogue management (DM) being a key component in those interactive systems. Yet, to overcome current challenges and manage smooth, informative, and engaging interaction, a more structural approach to combining HRI and DM is needed. In this systematic review, we analyze the current use of DM in HRI and focus on the type of dialogue manager used, its capabilities, evaluation methods, and the challenges specific to DM in HRI. We identify the challenges and current scientific frontier related to the DM approach, interaction domain, robot appearance, physical situatedness, and multimodality. ...
Conference paper (2024) - M. Valle Torre, Catharine Oertel, M.M. Specht
Describing and analysing learner behaviour using sequential data and analysis is becoming more and more popular in Learning Analytics. Nevertheless, we found a variety of definitions of learning sequences, as well as choices regarding data aggregation and the methods implemented for analysis. Furthermore, sequences are used to study different educational settings and serve as a base for various interventions. In this literature review, the authors aim to generate an overview of these aspects to describe the current state of using sequence analysis in educational support and learning analytics. The 74 included articles were selected based on the criteria that they conduct empirical research on an educational environment using sequences of learning actions as the main focus of their analysis. The results enable us to highlight different learning tasks where sequences are analysed, identify data mapping strategies for different types of sequence actions, differentiate techniques based on purpose and scope, and identify educational interventions based on the outcomes of sequence analysis. ...
Foreword postscript (2024) - Hayley Hung, Catharine Oertel, Mohammad Soleymani, Theodora Chaspari, Hamdi Dibeklioglu, Jainendra Shukla, Khiet Truong
It is our great pleasure to welcome you to the 26th International Conference on Multimodal Interaction (ICMI 2024) in San Jose, Costa Rica. This is the first ICMI held in Latin America. ICMI is the premier international forum for human-centered multimodal interaction and social artificial intelligence (AI). Multimodal human-centered AI includes machine learning and computational techniques, such as representation learning, fusion, data and systems. The study of social interactions includes both human-human and human-machine interactions. A unique aspect of ICMI is its multidisciplinary nature which values both scientific discoveries and technical modeling of human interaction towards positive and societally beneficial applications. We are keen to showcase the recent advances in the field to the community and provide an inclusive forum to develop and exchange new ideas. To this end, a new initiative committee has been established to organize events for newcomers and facilitate inclusion. [...] ...
Although laughter is known to be a multimodal signal, it is primarily annotated from audio. It is unclear how laughter labels may differ when annotated from modalities like video, which capture body movements and are relevant in in-the-wild studies. In this work we ask whether annotations of laughter are congruent across modalities, and compare the effect that labeling modality has on machine learning model performance. We compare annotations and models for laughter detection, intensity estimation, and segmentation, using a challenging in-the-wild conversational dataset with a variety of camera angles, noise conditions and voices. Our study with 48 annotators revealed evidence for incongruity in the perception of laughter and its intensity between modalities, mainly due to lower recall in the video condition. Our machine learning experiments compared the performance of modern unimodal and multi-modal models for different combinations of input modalities, training, and testing label modalities. In addition to the same input modalities rated by annotators (audio and video), we trained models with body acceleration inputs, robust to cross-contamination, occlusion and perspective differences. Our results show that performance of models with body movement inputs does not suffer when trained with video-acquired labels, despite their lower inter-rater agreement. ...
This study investigates whether an agent-based Negotiation Training System (NTS) can teach women Strategic Empathy - a recently introduced negotiation strategy based on perspective taking - and whether this can improve their negotiation performance. An NTS that integrated instructions on how to use Strategic Empathy was developed and tested in an interaction-based real-time experiment. Women in the experimental group showed significantly higher levels of perspective-taking than the control group, and their understanding and use of Strategic Empathy increased over time. Strategic Empathy also had a significant positive effect on women's self-efficacy, but no significant positive effect on persistence. The high cognitive load of the experiment and a lack of intrinsic motivation may explain the absence of an effect on persistence. Overall, this work demonstrates the applicability of NTS for teaching Strategic Empathy and its effectiveness for enhancing women's self-efficacy in salary negotiations. ...

Intelligent Conversational Agents

Journal article (2022) - Marcus Specht, Catharine Oertel

Predicting memorable moments in multi-party conversations through eye-gaze

Conference paper (2022) - Maria Tsfasman, Kristian Fenech, Morita Tarvirdians, Andras Lorincz, Catholijn Jonker, Catharine Oertel
When working in a group, it is essential to understand each other's viewpoints to increase group cohesion and meeting productivity. This can be challenging in teams: participants might be left misunderstood and the discussion could be going around in circles. To tackle this problem, previous research on group interactions has addressed topics such as dominance detection, group engagement, and group creativity. Conversational memory, however, remains a widely unexplored area in the field of multimodal analysis of group interaction. The ability to track what each participant or a group as a whole finds memorable from each meeting would allow a system or agent to continuously optimise its strategy to help a team meet its goals. In the present paper, we therefore investigate what participants take away from each meeting and how it is reflected in group dynamics. As a first step toward such a system, we recorded a multimodal longitudinal meeting corpus (MEMO), which comprises a first-party annotation of what participants remember from a discussion and why they remember it. We investigated whether participants of group interactions encode what they remember non-verbally and whether we can use such non-verbal multimodal features to predict what groups are likely to remember automatically. We devise a coding scheme to cluster participants' memorisation reasons into higher-level constructs. We find that low-level multimodal cues, such as gaze and speaker activity, can predict conversational memorability. We also find that non-verbal signals can indicate when a memorable moment starts and ends. We could predict four levels of conversational memorability with an average accuracy of 44%. We also showed that reasons related to participants' personal feelings and experiences are the most frequently mentioned grounds for remembering meeting segments. ...
In ongoing and consecutive conversations with persons, a social robot has to determine which aspects to remember and how to address them in the conversation. In the health domain, important aspects concern the health-related goals, the experienced progress (expressed sentiment) and the ongoing motivation to pursue them. Despite the progress in speech technology and conversational agents, most social robots lack a memory for such experience sharing. This paper presents the design and evaluation of a conversational memory for personalized behavior change support conversations on healthy nutrition via memory-based motivational rephrasing. The main hypothesis is that referring to previous sessions improves motivation and goal attainment, particularly when references vary. In addition, the paper explores how far motivational rephrasing affects user's perception of the conversational agent (the virtual Furhat). An experiment with 79 participants was conducted via Zoom, consisting of three conversation sessions. The results showed a significant increase in participants' change in motivation when multiple references to previous sessions were provided. ...
Journal article (2021) - Catharine Oertel, Patrik Jonell, Dimosthenis Kontogiorgos, Kenneth Funes Mora, Jean Marc Odobez, Joakim Gustafson
Listening to one another is essential to human-human interaction. In fact, we humans spend a substantial part of our day listening to other people, in private as well as in work settings. Attentive listening serves to gather information for oneself, but at the same time, it also signals to the speaker that he/she is being heard. To deduce whether our interlocutor is listening to us, we rely on reading his/her nonverbal cues, much as we use nonverbal cues to signal our own attention. Such signaling becomes more complex when we move from dyadic to multi-party interactions. Understanding how humans use nonverbal cues in a multi-party listening context not only increases our understanding of human-human communication but also aids the development of successful human-robot interactions. This paper aims to bring together previous analyses of listener behavior in human-human multi-party interaction and to provide novel insights into gaze patterns between the listeners in particular. We investigate whether the gaze patterns and feedback behavior observed in human-human dialogue are also beneficial for the perception of a robot in multi-party human-robot interaction. To answer this question, we implement an attentive listening system that generates multi-modal listening behavior based on our human-human analysis. We compare our system to a baseline system that does not differentiate between different listener types in its behavior generation. We evaluate it in terms of the participant’s perception of the robot, their behavior, and the perception of third-party observers. ...
Journal article (2021) - Alessia Eletta Coppi, Catharine Oertel, Alberto Cattaneo
Visual expertise is a fundamental proficiency in many vocations, and many questions have arisen on the topic, with studies looking at differences between experts and novices in observation (e.g., radiologists) or at ways to help novices achieve visual expertise (e.g., through annotations). However, most of these studies focus on white-collar professions and overlook vocational ones. For example, observing is of the utmost importance for fashion designers, who spend most of their professional time on visual tasks related to creating patterns and garments or performing alterations. Therefore, this study tries to convey a professional way of looking at images by exposing apprentices to images annotated (e.g., with circles) by experts and identifying whether their gaze (e.g., fixation durations and gaze coverage) and verbalisations (i.e., image descriptions) are affected. The study was conducted with 38 apprentices who were exposed to sequential sets of images depicting shirts, first non-annotated (pre-test), then annotated for the experimental group and non-annotated for the control group (training 1 and training 2), and finally non-annotated (post-test). In addition, in the pre- and post-test and in training 2, apprentices had to verbally describe each image. Gaze was recorded with the Tobii X2-60 tracker. Results for fixation durations showed that the experimental group looked longer at the annotated part of the shirt in training 1 and at the shirt’s central part at post-test. However, the experimental group did not cover a significantly larger area of the shirt than the control group, and verbalisations showed no difference between the groups at post-test. ...

Supporting Design Space Exploration for Vocational Students

Journal article (2021) - Kevin Gonyop Kim, Catharine Oertel, Pierre Dillenbourg
Context: Exploring the design space is an important process in a design task. In this study, we considered design space exploration for learners in vocational education and training (VET). The goal of the study was to investigate how they explore the design space, focusing on the effect of a graph-like interface on the learner's understanding of the design space. With florists as the target profession, we investigated how the apprentices explore design variations, what they would gain from such activity, and how we can support this process. Approach: We developed a web application called BloomGraph that allows learners to explore design variations. It provides a graph-based interface that enables the systematic variation of design. Using the BloomGraph application, we conducted an experimental study with 44 florist apprentices in Switzerland to investigate the effect of the graph-based interface, which provides a structured way of exploring the design space. The experimental group was given the graph-based interface to explore design variations while the control group had a linear-based interface. We compared them in terms of the number of bouquets explored, time of exploration, diversity of bouquets explored, and the learning gain in terms of the understanding of the design space, measured using pre- and post-tests. We also analyzed the strategies adopted by the participants for the graph navigation and the visual exploration behavior using the eye gaze data. Findings: Our analysis shows that the graph-based interface fosters a better understanding of the size of the design space and more efficient navigation towards a goal design in terms of the number of intermediate designs, but with longer exploration of each intermediate design compared to the linear-based interface. Regarding the behavioral patterns in graph exploration, the participants who showed more strategic behavior in the design choices acquired a better understanding of the design space.
Additionally, we trained a model that predicts the next choice of a learner using eye tracking data. It provides a reasonable accuracy that opens new possibilities for future studies. Conclusion: The findings of this study support the feasibility of design space exploration as a digital activity for VET learners and show how the learners can benefit from it. The contribution of the paper includes the validation of the idea with florist apprentices and the demonstration of how the process can be supported using a structured interface and the learner behavior analysis. This paper shows how a design exploration activity can provide an added value in the learning of an apprentice in a design-related VET system. ...
How human-like do conversational robots need to look to enable long-term human-robot conversation? One essential aspect of long-term interaction is a human's ability to adapt to the varying degrees of a conversational partner's engagement and emotions. Prosodically, this can be achieved through (dis)entrainment. While speech synthesis has been a limiting factor for many years, restrictions in this regard are increasingly mitigated. These advancements now emphasise the importance of studying the effect of robot embodiment on human entrainment. In this study, we conducted a between-subjects online human-robot interaction experiment in an educational use-case scenario where a tutor was embodied through either a human or a robot face. 43 English-speaking participants took part in the study, and for each we analysed the degree of acoustic-prosodic entrainment to the human or robot face, respectively. We found that the degree of subjective and objective perception of anthropomorphism positively correlates with acoustic-prosodic entrainment. ...