C.M. Jonker
208 records found
From human teams to hybrid intelligence teams
Identifying, characterizing, and evaluating foundational quality attributes
Hybrid Intelligence (HI) is an emerging paradigm in which artificial intelligence (AI) augments human intelligence. The current literature lacks systematic models that guide the design and evaluation of HI systems. Further, discussions around HI primarily focus on technology, neglecting the holistic human-AI ensemble. In this paper, we take the initial steps toward the development of a quality model for characterizing and evaluating HI systems from a human-AI teams perspective. We first conducted a study investigating the adequacy of properties commonly associated with effective human teams to describe HI. The study features the insights of 50 HI researchers, and shows that various human team properties, including boundedness, interdependence, competency, purposefulness, initiative, normativity, and effectiveness, are important for HI systems. Based on these results, we developed a quality model for HI teams composed of seven high-level quality attributes, further refined into 16 specific ones. To evaluate the relevance and understanding of the proposed attributes, we conducted a second empirical investigation by staging competitions in which participants used the quality model to develop and analyze HI usage scenarios. Our analysis of 48 collected scenarios, which we openly release, confirms the proposed attributes’ relevance and highlights insights that emerge when designers consider the quality model in HI system design.
"even explanations will not help in trusting [this] fundamentally biased system"
A Predictive Policing Case-Study
In today's society, where Artificial Intelligence (AI) has gained a vital role, concerns regarding users' trust have garnered significant attention. The use of AI systems in high-risk domains has often led users either to under-trust them, potentially causing inadequate reliance, or to over-trust them, resulting in over-compliance. Therefore, users must maintain an appropriate level of trust. Past research has indicated that explanations provided by AI systems can enhance user understanding of when to trust or not trust the system. However, the utility of presenting different forms of explanation remains to be explored, especially in high-risk domains. Therefore, this study explores the impact of different explanation types (text, visual, and hybrid) and user expertise (retired police officers and lay users) on establishing appropriate trust in AI-based predictive policing. While we observed that the hybrid form of explanations increased the subjective trust in AI for expert users, it did not lead to better decision-making. Furthermore, no form of explanations helped build appropriate trust. The findings of our study emphasize the importance of re-evaluating the use of explanations to build [appropriate] trust in AI-based systems, especially when the system's use is questionable. Finally, we synthesize potential challenges and policy recommendations based on our results to design for appropriate trust in high-risk AI-based systems.
Knowing Me, Knowing AU
How Should We Design Agent-Mediated Mimicry?
A lack of self-awareness of communicative behaviours can lead to disadvantages in important interactions. Video recordings as a tool for self-observation have been widely adopted to initiate behaviour change and reflection. Seeing oneself in a recording can lead to negative affect. Forcing an external perspective can lead to cognitive dissonance. Avatars and virtual agents have the advantage that they can copy a human's behaviour while potentially avoiding this dissonance. To explore the design space of mimicking agents, we set up a user study where a video baseline is compared to agent-mediated conditions ranging from idle non-verbal behaviour to complete mimicry of the voice and face. We show that participants gain increased self-awareness from seeing themselves mediated through the virtual agent. We further discuss qualitative observations for the future design of systems that aid in self-reflection, and particularly note that partial mimicry seems to be less appreciated than full mimicry.
NegoLog
An Integrated Python-based Automated Negotiation Framework with Enhanced Assessment Components
The complexity of automated negotiation research calls for dedicated, user-friendly research frameworks that facilitate advanced analytics, comprehensive loggers, visualization tools, and auto-generated domains and preference profiles. This paper introduces NegoLog, a platform that provides advanced and customizable analysis modules to agent developers for exhaustive performance evaluation. NegoLog introduces an automated scenario and tournament generation tool in its Web-based user interface so that agent developers can adjust the competitiveness and complexity of the negotiations. One of the key novelties of NegoLog is an individual assessment of preference estimation models independent of the strategies.
Nudging human drivers via implicit communication by automated vehicles
Empirical evidence and computational cognitive modeling
Epistemic logic can be used to reason about statements such as ‘I know that you know that I know that φ’. In this logic, and its extensions, it is commonly assumed that agents can reason about epistemic statements of arbitrary nesting depth. In contrast, empirical findings on Theory of Mind, the ability to (recursively) reason about mental states of others, show that human recursive reasoning capability has an upper bound. In the present paper we work towards resolving this disparity by proposing some elements of a logic of bounded Theory of Mind, built on Public Announcement Logic. Using this logic, and a statistical method called Random-Effects Bayesian Model Selection, we estimate the distribution of Theory of Mind levels in the participant population of a previous behavioral experiment. Despite not modeling stochastic behavior, we find that approximately three-quarters of participants’ decisions can be described using Theory of Mind. In contrast to previous empirical research, our models estimate the majority of participants to be second-order Theory of Mind users.
Appropriate trust is an important component of the interaction between people and AI systems, in that "inappropriate" trust can cause disuse, misuse, or abuse of AI. To foster appropriate trust in AI, we need to understand how AI systems can elicit appropriate levels of trust from their users. Out of the aspects that influence trust, this article focuses on the effect of showing integrity. In particular, this article presents a study of how different integrity-based explanations made by an AI agent affect the appropriateness of trust of a human in that agent. To explore this, we (1) provide a formal definition to measure appropriate trust and (2) present a between-subject user study with 160 participants who collaborated with an AI agent on a task. In the study, the AI agent assisted its human partner in estimating calories on a food plate by expressing its integrity through explanations focusing on either honesty, transparency, or fairness. Our results show that (a) an agent who displays its integrity by being explicit about potential biases in data or algorithms achieved appropriate trust more often compared to being honest about capability or transparent about the decision-making process, and (b) subjective trust builds up and recovers better with honesty-like integrity explanations. Our results contribute to the design of agent-based AI systems that guide humans to appropriately trust them, a formal method to measure appropriate trust, and how to support humans in calibrating their trust in AI.
Presenting high-level arguments is a crucial task for fostering participation in online societal discussions. Current argument summarization approaches miss an important facet of this task, capturing diversity, which is important for accommodating multiple perspectives. We introduce three aspects of diversity: those of opinions, annotators, and sources. Our evaluation of approaches to a popular argument summarization task called Key Point Analysis (KPA) shows that these approaches struggle to (1) represent arguments shared by few people, (2) deal with data from various sources, and (3) align with subjectivity in human-provided annotations. We find that both general-purpose LLMs and dedicated KPA models exhibit this behavior, but have complementary strengths. Further, we observe that diversification of training data may ameliorate generalization. Addressing diversity in argument summarization requires a mix of strategies to deal with subjectivity.