U.K. Gadiraju
From SERPs to Sound
How Search Engine Result Pages and AI-generated Podcasts Interact to Influence User Attitudes on Controversial Topics
Belief Updating and Delegation in Multi-Task Human-AI Interaction
Evidence from Controlled Simulations
Large language models (LLMs) increasingly support heterogeneous tasks within a single interface, requiring users to form, update, and act upon beliefs about one system across domains with different reliability profiles. Understanding how such beliefs transfer across tasks and shape delegation is therefore critical for the design of multipurpose AI systems. We report a preregistered experiment (N = 240, 7,200 trials) in which participants interacted with a controlled AI simulation across grammar checking, travel planning, and visual question answering, each with a fixed, domain-typical accuracy level. Delegation was operationalized as a binary reliance decision (accepting the AI's output versus acting independently), and belief dynamics were evaluated against Bayesian benchmarks. We find three main results. First, participants do not reset their beliefs between tasks: priors in a new task depend on posteriors from the previous task, with a 10-point increase in the previous posterior predicting a 3 to 4 point higher subsequent prior. Second, within tasks, belief updating follows the Bayesian direction but is substantially conservative, proceeding at roughly half the normative Bayesian rate. Third, delegation is driven primarily by subjective beliefs about AI accuracy rather than by self-confidence, although confidence independently reduces reliance when beliefs are held constant. Together, these findings show that users form global, path-dependent expectations about multipurpose AI systems, update them conservatively, and rely on AI primarily based on subjective beliefs rather than objective performance. We discuss implications for expectation calibration, reliance design, and the risks of belief spillovers in deployed LLM-based interfaces.
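As a rough illustration of what "updating at roughly half the normative Bayesian rate" can mean, the sketch below assumes a Beta-Bernoulli model of a participant's belief about the AI's accuracy and models conservatism as moving a stated estimate only part of the way (rate 0.5) from the prior toward the Bayesian posterior. The model, the uniform prior, and the names `BetaBelief`, `bayesian_update`, and `conservative_estimate` are assumptions for illustration only; they are not the benchmark used in the study.

```python
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Hypothetical Beta(alpha, beta) belief about the AI's per-trial accuracy."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        # Expected accuracy under the Beta distribution.
        return self.alpha / (self.alpha + self.beta)


def bayesian_update(belief: BetaBelief, correct: bool) -> BetaBelief:
    """Normative Beta-Bernoulli update after observing one AI output."""
    if correct:
        return BetaBelief(belief.alpha + 1, belief.beta)
    return BetaBelief(belief.alpha, belief.beta + 1)


def conservative_estimate(prior_mean: float, bayes_mean: float, rate: float = 0.5) -> float:
    """Assumed conservatism: cover only `rate` of the distance to the Bayesian posterior."""
    return prior_mean + rate * (bayes_mean - prior_mean)


# Usage: uniform prior (50% expected accuracy), then four observed AI outputs.
belief = BetaBelief(1.0, 1.0)
for outcome in [True, True, False, True]:
    belief = bayesian_update(belief, outcome)

print(round(belief.mean, 3))                              # 0.667
print(round(conservative_estimate(0.5, belief.mean), 3))  # 0.583
```

Here the Bayesian benchmark moves the estimate from 0.5 to about 0.667, while a learner updating at half that rate stops at about 0.583. The single-step partial adjustment is a deliberate simplification; other conservatism models (e.g., down-weighting each observation) would give different numbers.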
The Data-Dollars Tradeoff
Privacy Harms vs. Economic Risk in Personalized AI Adoption
Privacy concerns significantly affect AI adoption, yet little is known about how information environments shape user responses to data leak threats. We conducted a 2 × 3 between-subjects experiment (N = 610) examining how risk versus ambiguity about privacy leaks affects the adoption of AI personalization. Participants chose between standard and AI-personalized product baskets, with personalization requiring data sharing that could leak to pricing algorithms. Under risk (a known 30% leak probability), we found no difference in AI adoption between privacy-threatening and neutral conditions (ca. 50% adoption in both). Under ambiguity (a leak probability somewhere in the 10-50% range), privacy threats significantly reduced adoption compared to neutral conditions. This effect holds for sensitive demographic data as well as for anonymized preference data. Users systematically over-bid for privacy disclosure labels, suggesting strong demand for transparency institutions. Notably, privacy leak threats did not affect subsequent bargaining behavior with algorithms. Our findings indicate that ambiguity over data leaks, rather than privacy preferences alone, drives users' avoidance of personalized AI.
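One way to see why ambiguity, rather than the leak probability itself, can matter: the midpoint of the 10-50% range equals the 30% risk condition, so an ambiguity-neutral user who averages over the range faces the same expected cost in both treatments. The sketch below contrasts that with a maxmin (worst-case) evaluation in the spirit of Gilboa and Schmeidler, a standard model of ambiguity aversion that we swap in for illustration; it is not the model used in the paper. The benefit and harm values and all function names are hypothetical.

```python
def expected_leak_cost(leak_prob: float, harm: float) -> float:
    """Expected privacy cost when the leak probability is known (risk)."""
    return leak_prob * harm


def maxmin_leak_cost(prob_range: tuple[float, float], harm: float) -> float:
    """Ambiguity-averse (maxmin) cost: evaluate at the worst leak probability in the set."""
    # Cost increases with the leak probability, so the worst case is the upper bound.
    return max(prob_range) * harm


def adopts(benefit: float, cost: float) -> bool:
    """Hypothetical decision rule: personalize only if the benefit exceeds the cost."""
    return benefit > cost


BENEFIT, HARM = 4.0, 10.0  # hypothetical utility units
AMBIGUOUS_RANGE = (0.1, 0.5)
midpoint = sum(AMBIGUOUS_RANGE) / 2

print(adopts(BENEFIT, expected_leak_cost(0.3, HARM)))            # True  (risk)
print(adopts(BENEFIT, expected_leak_cost(midpoint, HARM)))       # True  (ambiguity-neutral)
print(adopts(BENEFIT, maxmin_leak_cost(AMBIGUOUS_RANGE, HARM)))  # False (ambiguity-averse)
```

Under these assumed numbers, only the worst-case evaluation flips the decision, which is one possible reading of the observed drop in adoption under ambiguity.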
AI consumer markets are characterized by severe information asymmetries between buyers and suppliers. Complex AI systems can appear highly accurate while making costly errors or embedding hidden defects. While there have been regulatory efforts surrounding different forms of disclosure, large information gaps remain. This paper provides the first experimental evidence on the important role of information asymmetries and disclosure designs in shaping user adoption of AI systems. We systematically vary the density of low-quality AI systems and the depth of disclosure requirements in a simulated AI product market to gauge how people react to the risk of accidentally relying on a low-quality AI system. We then compare participants' choices to a rational Bayesian model, analyzing the degree to which partial information disclosure can improve AI adoption. Our results underscore the deleterious effects of information asymmetries on AI adoption, but also highlight the potential of partial disclosure designs to improve the overall efficiency of human decision-making.
Despite decades of advancements in Artificial Intelligence (AI), fostering appropriate trust in AI systems remains a challenge. Cognitive biases - systematic deviations from rational judgement - profoundly influence human decision-making, and reliance on such “mental shortcuts” can make AI systems appear more or less trustworthy than they really are, often undermining collaboration outcomes. As AI evolves with more sophisticated and persuasive natural language outputs, particularly through Generative AI (GenAI) and Large Language Models (LLMs), these biases may manifest in new and unpredictable ways, calling for their comprehensive examination. This workshop brings together diverse researchers from HCI, human-centred AI, cognitive psychology, interaction design, and related fields to collaboratively explore how cognitive biases influence trust calibration in human-AI interaction and establish a research agenda. We will explore how biases emerge across the human-AI interaction pipeline, what design strategies can mitigate or even harness these heuristics, and what methods are needed to study these dynamics effectively. Through a highly interactive 90-minute session, participants will map out open challenges, brainstorm tensions and solutions, chart future research directions, and share perspectives from their own diverse disciplinary lenses. Through this workshop, we aim to build a shared understanding of how cognitive biases influence trust in evolving AI systems, and derive a forward-looking, bias-aware research agenda that promotes appropriate trust in human-AI interaction.
Supporting adolescents’ mHealth needs
Qualitative and quantitative insights from a user survey of a mental health promoting app
While mental health apps can help to promote adolescents’ mental health, prevent mental health problems, and reduce symptoms, maintaining sufficient user engagement with these apps remains challenging. This is often caused by a mismatch between the needs and preferences of adolescents and what the apps offer. Therefore, we need a better understanding of (i) adolescents’ needs and preferences and (ii) potential differences based on user characteristics. To this end, we qualitatively and quantitatively analyzed a dataset describing the user experience of 1312 Dutch adolescents (12–25 years) from the general population after they interacted for several weeks with a gamified mHealth app (the Grow It! app) that aims to promote momentary emotional awareness, reflection, and adaptive coping. A total of 4833 free-text survey responses spanning five user experience survey questions were analyzed using an inductive and iterative coding process, while accounting for intercoder reliability. We used (i) a thematic analysis to identify adolescents’ needs and preferences related to the app, and (ii) an exploratory quantitative analysis of the subthemes to investigate potential differences in which needs and preferences were mentioned by adolescents based on demographics. Through our thematic analysis, we identified three overarching themes related to the app’s design: usability, psychological impact, and meaningful interactive features. Furthermore, we identified two overarching themes that related to the adolescents’ motivation to use the app: intrinsic (de)motivators, and social–environmental factors impacting usage. Each of these themes consisted of four subthemes. Our exploratory statistical analysis shed light on several differences in how frequently these subthemes were mentioned based on age, sex, and educational level. By synthesizing our insights, we identify five design implications that can help tailor future mHealth apps to adolescents’ needs and preferences.
These include concrete suggestions to personalize self-monitoring, include actionable insights, align content with personal needs, implement meaningful interactive features (e.g., competitions, gamification, and social communication), and make apps appealing to the entire target group.
As AI systems are increasingly adopted in high-stakes domains such as healthcare, autonomous driving, and criminal justice, their failures may threaten human safety and rights. Human oversight of AI systems is therefore critically important as a potential safeguard against harmful consequences in high-risk AI applications. Although regulations such as the European AI Act mandate human oversight for high-risk AI, we lack the methodologies and conceptual clarity to implement it effectively. Independent of policy and regulation, poorly designed oversight can create dangerous illusions of safety while obscuring accountability. This interdisciplinary workshop aims to bring together researchers from various disciplines, including AI, HCI, psychology, law, and policy, to address this critical gap. We will explore the following questions: How can we design AI systems that enable meaningful human oversight? What methods effectively communicate system states and risks to human overseers? How do we ensure scalable and effective interventions? Through papers, talks, and interactive group discussions, participants will identify oversight challenges, examine stakeholder roles, discuss supporting tools, methods, and regulatory frameworks, and establish a collaborative research agenda. Our central goal is to advance a roadmap that enables effective human oversight for the responsible deployment of AI in society.
"even explanations will not help in trusting [this] fundamentally biased system"
A Predictive Policing Case-Study
In today's society, where Artificial Intelligence (AI) has gained a vital role, concerns regarding users' trust have garnered significant attention. The use of AI systems in high-risk domains has often led users either to under-trust them, potentially causing inadequate reliance, or to over-trust them, resulting in over-compliance. Users must therefore maintain an appropriate level of trust. Past research has indicated that explanations provided by AI systems can enhance user understanding of when to trust or not trust the system. However, the utility of presenting explanations in different forms remains to be explored, especially in high-risk domains. This study therefore explores the impact of different explanation types (text, visual, and hybrid) and user expertise (retired police officers and lay users) on establishing appropriate trust in AI-based predictive policing. While we observed that the hybrid form of explanations increased subjective trust in AI for expert users, it did not lead to better decision-making. Furthermore, no form of explanation helped build appropriate trust. The findings of our study emphasize the importance of re-evaluating the use of explanations to build [appropriate] trust in AI-based systems, especially when the system's use is questionable. Finally, we synthesize potential challenges and policy recommendations based on our results to design for appropriate trust in high-risk AI-based systems.
Unpacking Trust Dynamics in the LLM Supply Chain
An Empirical Exploration to Foster Trustworthy LLM Production & Use
Research on trust in AI has been limited to a few trustors (e.g., end-users) and trustees (especially AI systems), and empirical explorations have remained in laboratory settings, overlooking factors that impact trust relations in the real world. Here, we broaden the scope of research by accounting for the supply chains that AI systems are part of. To this end, we present insights from an in-situ empirical study of LLM supply chains. We conducted interviews with 71 practitioners and analyzed their (collaborative) practices through the lens of trust, drawing on literature from organizational psychology. Our work reveals complex trust dynamics at the junctions of the chains, where diverse technical artifacts, individuals, and organizations interact. These junctions may become terrain for uncalibrated reliance when trustors lack knowledge of the supply chain or when power dynamics are at play. Our findings bear implications for AI researchers and policymakers seeking to promote AI governance that fosters calibrated trust.
Plan-Then-Execute
An Empirical Study of User Trust and Team Performance When Using LLM Agents As A Daily Assistant
Since the explosion in popularity of ChatGPT, large language models (LLMs) have continued to impact our everyday lives. Equipped with external tools designed for specific purposes (e.g., flight booking or an alarm clock), LLM agents are increasingly capable of assisting humans in their daily work. Although LLM agents show promise as daily assistants, there is limited understanding of how they can provide daily assistance based on planning and sequential decision-making capabilities. We draw inspiration from recent work that has highlighted the value of 'LLM-modulo' setups in conjunction with humans-in-the-loop for planning tasks. We conducted an empirical study (N = 248) of LLM agents as daily assistants in six commonly occurring tasks with different levels of associated risk (e.g., flight ticket booking and credit card payments). To ensure user agency and control over the LLM agent, we adopted LLM agents in a plan-then-execute manner, wherein the agents conducted step-wise planning and step-by-step execution in a simulation environment. We analyzed how user involvement at each stage affects trust and collaborative team performance. Our findings demonstrate that LLM agents can be a double-edged sword: (1) they can work well when a high-quality plan and the necessary user involvement in execution are available, and (2) users can easily mistrust LLM agents whose plans seem plausible. We synthesized key insights for using LLM agents as daily assistants to calibrate user trust and achieve better overall task outcomes. Our work has important implications for the future design of daily assistants and human-AI collaboration with LLM agents.
Contestability has been proposed as a key element in designing algorithmic decision-making processes that safeguard decision subjects' rights to dignity and autonomy. However, little is known about how contestability can be operationalized based on decision subjects' needs and preferences. We address this research gap by identifying decision subjects' information and procedural needs for enacting meaningful contestability. To this end, we chose an illegal holiday rental detection scenario as our case; a high-risk decision-making process in the public sector. We conducted 21 semi-structured interviews with citizens with experience renting their homes out and different levels of AI literacy. We found that decision subjects request interventions that facilitate (1) cooperation in sense-making, (2) support in contestation acts, and (3) appropriate responsibility attribution. Our results highlight the cooperative work behind contestability, and motivate future efforts to structure individual and collective action, to personalize explanations for contestability, and to open up sites of contestation in AI pipelines.
Towards Effective Human Intervention in Algorithmic Decision-Making
Understanding the Effect of Decision-Makers' Configuration on Decision-Subjects' Fairness Perceptions
Making the Switch
Towards Intelligent Integration of Gestures As an Input Modality for Microtask Crowdsourcing
HealthInsights
An Online Conversational Survey for Understanding Worker Health in Crowdsourcing Platforms
Crowdsourcing marketplaces have gradually flourished over the last decade. With the growing landscape of online work in general, and the rise of paid microtask crowdsourcing in particular, the health and wellbeing of crowd workers has become an important concern. In this paper, we present an online conversational survey, named HealthInsights, for understanding the status quo of workers’ health-related background, physical health, mental health, and their needs. We carried out a study on two popular platforms, Mechanical Turk and Prolific. Results show that the survey has acceptable reliability and validity. We found that workers across these platforms reported similar health-related issues, but also exhibited certain differences. Based on our findings, we argue that crowdsourcing platforms, task requesters, and academic researchers share a collective responsibility for creating better work environments. Our work has important implications for task and workflow design centered around worker health on crowdsourcing platforms.
Is Conversational XAI All You Need?
Human-AI Decision Making With a Conversational XAI Assistant
Conversational recommender systems (CRSs) provide users with an interactive means to express preferences and receive real-time personalized recommendations. The success of these systems is heavily influenced by the preference elicitation process. While existing research mainly focuses on what questions to ask during preference elicitation, there is a notable gap in understanding what role broader interaction patterns - including tone, pacing, and level of proactiveness - play in supporting users in completing a given task. This study investigates the impact of different conversational styles on preference elicitation, task performance, and user satisfaction with CRSs. We conducted a controlled experiment in the context of scientific literature recommendation, contrasting two distinct conversational styles - high involvement (fast-paced, direct, and proactive with frequent prompts) and high considerateness (polite and accommodating, prioritizing clarity and user comfort) - alongside a flexible experimental condition where users could switch between the two. Our results indicate that adapting conversational strategies based on user expertise and allowing flexibility between styles can enhance both user satisfaction and the effectiveness of recommendations in CRSs. Overall, our findings hold important implications for the design of future CRSs.
DECI
The 3rd Tutorial on Designing Effective Conversational Interfaces
Recent advances in generative AI have precipitated a proliferation of novel writing assistants. These systems typically rely on multilingual large language models (LLMs), giving globalized workers the ability to revise or create diverse forms of content in different languages. However, there is substantial evidence that the performance of multilingual LLMs varies between languages. Users who rely on writing assistance in multiple languages are therefore susceptible to disparate output quality. Importantly, recent research has shown that people tend to generalize algorithmic errors across independent tasks, violating the behavioral axiom of choice independence. In this paper, we analyze whether user utilization of novel writing assistants in a charity advertisement writing task is affected by the AI's performance in a second language. Furthermore, we quantify the extent to which these patterns translate into the persuasiveness of generated charity advertisements, as well as the role of people's beliefs about LLM utilization in their donation choices. Our results provide evidence that writers who engage with an LLM-based writing assistant violate choice independence, as prior exposure to a Spanish LLM reduces subsequent utilization of an English LLM. While these patterns do not affect the aggregate persuasiveness of the generated advertisements, people's beliefs about the source of an advertisement (human versus AI) do. In particular, Spanish-speaking female participants who believed that they read an AI-generated advertisement strongly adjusted their donation behavior downward. Furthermore, people are generally not able to adequately differentiate between human-generated and LLM-generated ads. Our work has important implications for the design, development, integration, and adoption of multilingual LLMs as assistive agents, particularly in writing tasks.
The State of Pilot Study Reporting in Crowdsourcing
A Reflection on Best Practices and Guidelines