U.K. Gadiraju | TU Delft Repository

From SERPs to Sound

How Search Engine Result Pages and AI-generated Podcasts Interact to Influence User Attitudes on Controversial Topics

Conference paper (2026) - Junjie Wang, Gaole He, Alisa Rieger, Ujwal Gadiraju

Compared to search engine result pages (SERPs), AI-generated podcasts represent a relatively new and more passive modality of information consumption, delivering narratives in a naturally engaging format. As these two media increasingly converge in everyday information-seeking behavior, it is essential to explore how their interaction influences user attitudes, particularly in contexts involving controversial, value-laden, and often debated topics. Addressing this need, we aim to understand how present-day SERPs and AI-generated podcasts, as information media, interact to shape the opinions of users. To this end, through a controlled user study (N = 483), we investigated the attitudinal effects of consuming information via SERPs and AI-generated podcasts, focusing on how the sequence and modality of exposure shape user opinions. A majority of users in our study exhibited attitude change, and we found an effect of sequence on attitude change. Our results further revealed a role of viewpoint bias and the degree of topic controversiality in shaping attitude change, although we found no effect of individual moderators. ...

Belief Updating and Delegation in Multi-Task Human-AI Interaction

Evidence from Controlled Simulations

Conference paper (2026) - Shreyan Biswas, Alexander Erlei, Ujwal Gadiraju

Large language models (LLMs) increasingly support heterogeneous tasks within a single interface, requiring users to form, update, and act upon beliefs about one system across domains with different reliability profiles. Understanding how such beliefs transfer across tasks and shape delegation is therefore critical for the design of multipurpose AI systems. We report a preregistered experiment (N = 240, 7,200 trials) in which participants interacted with a controlled AI simulation across grammar checking, travel planning, and visual question answering, each with fixed, domain-typical accuracy levels. Delegation was operationalized as a binary reliance decision - accepting the AI's output versus acting independently - and belief dynamics were evaluated against Bayesian benchmarks. We find three main results. First, participants do not reset beliefs between tasks: priors in a new task depend on posteriors from the previous task, with a 10-point increase predicting a 3-4 point higher subsequent prior. Second, within tasks, belief updating follows the Bayesian direction but is substantially conservative, proceeding at roughly half the normative Bayesian rate. Third, delegation is driven primarily by subjective beliefs about AI accuracy rather than self-confidence, though confidence independently reduces reliance when beliefs are held constant. Together, these findings show that users form global, path-dependent expectations about multipurpose AI systems, update them conservatively, and rely on AI primarily based on subjective beliefs rather than objective performance. We discuss implications for expectation calibration, reliance design, and the risks of belief spillovers in deployed LLM-based interfaces. ...
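
As a rough illustration of the second finding in the abstract above (updating in the Bayesian direction at roughly half the normative rate), the following sketch compares a Bayesian benchmark with a conservative update. The Beta-Bernoulli benchmark, the function names, the prior strength, and all numbers are assumptions made for illustration; the abstract does not specify the model used in the paper.

```python
# Illustrative sketch only: the Beta-Bernoulli benchmark, the names, and the
# numbers below are assumptions, not the model reported in the paper.

def bayes_posterior_mean(prior_mean, prior_strength, successes, trials):
    """Posterior mean of AI accuracy under a Beta prior and Bernoulli outcomes."""
    alpha = prior_mean * prior_strength + successes
    beta = (1.0 - prior_mean) * prior_strength + (trials - successes)
    return alpha / (alpha + beta)


def conservative_update(prior_mean, bayes_mean, rate=0.5):
    """Move only a fraction `rate` of the way from the prior to the Bayesian posterior."""
    return prior_mean + rate * (bayes_mean - prior_mean)


# Hypothetical participant: believes the AI is 60% accurate, then sees 9 of 10 correct outputs.
prior = 0.60
bayes = bayes_posterior_mean(prior, prior_strength=10, successes=9, trials=10)
reported = conservative_update(prior, bayes, rate=0.5)

print(f"Bayesian benchmark:  {bayes:.3f}")
print(f"Conservative update: {reported:.3f}")
```

With these assumed values the conservative belief lands halfway between the prior and the Bayesian benchmark, which is one simple way to read "half the normative rate".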

The Data-Dollars Tradeoff

Privacy Harms vs. Economic Risk in Personalized AI Adoption

Conference paper (2026) - Alexander Erlei, Tahir Abbas, Kilian Bizer, Ujwal Gadiraju

Privacy concerns significantly impact AI adoption, yet little is known about how information environments shape user responses to data leak threats. We conducted a 2 × 3 between-subjects experiment (N = 610) examining how risk versus ambiguity about privacy leaks affects the adoption of AI personalization. Participants chose between standard and AI-personalized product baskets, with personalization requiring data sharing that could leak to pricing algorithms. Under risk (30% leak probability), we found no difference in AI adoption between privacy-threatening and neutral conditions (ca. 50% adoption). Under ambiguity (10-50% range), privacy threats significantly reduced adoption compared to neutral conditions. This effect holds for sensitive demographic data as well as anonymized preference data. Users systematically over-bid for privacy disclosure labels, suggesting strong demand for transparency institutions. Notably, privacy leak threats did not affect subsequent bargaining behavior with algorithms. Our findings indicate that ambiguity over data leaks, rather than only privacy preferences per se, drives avoidance behavior among users towards personalized AI. ...

When Life Gives You AI, Will You Turn It into A Market for Lemons? Understanding How Information Asymmetries about AI System Capabilities Affect Market Outcomes and Adoption

Conference paper (2026) - A. Erlei, F.M. Cau, R. Georgiev, S. Chethan Kumar, K. Bizer, U. Gadiraju

AI consumer markets are characterized by severe buyer-supplier market asymmetries. Complex AI systems can appear highly accurate while making costly errors or embedding hidden defects. While there have been regulatory efforts surrounding different forms of disclosure, large information gaps remain. This paper provides the first experimental evidence on the important role of information asymmetries and disclosure designs in shaping user adoption of AI systems. We systematically vary the density of low-quality AI systems and the depth of disclosure requirements in a simulated AI product market to gauge how people react to the risk of accidentally relying on a low-quality AI system. Then, we compare participants' choices to a rational Bayesian model, analyzing the degree to which partial information disclosure can improve AI adoption. Our results underscore the deleterious effects of information asymmetries on AI adoption, but also highlight the potential of partial disclosure designs to improve the overall efficiency of human decision-making. ...

Understanding, Mitigating, and Leveraging Cognitive Biases to Calibrate Trust in Evolving AI Systems

Conference paper (2026) - Saumya Pareek, Nattapat Boonprakong, Naja Kathrine Kollerup, Si Chen, Simo Hosio, Koji Yatani, Yi Chieh Lee, Ujwal Gadiraju, Niels van Berkel, Jorge Goncalves

Despite decades of advancements in Artificial Intelligence (AI), fostering appropriate trust in AI systems remains a challenge. Cognitive biases - systematic deviations from rational judgement - profoundly influence human decision-making, and reliance on such “mental shortcuts” can make AI systems appear more or less trustworthy than they really are, often undermining collaboration outcomes. As AI evolves with more sophisticated and persuasive natural language outputs, particularly through Generative AI (GenAI) and Large Language Models (LLMs), these biases may manifest in new and unpredictable ways, calling for their comprehensive examination. This workshop brings together diverse researchers from HCI, human-centred AI, cognitive psychology, interaction design, and related fields to collaboratively explore how cognitive biases influence trust calibration in human-AI interaction and establish a research agenda. We will explore how biases emerge across the human-AI interaction pipeline, what design strategies can mitigate or even harness these heuristics, and what methods are needed to study these dynamics effectively. Through a highly interactive 90-minute session, participants will map out open challenges, brainstorm tensions and solutions, chart future research directions, and share perspectives from their own diverse disciplinary lenses. Through this workshop, we aim to build a shared understanding of how cognitive biases influence trust in evolving AI systems, and derive a forward-looking, bias-aware research agenda that promotes appropriate trust in human-AI interaction. ...

AI CHAOS! 2nd Workshop on the Challenges for Human Oversight of AI Systems

Conference paper (2026) - Malik Khadar, Julia Cecil, Leon Van Der Neut, Nikola Banovic, Kevin Baum, Stevie Chancellor, Enrico Costanza, Ujwal Gadiraju, Harmanpreet Kaur, More Authors

As AI systems are increasingly adopted in high-stakes domains such as healthcare, autonomous driving, and criminal justice, their failures may threaten human safety and rights. Human oversight of AI systems is therefore critically important as a potential safeguard to prevent harmful consequences in high-risk AI applications. The global regulatory and policy landscape for AI governance remains understandably fragmented and diverse. While frameworks like the European AI Act require human oversight for high-risk AI systems, there is currently a lack of well-defined methodologies and conceptual clarity to operationalize such oversight effectively. Independent of policy and regulation, poorly designed oversight can create dangerous illusions of safety while obscuring accountability. This interdisciplinary workshop aims to bring together researchers from various disciplines, including AI, HCI, psychology, law, and policy, to address this critical gap. We will explore the following questions: (1) What are the greatest challenges to achieving effective human oversight of AI systems? (2) How can we design AI systems that enable meaningful human oversight? (3) How do we assign responsibilities to and support the various stakeholders involved in oversight? Through talks and interactive group discussions, participants will identify oversight challenges; examine stakeholder roles; discuss supporting tools, methods, and regulatory frameworks; and establish a collaborative research agenda. Our central goal is to further a roadmap that enables effective human oversight for the responsible deployment of AI in society. ...

Supporting adolescents’ mHealth needs

Qualitative and quantitative insights from a user survey of a mental health promoting app

Journal article (2026) - Esra Cemre Su de Groot, Lianne P. de Vries, Ujwal Gadiraju, Olya Kudina, Loes Keijsers, Manon H.J. Hillegers, Willem Paul Brinkman

While mental health apps can help to promote adolescents’ mental health, prevent mental health problems, and reduce symptoms, maintaining sufficient user engagement with these apps remains challenging. This is often caused by a mismatch between the needs and preferences of adolescents and what the apps offer. Therefore, we need a better understanding of (i) adolescents’ needs and preferences and (ii) potential differences based on user characteristics. To this end, we qualitatively and quantitatively analyzed a dataset describing the user experience of 1312 Dutch adolescents (12–25 years) from the general population after they interacted for several weeks with a gamified mHealth app (the Grow It! app) that aims to promote momentary emotional awareness, reflection, and adaptive coping. A total of 4833 free-text survey responses spanning five user experience survey questions were analyzed using an inductive and iterative coding process, while accounting for intercoder reliability. We used (i) a thematic analysis to identify adolescents’ needs and preferences related to the app, and (ii) an exploratory quantitative analysis of the subthemes to investigate potential differences in which needs and preferences were mentioned by adolescents based on demographics. Through our thematic analysis, we identified three overarching themes related to the app’s design: usability, psychological impact, and meaningful interactive features. Furthermore, we identified two overarching themes that related to the adolescents’ motivation to use the app: intrinsic (de)motivators, and social–environmental factors impacting usage. Each of these themes consisted of four subthemes. Our exploratory statistical analysis shed light on several differences in how frequently these subthemes were mentioned based on age, sex, and educational level. By synthesizing our insights, we identify five design implications that can help tailor future mHealth apps to adolescents’ needs and preferences.
These include concrete suggestions to personalize self-monitoring, include actionable insights, align content with personal needs, implement meaningful interactive features (e.g., competitions, gamification, and social communication), and make apps appealing to the entire target group. ...

AI CHAOS! 1st Workshop on the Challenges for Human Oversight of AI Systems

Conference paper (2026) - Tim Schrills, Patricia Kahr, Markus Langer, Harmanpreet Kaur, Ujwal Gadiraju

As AI systems are increasingly adopted in high-stakes domains such as healthcare, autonomous driving, and criminal justice, their failures may threaten human safety and rights. Human oversight of AI systems is therefore critically important as a potential safeguard to prevent harmful consequences in high-risk AI applications. Although regulations like the European AI Act mandate human oversight for high-risk AI, we lack methodologies and conceptual clarity to implement it effectively. Independent of policy and regulation, poorly designed oversight can create dangerous illusions of safety while obscuring accountability. This interdisciplinary workshop aims to bring together researchers from various disciplines, including AI, HCI, psychology, law, and policy, to address this critical gap. We will explore the following questions: How can we design AI systems that enable meaningful human oversight? What methods effectively communicate system states and risks to human overseers? How do we ensure scalable and effective interventions? Through papers, talks, and interactive group discussions, participants will identify oversight challenges; examine stakeholder roles; discuss supporting tools, methods, and regulatory frameworks; and establish a collaborative research agenda. Our central goal is to further a roadmap that enables effective human oversight for the responsible deployment of AI in society. ...

"even explanations will not help in trusting [this] fundamentally biased system"

A Predictive Policing Case-Study

Conference paper (2025) - Siddharth Mehrotra, Ujwal Gadiraju, Eva Bittner, Folkert Van Delden, Catholijn M. Jonker, Myrthe L. Tielman

In today's society, where Artificial Intelligence (AI) has gained a vital role, concerns regarding users' trust have garnered significant attention. The use of AI systems in high-risk domains has often led users to either under-trust them, potentially causing inadequate reliance, or over-trust them, resulting in over-compliance. Therefore, users must maintain an appropriate level of trust. Past research has indicated that explanations provided by AI systems can enhance user understanding of when to trust or not trust the system. However, the utility of presenting different forms of explanation remains to be explored, especially in high-risk domains. Therefore, this study explores the impact of different explanation types (text, visual, and hybrid) and user expertise (retired police officers and lay users) on establishing appropriate trust in AI-based predictive policing. While we observed that the hybrid form of explanations increased the subjective trust in AI for expert users, it did not lead to better decision-making. Furthermore, no form of explanation helped build appropriate trust. The findings of our study emphasize the importance of re-evaluating the use of explanations to build [appropriate] trust in AI-based systems, especially when the system's use is questionable. Finally, we synthesize potential challenges and policy recommendations based on our results to design for appropriate trust in high-risk AI-based systems. ...

Unpacking Trust Dynamics in the LLM Supply Chain

An Empirical Exploration to Foster Trustworthy LLM Production & Use

Conference paper (2025) - Agathe Balayn, Mireia Yurrita, Fanny Rancourt, Fabio Casati, Ujwal Gadiraju

Research on trust in AI is limited to several trustors (e.g., end-users) and trustees (especially AI systems), and empirical explorations remain in laboratory settings, overlooking factors that impact trust relations in the real world. Here, we broaden the scope of research by accounting for the supply chains that AI systems are part of. To this end, we present insights from an in-situ empirical study of LLM supply chains. We conducted interviews with 71 practitioners and analyzed their (collaborative) practices through the lens of trust, drawing from literature in organizational psychology. Our work reveals complex trust dynamics at the junctions of the chains, with interactions between diverse technical artifacts, individuals, or organizations. These junctions might constitute terrain for uncalibrated reliance when trustors lack supply chain knowledge or power dynamics are at play. Our findings bear implications for AI researchers and policymakers to promote AI governance that fosters calibrated trust. ...

Plan-Then-Execute

An Empirical Study of User Trust and Team Performance When Using LLM Agents As A Daily Assistant

Conference paper (2025) - Gaole He, Gianluca Demartini, Ujwal Gadiraju

Since the explosion in popularity of ChatGPT, large language models (LLMs) have continued to impact our everyday lives. Equipped with external tools that are designed for a specific purpose (e.g., for flight booking or an alarm clock), LLM agents exercise an increasing capability to assist humans in their daily work. Although LLM agents have shown a promising blueprint as daily assistants, there is a limited understanding of how they can provide daily assistance based on planning and sequential decision making capabilities. We draw inspiration from recent work that has highlighted the value of 'LLM-modulo' setups in conjunction with humans-in-the-loop for planning tasks. We conducted an empirical study (N = 248) of LLM agents as daily assistants in six commonly occurring tasks with different levels of risk typically associated with them (e.g., flight ticket booking and credit card payments). To ensure user agency and control over the LLM agent, we adopted LLM agents in a plan-then-execute manner, wherein the agents conducted step-wise planning and step-by-step execution in a simulation environment. We analyzed how user involvement at each stage affects their trust and collaborative team performance. Our findings demonstrate that LLM agents can be a double-edged sword - (1) they can work well when a high-quality plan and necessary user involvement in execution are available, and (2) users can easily mistrust the LLM agents with plans that seem plausible. We synthesized key insights for using LLM agents as daily assistants to calibrate user trust and achieve better overall task outcomes. Our work has important implications for the future design of daily assistants and human-AI collaboration with LLM agents. ...

Identifying Algorithmic Decision Subjects' Needs for Meaningful Contestability

Journal article (2025) - Mireia Yurrita, Himanshu Verma, Agathe Balayn, Kars Alfrink, Ujwal Gadiraju, Alessandro Bozzon

Contestability has been proposed as a key element in designing algorithmic decision-making processes that safeguard decision subjects' rights to dignity and autonomy. However, little is known about how contestability can be operationalized based on decision subjects' needs and preferences. We address this research gap by identifying decision subjects' information and procedural needs for enacting meaningful contestability. To this end, we chose an illegal holiday rental detection scenario as our case: a high-risk decision-making process in the public sector. We conducted 21 semi-structured interviews with citizens who had experience renting out their homes and different levels of AI literacy. We found that decision subjects request interventions that facilitate (1) cooperation in sense-making, (2) support in contestation acts, and (3) appropriate responsibility attribution. Our results highlight the cooperative work behind contestability, and motivate future efforts to structure individual and collective action, to personalize explanations for contestability, and to open up sites of contestation in AI pipelines. ...

Towards Effective Human Intervention in Algorithmic Decision-Making

Understanding the Effect of Decision-Makers' Configuration on Decision-Subjects' Fairness Perceptions

Conference paper (2025) - Mireia Yurrita, Himanshu Verma, Agathe Balayn, Ujwal Gadiraju, Sylvia C. Pont, Alessandro Bozzon

Human intervention is claimed to safeguard decision-subjects’ rights in algorithmic decision-making and contribute to their fairness perceptions. However, how decision-subjects perceive hybrid decision-maker configurations (i.e., combining humans and algorithms) is unclear. We address this gap through a mixed-methods study in an algorithmic policy enforcement context. Through qualitative interviews (Study 1; N1 = 21), we identify three characteristics (i.e., decision-maker’s profile, model type, input data provenance) that affect how decision-subjects perceive decision-makers’ ability, benevolence, and integrity (ABI). Through a quantitative study (Study 2; N2 = 223), we then systematically evaluate the individual and combined effects of these characteristics on decision-subjects’ perceptions towards decision-makers, and fairness perceptions. We found that only decision-maker’s profile contributes to perceived ability, benevolence, and integrity. Interestingly, the effect of decision-maker’s profile on fairness perceptions was mediated by perceived ability and integrity. Our findings have design implications for ensuring effective human intervention as a protection against harmful algorithmic decisions. ...

Making the Switch

Towards Intelligent Integration of Gestures As an Input Modality for Microtask Crowdsourcing

Conference paper (2025) - Garrett Allen, Ujwal Gadiraju

Human input is pivotal in building AI systems. Aiding the gathering of high-quality and representative human input on demand, microtask crowdsourcing platforms have thrived. Despite the benefits available, the lack of health provisions, safeguards, and existing practices threaten the sustainability of crowd work. Prior work investigated the usefulness of a dual-purpose input modality of ergonomically-informed gestures across different microtasks, finding that gestures as inputs offer a realistic trade-off between worker accuracy and potential short to long-term health benefits. However, little is understood about the effect of switching input modalities from one task to another on worker experiences and task-related outcomes. Addressing this research and empirical gap, we conducted a between-subjects study (N = 717) with varying sequences of input modalities across 16 experimental conditions to systematically understand the effect of switching input modalities. We found that the order of the input modality can influence the time it takes to complete tasks but does not affect accuracy. Further, the cognitive load perceived by workers was not significantly different between conditions. Our findings hint that ergonomically informed gestures can be effectively intertwined with conventional input modalities without a detrimental impact on worker experiences and quality-related outcomes. Our work has important implications for the design of human-centered crowdsourcing platforms that cater to worker health and wellbeing. ...

HealthInsights

An Online Conversational Survey for Understanding Worker Health in Crowdsourcing Platforms

Conference paper (2025) - Sihang Qiu, Ujwal Gadiraju, Xiaolong Zheng

Crowdsourcing marketplaces have gradually flourished over the last decade. With the growing landscape of online work in general, and the rise of paid microtask crowdsourcing in particular, the health and wellbeing of crowd workers has become an important concern. In this paper, we present an online conversational survey, named HealthInsights, for understanding the status quo of workers’ health-related background, physical health, mental health, and their needs. We carried out a study on two popular platforms - Mechanical Turk and Prolific. Results show that the survey has acceptable reliability and validity. We found that workers across these platforms reported similar health-related issues, but also exhibited certain differences. Based on our findings, we argue that crowdsourcing platforms, task requesters, and academic researchers need to take the collective responsibility of creating better work environments. Our work has important implications on task and workflow design that are centered around worker health on crowdsourcing platforms. ...

Is Conversational XAI All You Need?

Human-AI Decision Making With a Conversational XAI Assistant

Conference paper (2025) - Gaole He, Nilay Aishwarya, Ujwal Gadiraju

Explainable artificial intelligence (XAI) methods are being proposed to help interpret and understand how AI systems reach specific predictions. Inspired by prior work on conversational user interfaces, we argue that augmenting existing XAI methods with conversational user interfaces can increase user engagement and boost user understanding of the AI system. In this paper, we explored the impact of a conversational XAI interface on users’ understanding of the AI system, their trust, and reliance on the AI system. In comparison to an XAI dashboard, we found that the conversational XAI interface can bring about a better understanding of the AI system among users and higher user trust. However, users of both the XAI dashboard and conversational XAI interfaces showed clear over-reliance on the AI system. Enhanced conversations powered by large language model (LLM) agents amplified over-reliance. Based on our findings, we reason that the potential cause of such over-reliance is the illusion of explanatory depth that is concomitant with both XAI interfaces. Our findings have important implications for designing effective conversational XAI interfaces to facilitate appropriate reliance and improve human-AI collaboration. ...

Should We Tailor the Talk? Understanding the Impact of Conversational Styles on Preference Elicitation in Conversational Recommender Systems

Conference paper (2025) - Ivica Kostric, Krisztian Balog, Ujwal Gadiraju

Conversational recommender systems (CRSs) provide users with an interactive means to express preferences and receive real-time personalized recommendations. The success of these systems is heavily influenced by the preference elicitation process. While existing research mainly focuses on what questions to ask during preference elicitation, there is a notable gap in understanding what role broader interaction patterns - including tone, pacing, and level of proactiveness - play in supporting users in completing a given task. This study investigates the impact of different conversational styles on preference elicitation, task performance, and user satisfaction with CRSs. We conducted a controlled experiment in the context of scientific literature recommendation, contrasting two distinct conversational styles - high involvement (fast-paced, direct, and proactive with frequent prompts) and high considerateness (polite and accommodating, prioritizing clarity and user comfort) - alongside a flexible experimental condition where users could switch between the two. Our results indicate that adapting conversational strategies based on user expertise and allowing flexibility between styles can enhance both user satisfaction and the effectiveness of recommendations in CRSs. Overall, our findings hold important implications for the design of future CRSs. ...

DECI

The 3rd Tutorial on Designing Effective Conversational Interfaces

Conference paper (2025) - Ujwal Gadiraju, Kuldeep Yadav

Advances in generative AI and the widespread proliferation of LLM-based applications have created a number of opportunities for designing effective and intelligent human-AI interfaces. Conversational User Interfaces (CUIs) have enabled humans to interact with machines more naturally across several domains and applications. People are increasingly familiar with conversational interactions mediated by technology due to the widespread use of mobile technologies, social networks, pervasive computing, and the rapid adoption of large language models that power conversational agents. Based on the recent advances in conversational AI, due to the proliferation of LLMs, there are clear signs that the future of human-computer interaction will have a significant conversational component. In the context of ever-lowering barriers to accessibility to technologies, digital applications, and generative AI, this tutorial will showcase the benefits of employing conversational interfaces for human-AI decision making, health and well-being, and crowd computing. We will discuss the potential of conversational interfaces in facilitating and mediating people’s interactions with AI systems and the opportunities and challenges that lie at this intersection from the broad standpoint of intelligent user interfaces. This third incarnation of this tutorial will include interactive elements and discussions, providing participants with practical insights to inform the design of effective conversational interfaces. ...

Mind the Gap! Choice Independence in Using Multilingual LLMs for Persuasive Co-Writing Tasks in Different Languages

Conference paper (2025) - Shreyan Biswas, Alexander Erlei, Ujwal Gadiraju

Recent advances in generative AI have precipitated a proliferation of novel writing assistants. These systems typically rely on multilingual large language models (LLMs), providing globalized workers the ability to revise or create diverse forms of content in different languages. However, there is substantial evidence indicating that the performance of multilingual LLMs varies between languages. Users who employ writing assistance for multiple languages are therefore susceptible to disparate output quality. Importantly, recent research has shown that people tend to generalize algorithmic errors across independent tasks, violating the behavioral axiom of choice independence. In this paper, we analyze whether user utilization of novel writing assistants in a charity advertisement writing task is affected by the AI's performance in a second language. Furthermore, we quantify the extent to which these patterns translate into the persuasiveness of generated charity advertisements, as well as the role of people's beliefs about LLM utilization for their donation choices. Our results provide evidence that writers who engage with an LLM-based writing assistant violate choice independence, as prior exposure to a Spanish LLM reduces subsequent utilization of an English LLM. While these patterns do not affect the aggregate persuasiveness of the generated advertisements, people's beliefs about the source of an advertisement (human versus AI) do. In particular, Spanish-speaking female participants who believed that they read an AI-generated advertisement strongly adjusted their donation behavior downwards. Furthermore, people are generally not able to adequately differentiate between human-generated and LLM-generated ads. Our work has important implications for the design, development, integration, and adoption of multilingual LLMs as assistive agents - particularly in writing tasks. ...

The State of Pilot Study Reporting in Crowdsourcing

A Reflection on Best Practices and Guidelines

Journal article (2024) - Jonas Oppenlaender, Tahir Abbas, Ujwal Gadiraju

Pilot studies are an essential cornerstone of the design of crowdsourcing campaigns, yet they are often only mentioned in passing in the scholarly literature. A lack of details surrounding pilot studies in crowdsourcing research hinders the replication of studies and the reproduction of findings, stalling potential scientific advances. We conducted a systematic literature review on the current state of pilot study reporting at the intersection of crowdsourcing and HCI research. Our review of ten years of literature included 171 articles published in the proceedings of the Conference on Human Computation and Crowdsourcing (AAAI HCOMP) and the ACM Digital Library. We found that pilot studies in crowdsourcing research (i.e., crowd pilot studies) are often under-reported in the literature. Important details, such as the number of workers and rewards to workers, are often not reported. On the basis of our findings, we reflect on the current state of practice and formulate a set of best practice guidelines for reporting crowd pilot studies in crowdsourcing research. We also provide implications for the design of crowdsourcing platforms and make practical suggestions for supporting crowd pilot study reporting. ...