S. Qiu
Please Note
33 records found
1
HealthInsights
An Online Conversational Survey for Understanding Worker Health in Crowdsourcing Platforms
Crowdsourcing marketplaces have gradually flourished over the last decade. With the growing landscape of online work in general, and the rise of paid microtask crowdsourcing in particular, the health and wellbeing of crowd workers has become an important concern. In this paper, we present an online conversational survey, named HealthInsights, for understanding the status quo of workers’ health-related background, physical health, mental health, and their needs. We carried out a study on two popular platforms - Mechanical Turk and Prolific. Results show that the survey has acceptable reliability and validity. We found that workers across these platforms reported similar health-related issues, but also exhibited certain differences. Based on our findings, we argue that crowdsourcing platforms, task requesters, and academic researchers need to take the collective responsibility of creating better work environments. Our work has important implications on task and workflow design that are centered around worker health on crowdsourcing platforms.
Perspective
Leveraging Human Understanding for Identifying and Characterizing Image Atypicality
High-quality data plays a vital role in developing reliable image classification models. Despite that, what makes an image difficult to classify remains an unstudied topic. This paper provides a first-of-its-kind, model-agnostic characterization of image atypicality based on human understanding. We consider the setting of image classification "in the wild", where a large number of unlabeled images are accessible, and introduce a scalable and effective human computation approach for proactive identification and characterization of atypical images. Our approach consists of i) an image atypicality identification and characterization task that presents to the human worker both a local view of visually similar images and a global view of images from the class of interest and ii) an automatic image sampling method that selects a diverse set of atypical images based on both visual and semantic features. We demonstrate the effectiveness and cost-efficiency of our approach through controlled crowdsourcing experiments and provide a characterization of image atypicality based on human annotations of 10K images. We showcase the utility of the identified atypical images by testing state-of-the-art image classification services against such images and provide an in-depth comparative analysis of the alignment between human- and machine-perceived image atypicality. Our findings have important implications for developing and deploying reliable image classification systems.
Great Chain of Agents
The Role of Metaphorical Representation of Agents in Conversational Crowdsourcing
Conversational agents are being widely adopted across several domains to serve a variety of purposes ranging from providing intelligent assistance to companionship. Recent literature has shown that users develop intuitive folk theories and a metaphorical understanding of conversational agents (CAs) due to the lack of a mental model of the agents. However, investigation of metaphorical agent representation in the HCI community has mainly focused on the human level, despite non-human metaphors for agents being prevalent in the real world. We adopted Lakoff and Turner's 'Great Chain of Being' framework to systematically investigate the impact of using non-human metaphors to represent conversational agents on worker engagement in crowdsourcing marketplaces. We designed a text-based conversational agent that assists crowd workers in task execution. Through a between-subjects experimental study (N = 341), we explored how different human and non-human metaphors affect worker engagement, the perceived cognitive load of workers, intrinsic motivation, and their trust in the agents. Our findings bridge the gap of how users experience CAs with non-human metaphors in the context of conversational crowdsourcing.
To Trust or Not To Trust
How a Conversational Interface Affects Trust in a Decision Support System
Trust is an important component of human-AI relationships and plays a major role in shaping the reliance of users on online algorithmic decision support systems. With recent advances in natural language processing, text and voice-based conversational interfaces have provided users with new ways of interacting with such systems. Despite the growing applications of conversational user interfaces (CUIs), little is currently understood about the suitability of such interfaces for decision support and how CUIs inspire trust among humans engaging with decision support systems. In this work, we aim to address this gap and answer the following question: to what extent can a conversational interface build user trust in decision support systems in comparison to a conventional graphical user interface? To this end, we built a text-based conversational interface, and a conventional web-based graphical user interface. These served as the means for users to interact with an online decision support system to help them find housing, given a fixed set of constraints. To understand how the accuracy of the decision support system moderates user behavior and trust across the two interfaces, we considered an accurate and inaccurate system. We carried out a 2 × 2 between-subjects study (N = 240) on the Prolific crowdsourcing platform. Our findings show that the conversational interface was significantly more effective in building user trust and satisfaction in the online housing recommendation system when compared to the conventional web interface. Our results highlight the potential impact of conversational interfaces for trust development in decision support systems.
Unknown unknowns represent a major challenge in reliable image recognition. Existing methods mainly focus on unknown unknowns identification, leveraging human intelligence to gather images that are potentially difficult for the machine. To drive a deeper understanding of unknown unknowns and more effective identification and treatment, this paper focuses on unknown unknowns characterization. We introduce a human-in-the-loop, semantic analysis framework for characterizing unknown unknowns at scale. We engage humans in two tasks that specify what a machine should know and describe what it really knows, respectively, both at the conceptual level, supported by information extraction and machine learning interpretability methods. Data partitioning and sampling techniques are employed to scale out human contributions in handling large data. Through extensive experimentation on scene recognition tasks, we show that our approach provides a rich, descriptive characterization of unknown unknowns and allows for more effective and cost-efficient detection than the state of the art.
In popular crowdsourcing marketplaces like Amazon Mechanical Turk, crowd workers complete tasks posted by requesters in return for monetary rewards. Task requesters are solely responsible for deciding whether to accept or reject submitted work. Rejecting work can directly affect the monetary reward of corresponding workers, and indirectly influence worker qualifications and their future work opportunities in the marketplace. Unexpected or unwarranted rejections therefore result in negative emotions and reactions among workers. Despite the high prevalence of rejections in crowdsourcing marketplaces, little research has explored ways to mitigate the negative emotional repercussions of rejections on crowd workers. Addressing this important research gap, we investigate whether introducing self-reflection at different stages after task execution can alleviate the emotional toll of rejection decisions on crowd workers. Our work is inspired by prior studies in psychology that have shown that self-reflection on negative personal experiences can positively affect one's emotion. To this end, we carried out an experimental study investigating the impact of explicit self-reflection on the emotions of rejected crowd workers. Results show that allowing workers to self-reflect on their delivered work, especially before receiving a rejection, has a significantly positive impact on their self-reported emotions in terms of valence and dominance. Our findings reveal that introducing a self-reflection stage before workers receive acceptance or rejection decisions on submitted work, can help in positively influencing the emotions of a worker. These findings have important design implications towards fostering a healthier requester-worker relationship and contributing towards the sustainability of the crowdsourcing marketplace.
The future of crowd work has been identified to depend on worker satisfaction, but we lack a thorough understanding of how worker satisfaction can be increased in microtask crowdsourcing. Prior work has shown that one solution is to build tasks that are engaging. To facilitate engagement, two methods that have received attention in recent HCI literature are the use of video games and conversational interfaces. While these are largely different techniques, they aim for the same goal of reducing worker burden and increasing engagement in a task. On one hand, video games have huge motivation potential and translating game design elements for motivational purposes has shown positive effects. Recent work in games research has shown that the use of player avatars is effective in fostering interest, enjoyment, and other aspects pertaining to intrinsic motivation. On the other hand, conversational interfaces have been argued to have advantages over traditional GUIs due to facilitating a more human-like interaction. Conversational microtasking has recently been proposed to improve worker engagement in microtask marketplaces. The contexts of games and crowd work are underlined by the need to motivate and engage participants, yet the potential of using worker avatars to promote self-identification and improve worker satisfaction in microtask crowdsourcing has remained unexplored. Addressing this knowledge gap, we carried out a between-subject study involving 360 crowd workers. We investigated how worker avatars influence quality related outcomes of workers and their perceived experience, in conventional web and novel conversational interfaces. We equipped workers with the functionality of customizing their avatars, and selecting characterizations for their avatars, to understand whether identifying with an avatar can increase the motivation of workers. We found that using worker avatars with conversational interfaces can effectively reduce cognitive workload and increase worker retention. Our results indicate the occurrence of similarity and wishful avatar identification in crowdsourcing. Our findings have important implications in alleviating workers' perceived workload and on the design of crowdsourcing microtasks.
Up-to-date listings of retail stores and related building functions are challenging and costly to maintain. We introduce a novel method for automatically detecting, geo-locating, and classifying retail stores and related commercial functions, on the basis of storefronts extracted from street-level imagery. Specifically, we present a deep learning approach that takes storefronts from street-level imagery as input, and directly provides the geo-location and type of commercial function as output. Our method showed a recall of 89.05% and a precision of 88.22% on a real-world dataset of street-level images, which experimentally demonstrated that our approach achieves human-level accuracy while having a remarkable run-time efficiency compared to methods such as Faster Region-Convolutional Neural Networks (Faster R-CNN) and Single Shot Detector (SSD).
TickTalkTurk
Conversational crowdsourcing made easy
This demo presents TickTalkTurk, a tool that can assist task requesters in quickly deploying crowdsourcing tasks in a customizable conversational worker interface. The conversational worker interface can convey task instructions, deploy microtasks, and gather worker input in a dialogue-based workflow. The interface is implemented as a Web-based application, which makes it compatible with popular crowdsourcing platforms. The tool we developed is demonstrated through two microtask crowdsourcing examples with different task types. Results reveal that our conversational worker interface is capable of better engaging workers and analyzing workers performance.
The rise in popularity of conversational agents has enabled humans to interact with machines more naturally. Recent work has shown that crowd workers in microtask marketplaces can complete a variety of human intelligence tasks (HITs) using conversational interfaces with similar output quality compared to the traditional Web interfaces. In this paper, we investigate the effectiveness of using conversational interfaces to improve worker engagement in microtask crowdsourcing. We designed a text-based conversational agent that assists workers in task execution, and tested the performance of workers when interacting with agents having different conversational styles. We conducted a rigorous experimental study on Amazon Mechanical Turk with 800 unique workers, to explore whether the output quality, worker engagement and the perceived cognitive load of workers can be affected by the conversational agent and its conversational styles. Our results show that conversational interfaces can be effective in engaging workers, and a suitable conversational style has potential to improve worker engagement.
VirtualCrowd
A Simulation Platform for Microtask Crowdsourcing Campaigns
This demo presents VirtualCrowd, a simulation platform for crowdsourcing campaigns. The platform allows the design, configuration, step-by-step execution, and analysis of customized tasks, worker profiles, and crowdsourcing strategies. The platform will be demonstrated through a crowd-mapping example in two cities, which will highlight the utility of VirtualCrowd for complex crowdsourcing tasks in real world settings.
estimate the conversational style of a worker. Our experimental setup was designed to empirically observe how conversational styles of workers relate with quality-related outcomes. Results show that even a naive supervised classifier can predict the conversation style with high accuracy (80%), and crowd workers with an Involvement conversational style provided a significantly higher output quality, exhibited a higher user engagement and perceived less cognitive task load in comparison to their counterparts. Our findings have important implications on task design with respect to improving worker performance and their engagement in microtask crowdsourcing. ...
estimate the conversational style of a worker. Our experimental setup was designed to empirically observe how conversational styles of workers relate with quality-related outcomes. Results show that even a naive supervised classifier can predict the conversation style with high accuracy (80%), and crowd workers with an Involvement conversational style provided a significantly higher output quality, exhibited a higher user engagement and perceived less cognitive task load in comparison to their counterparts. Our findings have important implications on task design with respect to improving worker performance and their engagement in microtask crowdsourcing.