Workshop on Human-in-the-loop Data Curation

Conference Paper (2022)

Author(s)

Gianluca Demartini (University of Queensland)

Jie Yang (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Shazia Sadiq (University of Queensland)

Research Group

Web Information Systems

DOI related publication

https://doi.org/10.1145/3511808.3557498 (final published version)

To reference this document use

https://resolver.tudelft.nl/uuid:fedae44d-5345-477a-8271-9194a28647e8

Publication Year

2022

Language

English

Pages (from-to)

5161-5162

ISBN (electronic)

978-1-4503-9236-5

Event

31st ACM International Conference on Information and Knowledge Management, CIKM 2022 (2022-10-17 - 2022-10-21), Atlanta, United States

Collections

Institutional Repository

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Although data quality is a long-standing problem, it has recently received renewed attention due to the rapid proliferation of data analytics, machine learning, and decision-support applications built upon the wide-scale availability and accessibility of (big) data. The success of such applications relies heavily not only on the quantity but also on the quality of data. Data curation, which may include annotation, cleaning, transformation, and integration, is a critical step in providing adequate assurances about the quality of analytics and machine learning results. Such data preparation activities are recognised as time- and resource-intensive for data scientists, as data often comes with a number of challenges that need to be tackled before it can be used in practice. Data re-purposing, and the resulting distance between the design and use intentions of the data, is a fundamental issue behind many of these challenges, which include noise and outliers, incompleteness, representativeness or biases, and heterogeneity of format or semantics. Mishandling these challenges can lead to negative and sometimes damaging effects, especially in critical domains like healthcare, transport, and finance. A distinctive feature of data quality in these contexts is the increasingly important role played by humans, who are often the source of the data and active players in its curation. This workshop will provide an opportunity to explore the interdisciplinary overlap between manual, automated, and hybrid human-machine methods of data curation.
