Active learning from crowd in document screening

Journal Article (2020)

Author(s)

Evgeny Krivosheev (Università degli Studi di Trento)

Burcu Sayin (Università degli Studi di Trento)

Alessandro Bozzon (TU Delft - Industrial Design Engineering)

Zoltán Szlávik (myTomorrows)

Research Group

Human-Centred Artificial Intelligence

To reference this document use

https://resolver.tudelft.nl/uuid:bbc00d0a-fe75-4475-83d9-ffb177a325b7

Publication Year

2020

Language

English

Journal title

CEUR Workshop Proceedings

Volume number

2736

Pages (from-to)

19-25

Event

2020 Crowd Science Workshop: Remoteness, Fairness, and Mechanisms as Challenges of Data Supply by Humans for Automation (2020-12-11 - 2020-12-11), Vancouver, Canada

Downloads counter

208

Collections

Institutional Repository

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In this paper, we explore how to efficiently combine crowdsourcing and machine intelligence for the problem of document screening, where we need to screen documents with a set of machine-learning filters. Specifically, we focus on building a set of machine-learning classifiers that evaluate documents and then screen them efficiently. This is a challenging task because the budget is limited and there are countless ways to spend it. We propose a multi-label, screening-specific active learning sampling technique, objective-aware sampling, for selecting unlabeled documents to annotate. Our algorithm decides which machine filter needs more training data and how to choose unlabeled items to annotate so as to minimize the risk of overall classification errors rather than the error of a single filter. We demonstrate that objective-aware sampling significantly outperforms state-of-the-art active learning sampling strategies.
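
The difference between minimizing a single filter's error and minimizing the overall screening error can be illustrated with a small sketch. This is not the algorithm from the paper, whose details the abstract does not give. It assumes, hypothetically, that a document is included only if it passes every filter, that the filters are independent, and that each filter already outputs a pass probability. The names `screening_risk`, `pick_query`, and `pool` are invented for this illustration.

```python
from math import prod


def screening_risk(pass_probs):
    """Chance of getting the overall include/exclude decision wrong.

    Assumption (not from the paper): a document is included only if it
    passes all filters, and the filters are independent.
    """
    p_include = prod(pass_probs)
    return min(p_include, 1.0 - p_include)


def pick_query(pool):
    """Choose which (document, filter) pair to send to the crowd.

    pool maps a document id to its list of per-filter pass probabilities.
    The document with the riskiest overall decision is chosen first, then
    the filter whose prediction on that document is closest to 0.5.
    """
    doc_id = max(pool, key=lambda d: screening_risk(pool[d]))
    probs = pool[doc_id]
    filter_index = min(range(len(probs)), key=lambda f: abs(probs[f] - 0.5))
    return doc_id, filter_index


# Hypothetical pool: three documents scored by two filters.
pool = {
    "d1": [0.5, 0.05],   # filter 0 is unsure, but filter 1 almost surely rejects
    "d2": [0.7, 0.8],    # both filters lean "pass", overall decision is unclear
    "d3": [0.95, 0.9],   # almost surely included
}

print(pick_query(pool))
```

Plain uncertainty sampling on filter 0 alone would query d1, whose prediction sits exactly at 0.5. Under the assumptions above, that label is unlikely to change the outcome because filter 1 already rejects d1 with high confidence, so the sketch queries d2 instead, where the overall decision is least certain.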

Files

Paper4.pdf

(pdf | 0.602 Mb)