Online label aggregation

A variational Bayesian approach

Conference Paper (2021)
Author(s)

C. Hong (TU Delft - Data-Intensive Systems)

Amirmasoud Ghiassi (TU Delft - Data-Intensive Systems)

Yichi Zhou (Tsinghua University)

Robert Birke (ABB (Switzerland))

Y. Chen (TU Delft - Data-Intensive Systems)

Research Group
Data-Intensive Systems
Copyright
© 2021 C. Hong, S. Ghiassi, Yichi Zhou, Robert Birke, Lydia Y. Chen
DOI related publication
https://doi.org/10.1145/3442381.3449933
Publication Year
2021
Language
English
Pages (from-to)
1904-1915
ISBN (electronic)
978-1-4503-8312-7
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Noisy labels are the norm rather than the exception in crowd-sourced content. Aggregating the answers of multiple crowd workers is an effective way to distill noise and infer the correct labels. To keep results timely and overcome slow worker responses, online label aggregation is increasingly in demand, calling for solutions that incrementally infer the true label distribution from subsets of data items. In this paper, we propose a novel online label aggregation framework, BiLA, which employs a variational Bayesian inference method and a novel stochastic optimization scheme for incremental training. BiLA is flexible enough to accommodate any generating distribution of labels through exact computation of its posterior distribution. We also derive the convergence bound of the proposed optimizer. We compare BiLA with the state of the art based on minimax entropy, neural networks, and expectation maximization algorithms, on synthetic and real-world data sets. Our evaluation on various online scenarios shows that BiLA effectively infers the true labels, reducing the error rate by at least 10 and 1.5 percentage points on synthetic and real-world datasets, respectively.
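To make the online setting concrete, the sketch below shows a much simpler streaming Bayesian aggregator than BiLA: each item keeps a categorical posterior over its true label, updated one worker label at a time under the (strong, simplifying) assumption that every worker has the same known accuracy. BiLA's actual variational updates, worker model, and stochastic optimizer are given in the paper; the function name and parameters here are illustrative only.

```python
import numpy as np

def online_aggregate(label_stream, n_items, n_classes, worker_accuracy=0.7):
    """Streaming Bayesian label aggregation under a fixed symmetric noise model.

    label_stream yields (item_index, observed_label) pairs one at a time,
    mimicking workers answering in an online fashion.
    """
    # Uniform prior over each item's true label, kept in log space.
    log_post = np.zeros((n_items, n_classes))

    # Likelihood P(observed = j | true = k): worker_accuracy on the diagonal,
    # remaining mass spread evenly over the other classes.
    off = (1.0 - worker_accuracy) / (n_classes - 1)
    log_lik = np.full((n_classes, n_classes), np.log(off))
    np.fill_diagonal(log_lik, np.log(worker_accuracy))

    # Incremental update: each incoming label multiplies in its likelihood.
    for item, label in label_stream:
        log_post[item] += log_lik[:, label]

    # Normalize to probabilities and take the MAP label per item.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    return post, post.argmax(axis=1)
```

For example, streaming three answers for one binary item, two voting class 1 and one voting class 0, yields a posterior favoring class 1; unlike BiLA, this toy model cannot adapt per-worker reliability.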