Answer Quality Aware Aggregation for Extractive QA Crowdsourcing

Conference Paper (2022)

Author(s)

P. Zhu (TU Delft - Web Information Systems)

Z. Wang (TU Delft - Mathematical Physics)

J. Yang (TU Delft - Web Information Systems)

C. Hauff (TU Delft - Web Information Systems)

A. Anand (TU Delft - Web Information Systems)

Research Group

Web Information Systems

DOI related publication

https://doi.org/10.18653/v1/2022.findings-emnlp.457

To reference this document use:

https://resolver.tudelft.nl/uuid:afd99755-5ee1-40b0-abe8-fd88861b8d82

Publication Year

2022

Language

English

Pages (from-to)

6147-6159

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Quality control is essential for creating extractive question answering (EQA) datasets via crowdsourcing. A major focus of quality control is aggregating the answers, i.e. the word spans within passages, annotated by different crowd workers. However, crowd workers fail to reach a consensus on a considerable portion of questions. We introduce a simple yet effective answer aggregation method that takes into account the relations among the answer, the question, and the context passage. We evaluate answer quality from two views: that of a question answering (QA) model, to determine how confident the model is about each answer, and that of an answer verification model, to determine whether the answer is correct. We then compute aggregation scores from each answer’s quality and its contextual embedding produced by pre-trained language models. Experiments on a large, real crowdsourced EQA dataset show that our framework outperforms baselines by around 16% in precision and effectively performs answer aggregation for the extractive QA task.
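The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea of combining per-answer quality with embedding similarity. Everything in it is an assumption made for illustration: the function names (`cosine`, `aggregation_scores`, `aggregate`), the choice to multiply the QA-model confidence by the verifier score, and the quality-weighted cosine-similarity vote are hypothetical and are not taken from the paper. The embeddings are passed in as plain lists of floats; in practice they would come from a pre-trained language model.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def aggregation_scores(
    embeddings: Sequence[Sequence[float]],
    qa_confidence: Sequence[float],
    verifier_score: Sequence[float],
) -> List[float]:
    """Score each candidate answer (hypothetical scheme, not the paper's).

    quality_i = qa_confidence_i * verifier_score_i
    score_i   = quality_i * (1 + sum_{j != i} quality_j * cos(e_i, e_j))
    """
    n = len(embeddings)
    if not (n == len(qa_confidence) == len(verifier_score)):
        raise ValueError("all inputs must have one entry per answer")

    # Assumed combination of the two quality views: a simple product.
    quality = [c * v for c, v in zip(qa_confidence, verifier_score)]

    scores = []
    for i in range(n):
        # Support from the other answers, weighted by their own quality.
        support = sum(
            quality[j] * cosine(embeddings[i], embeddings[j])
            for j in range(n)
            if j != i
        )
        scores.append(quality[i] * (1.0 + support))
    return scores


def aggregate(
    answers: Sequence[str],
    embeddings: Sequence[Sequence[float]],
    qa_confidence: Sequence[float],
    verifier_score: Sequence[float],
) -> str:
    """Return the answer span with the highest aggregation score."""
    scores = aggregation_scores(embeddings, qa_confidence, verifier_score)
    best = max(range(len(answers)), key=scores.__getitem__)
    return answers[best]


# Toy example with made-up numbers: three worker answers to one question.
answers = ["in 1912", "1912", "Belfast"]
embeddings = [[1.0, 0.1], [0.9, 0.2], [0.0, 1.0]]
qa_confidence = [0.6, 0.8, 0.7]
verifier_score = [0.7, 0.9, 0.4]
chosen = aggregate(answers, embeddings, qa_confidence, verifier_score)
```

In this toy input, the two answers whose embeddings nearly coincide reinforce each other, while the dissimilar answer with a low verifier score receives little support. The multiplicative form is one design choice among many; a weighted sum or a learned combination would fit the same description.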

Files

2022.findings_emnlp.457.pdf

(pdf | 0.773 Mb)