Validation methodology for expert-annotated datasets

Event annotation case study

Conference Paper (2019)

Author(s)

Oana Inel (Vrije Universiteit Amsterdam, TU Delft - Web Information Systems)

Lora Aroyo (Google LLC)

Research Group

Web Information Systems

DOI related publication

https://doi.org/10.4230/OASIcs.LDK.2019.12

Keywords

Crowdsourcing; Human-in-the-loop; Event extraction; Time extraction

To reference this document use:

https://resolver.tudelft.nl/uuid:4ede1172-ac7a-415d-bc66-d1ec8fe3bd19

Publication Year

2019

Language

English

Volume number

70

Pages (from-to)

1-15

ISBN (electronic)

9783959771054

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Event detection remains a difficult task because events are complex and ambiguous. On the one hand, experts show low inter-annotator agreement when annotating events, despite the many existing annotation guidelines and their numerous revisions. On the other hand, event extraction systems achieve lower F1-scores than systems that extract other types of entities, such as people or locations. In this paper, we study the consistency and completeness of expert-annotated datasets for events and time expressions, and we propose a data-agnostic methodology for validating such datasets along these two dimensions. Furthermore, we combine the power of crowds and machines to correct and extend expert-annotated event datasets. We show the benefit of using crowd-annotated events to train and evaluate a state-of-the-art event extraction system: the crowd-annotated events increase the system's performance by at least 5.3%.
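The abstract compares event extraction systems by F1-score. As a hedged illustration only (this is not the authors' evaluation code), the sketch below computes exact-match precision, recall and F1 between two sets of event annotations. It assumes each event mention is represented as a (start, end) token-offset pair; the function name `span_f1` and the example spans are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: (start, end) token offsets of one event mention.
Span = Tuple[int, int]


def span_f1(gold: Iterable[Span], predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Return exact-match (precision, recall, F1) between two sets of event spans."""
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(predicted)
    # A prediction counts as correct only if its offsets match a gold span exactly.
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: reference annotations vs. system output for one sentence.
expert = [(0, 1), (4, 5), (9, 10)]
system = [(0, 1), (4, 5), (12, 13)]
p, r, f = span_f1(expert, system)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.67 R=0.67 F1=0.67
```

Exact matching is the strictest choice; evaluations of event extraction sometimes also credit partial overlaps, which would change the matching step but not the F1 formula.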

Files

OASIcs_LDK_2019_12.pdf (PDF, 0.736 MB)