Coner

A Collaborative Approach for Long-Tail Named Entity Recognition in Scientific Publications

Conference Paper (2019)
Author(s)

Daniel Vliegenthart (Student TU Delft, National Institute of Informatics)

S. Mesbah (TU Delft - Web Information Systems)

C. Lofi (TU Delft - Web Information Systems)

Akiko Aizawa (National Institute of Informatics)

A. Bozzon (TU Delft - Web Information Systems)

Research Group
Web Information Systems
Copyright
© 2019 Daniel Vliegenthart, S. Mesbah, C. Lofi, Akiko Aizawa, A. Bozzon
DOI related publication
https://doi.org/10.1007/978-3-030-30760-8_1
Publication Year
2019
Language
English
Pages (from-to)
3-17
ISBN (print)
978-3-030-30759-2
ISBN (electronic)
978-3-030-30760-8
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Named Entity Recognition (NER) for rare long-tail entities, such as those often found in domain-specific scientific publications, is a challenging task because the extensive training and test data needed to fine-tune NER algorithms is typically lacking. Recent approaches have presented promising solutions that train NER algorithms in an iterative, weakly supervised fashion, limiting human interaction to providing a small set of seed terms. Such approaches rely heavily on heuristics to cope with the limited training data size. As these heuristics are prone to failure, the overall achievable performance is limited. In this paper, we therefore introduce a collaborative approach that incrementally incorporates human feedback on the relevance of extracted entities into the training cycle of such iterative NER algorithms. This approach, called Coner, still allows new domain-specific NER extractors for rare long-tail entities to be trained at low cost, but with ever-increasing performance while the algorithm is actively used in an application.

Files

2019TPDL_Coner.pdf
(pdf | 0.455 MB)
License info not available