Coner

A Collaborative Approach for Long-Tail Named Entity Recognition in Scientific Publications

Conference Paper (2019)

Author(s)

Daniel Vliegenthart (Student TU Delft, National Institute of Informatics)

S. Mesbah (TU Delft - Web Information Systems)

C. Lofi (TU Delft - Web Information Systems)

Akiko Aizawa (National Institute of Informatics)

A. Bozzon (TU Delft - Web Information Systems)

Research Group

Web Information Systems

DOI related publication

https://doi.org/10.1007/978-3-030-30760-8_1

To reference this document use:

https://resolver.tudelft.nl/uuid:e3d634e4-6ba9-4ab4-b80a-ffaefc527091

Publication Year

2019

Language

English

Pages (from-to)

3-17

ISBN (print)

978-3-030-30759-2

ISBN (electronic)

978-3-030-30760-8

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Named Entity Recognition (NER) for rare long-tail entities, such as those often found in domain-specific scientific publications, is a challenging task because the extensive training and test data needed for fine-tuning NER algorithms are typically lacking. Recent approaches have presented promising solutions that train NER algorithms in an iterative, weakly supervised fashion, limiting human interaction to providing a small set of seed terms. Such approaches rely heavily on heuristics to cope with the limited training data size. As these heuristics are prone to failure, the overall achievable performance is limited. In this paper, we therefore introduce a collaborative approach that incrementally incorporates human feedback on the relevance of extracted entities into the training cycle of such iterative NER algorithms. This approach, called Coner, still allows new domain-specific, rare long-tail NER extractors to be trained at low cost, but with ever-increasing performance while the algorithm is actively used in an application.

Files

2019TPDL_Coner.pdf

(pdf | 0.455 Mb)

License info not available