SmartPub

A Platform for Long-Tail Entity Extraction from Scientific Publications

Conference Paper (2018)
Author(s)

S. Mesbah (TU Delft - Web Information Systems)

A. Bozzon (TU Delft - Web Information Systems)

Christoph Lofi (TU Delft - Web Information Systems)

G.J. Houben (TU Delft - Web Information Systems)

Research Group
Web Information Systems
Copyright
© 2018 S. Mesbah, A. Bozzon, C. Lofi, G.J.P.M. Houben
DOI related publication
https://doi.org/10.1145/3184558.3186976
Publication Year
2018
Language
English
Pages (from-to)
191-194
ISBN (electronic)
978-1-4503-5640-4
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This demo presents SmartPub, a novel web-based platform that supports the exploration and visualization of shallow metadata (e.g., author list, keywords) and deep metadata from scientific publications. Deep metadata here means long-tail named entities: entities that are rare and often relevant only in a specific knowledge domain. The platform collects documents from different sources (e.g., DBLP and arXiv) and extracts domain-specific named entities from the text of the publications using Named Entity Recognizers (NERs) that can be trained with minimal human supervision, even for rare entity types. The platform further enables interaction with the crowd for filtering or for training-data generation, and provides extended visualization and exploration capabilities. SmartPub will be demonstrated using a sample collection of scientific publications from the computer science domain and will address two entity types: Dataset (i.e., a dataset presented or used in a publication) and Methods (i.e., algorithms used to create, enrich, or analyse a dataset).
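The abstract does not say how the NERs are trained with minimal supervision. One generic way to bootstrap training data for a rare entity type is to start from a small list of seed entity names and label every occurrence of them in the text. The sketch below illustrates that seed-matching idea in Python under stated assumptions. It is not necessarily the approach SmartPub uses, and the functions `tokenize` and `weak_label`, the `DATASET`/`METHOD` labels, and the seed terms are all hypothetical.

```python
import re
from typing import Dict, List, Tuple


def tokenize(sentence: str) -> List[str]:
    # Hypothetical minimal tokenizer: words (allowing inner '-' or '.') and single
    # punctuation marks. A real pipeline would use a proper NLP tokenizer.
    return re.findall(r"\w+(?:[-.]\w+)*|[^\w\s]", sentence)


def weak_label(sentence: str, seeds: Dict[str, str]) -> List[Tuple[str, str]]:
    """Assign BIO tags wherever a (possibly multi-token) seed term occurs.

    `seeds` maps a seed term to a hypothetical entity label, e.g. "DATASET".
    Matching is case-insensitive and exact at the token level.
    """
    tokens = tokenize(sentence)
    lowered = [t.lower() for t in tokens]
    labels = ["O"] * len(tokens)

    # Try longer seeds first so "ImageNet LSVRC" wins over "ImageNet".
    seed_tokens = sorted(
        ((tokenize(term.lower()), etype) for term, etype in seeds.items()),
        key=lambda item: len(item[0]),
        reverse=True,
    )
    for term_toks, etype in seed_tokens:
        n = len(term_toks)
        for i in range(len(tokens) - n + 1):
            # Only label spans that no longer seed has already claimed.
            if lowered[i:i + n] == term_toks and all(l == "O" for l in labels[i:i + n]):
                labels[i] = f"B-{etype}"
                for j in range(i + 1, i + n):
                    labels[j] = f"I-{etype}"
    return list(zip(tokens, labels))


if __name__ == "__main__":
    seeds = {"ImageNet": "DATASET", "SVM": "METHOD", "random forest": "METHOD"}
    for token, label in weak_label("We train a random forest and an SVM on ImageNet.", seeds):
        print(token, label)
```

Exact string matching like this produces noisy labels: ambiguous terms get tagged in the wrong context and spelling variants are missed. A filtering step, for example the crowd-based filtering the abstract mentions, would be one way to clean such data before training a NER on it.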