Semantic Annotation of Data Processing Pipelines in Scientific Publications

Conference Paper (2017)

Author(s)

Sepideh Mesbah (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Kyriakos Fragkeskos (External organisation)

Christoph Lofi (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Alessandro Bozzon (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Geert-Jan Houben (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Research Group

Web Information Systems

DOI related publication

https://doi.org/10.1007/978-3-319-58068-5_20 (final published version)

To reference this document use

https://resolver.tudelft.nl/uuid:0f278790-e7f4-469c-911a-541f63ff4e01

Publication Year

2017

Language

English

Pages (from-to)

321-336

Publisher

Springer

ISBN (print)

978-3-319-58067-8

ISBN (electronic)

978-3-319-58068-5

Event

Extended Semantic Web Conference (2017-05-28 - 2017-06-01), Portorož, Slovenia

Abstract

Data processing pipelines are a core object of interest for data scientists and practitioners operating in a variety of data-related application domains. To effectively capitalise on the experience gained in the creation and adoption of such pipelines, the need arises for mechanisms able to capture knowledge about datasets of interest, data processing methods designed to achieve a given goal, and the performance achieved when applying such methods to the considered datasets. However, due to its distributed and often unstructured nature, this knowledge is not easily accessible. In this paper, we use (scientific) publications as a source of knowledge about data processing pipelines. We describe a method designed to classify sentences according to the nature of the information they contain (i.e. scientific objective, dataset, method, software, result), and to extract relevant named entities. The extracted information is then semantically annotated and published as linked data in open knowledge repositories according to the DMS ontology for data processing metadata. To demonstrate the effectiveness and performance of our approach, we present the results of a quantitative and qualitative analysis performed on four different conference series.
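
As a rough illustration of what classifying sentences into the five categories named in the abstract might look like, here is a minimal keyword-matching sketch in Python. It is not the method described in the paper: the cue phrases, the `CUES` table and the `classify_sentence` function are all hypothetical and chosen only for the example.

```python
# Hypothetical cue phrases per category. These are illustrative
# assumptions, not the features or rules used by the authors.
CUES = {
    "objective": ("we aim", "our goal", "in this paper"),
    "dataset": ("dataset", "corpus", "collection"),
    "method": ("algorithm", "classifier", "approach"),
    "software": ("implemented in", "library", "toolkit"),
    "result": ("accuracy", "precision", "outperforms"),
}


def classify_sentence(sentence: str) -> list[str]:
    """Return every category whose cue phrases occur in the sentence."""
    text = sentence.lower()
    return [
        label
        for label, cues in CUES.items()
        if any(cue in text for cue in cues)
    ]
```

A sentence may receive several labels or none at all. For example, `classify_sentence("Our classifier reaches 90% accuracy.")` matches both the method and the result cues.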