Semantic enrichment and exploration process on domain specific digital libraries
Abstract
With the growing number of scientific publications, navigating effectively and searching for domain-specific information has become highly important for the scientific community [2]; for instance, searching by topic, research method, dataset used, or scientific objective. Such deep meta-data improve our understanding of a given domain (e.g. Data Processing Pipelines) and help us understand and visualize the evolution of research topics and venues over time. Nevertheless, extracting such deep meta-data from text-based documents is notoriously challenging and demanding, owing to the unstructured and ambiguous language used across publications. The work in this paper has already contributed to two publications: one paper published at the ESWC conference and one paper accepted at the TPDL conference. This project extends the analysis of those earlier publications by adding more data from the domain of Data Processing Pipelines and by including one additional domain for analysis, namely Robotics. Moreover, this work provides justifications for all implementation decisions and proposes a refined version of the online domain-aware semantic enrichment framework (SmartPub), which automates the generation of deep meta-data by utilizing key facets from the domains of Data Processing Pipelines and Robotics. The goal is to generate structured meta-data (i.e. named entities or phrases) from full-text scientific publications with respect to a set of domain-aware facets (Objective, Methods, Dataset, Software, and Results), and afterwards to group the resulting facet-terms into facet-topics according to their semantic similarity, so as to allow data exploration and navigation. Finally, the proposed framework is evaluated both quantitatively and qualitatively on seventeen conference series from the domains of Data Processing Pipelines and Robotics.