Reproducing state-of-the-art schema matching algorithms

Master Thesis (2020)
Author(s)

A.D. Ionescu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

C. Lofi – Mentor (TU Delft - Web Information Systems)

G.J. Houben – Graduation committee member (TU Delft - Web Information Systems)

Asterios Katsifodimos – Graduation committee member (TU Delft - Web Information Systems)

Arie van Deursen – Graduation committee member (TU Delft - Software Technology)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2020 Andra Ionescu
More Info
Publication Year
2020
Language
English
Graduation Date
11-02-2020
Awarding Institution
Delft University of Technology
Programme
Computer Science | Software Technology
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Schema matching has been an active research topic for over 20 years. As a result, many schema matching solutions have been proposed to address a range of problems, such as creating unified knowledge bases or mediated schemas, data translation, data discovery, and data curation. Such a wide variety of schema matching algorithms requires a benchmarking system that can evaluate to what extent a given solution suits a given problem. However, building such a benchmark requires open-source implementations of the algorithms, which are not widely available in the data management community. One way around this problem is to reproduce the algorithms, but the ongoing reproducibility crisis indicates that the majority of existing research cannot be reproduced. These circumstances motivate the goal of this research: a reproducibility study of state-of-the-art schema matching algorithms. The study supports the development of schema matching and highlights the issues that arise when trying to reproduce either the algorithms or their reported results. Moreover, we implement the selected algorithms and benchmark them in an industry case study.

Files

Thesis_Andra_Ionescu.pdf
(PDF, 0.86 MB)
License info not available