Holistic Schema Matching at Scale

Master Thesis (2020)

Author(s)

Kyriakos Psarakis (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A. Katsifodimos – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

G.J.P.M. Houben – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

A. van Deursen – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Data Management, Scalability, Schema Matching

To reference this document use

https://resolver.tudelft.nl/uuid:f4ebeda3-6465-49da-813b-f1e6e0820c60

More Info

Publication Year

2020

Language

English

Graduation Date

03-12-2020

Awarding Institution

Delft University of Technology

Programme

Computer Science, Software Technology

Downloads counter

416

Collections

thesis

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Schema matching is a fundamental task in the data integration pipeline and has been studied extensively over the past decades, producing many novel schema matching methods. However, these methods do not follow a standard evaluation process, so it remains unclear which method performs best in terms of matching accuracy and runtime, in which schema matching category, and with which hyperparameters. To resolve this uncertainty, a scalable benchmarking suite was needed to measure the field's progress; this suite for schema matching tasks is the first contribution of this work. While building it, we realized that the literature also lacked a scalable holistic schema matching system, which led to our second contribution. Using the knowledge gained from the benchmark, we developed a system that can incorporate any algorithm and data source and that runs schema matching jobs in parallel across multiple machines. Furthermore, we gave the users of the system a leading role: the benchmark showed that no algorithm is perfect in every situation, and mission-critical applications cannot afford mistakes. Users therefore have to approve the proposed matches, and we focused on making this task scalable, fast, and straightforward.

Files

KyriakosPsarakisMasterThesis.p... (pdf)

(pdf | 6.13 Mb)

License info not available