An exploratory journey to combine schema matchers for better relevance prediction

Master Thesis (2022)
Author(s)

W.H. Wang (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A. Katsifodimos – Mentor (TU Delft - Web Information Systems)

Geert-Jan Houben – Graduation committee member (TU Delft - Web Information Systems)

Lydia Chen – Graduation committee member

Andra Ionescu – Mentor (TU Delft - Web Information Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2022 Wang Hao Wang
Publication Year
2022
Language
English
Graduation Date
01-12-2022
Awarding Institution
Delft University of Technology
Programme
Computer Science | Software Technology
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Data growth has accelerated exponentially over the past decade, highlighting modern organizations' need for data discovery systems. Several (automated) schema matching approaches have been proposed to find related data, each exploiting different parts of the schema information (e.g. data type, data distribution, column name). However, research has shown that single schema matching techniques fail to match schemas effectively, whilst combinatorial schema matching systems show more promise. Combinatorial schema matching systems introduce new challenges regarding how to select and combine matchers. This research explores different techniques for determining the importance of each matcher in a combinatorial schema matching system: it determines a weight for each matcher and compares the techniques through a comprehensive evaluation.
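
As a rough illustration of what weighting matchers can mean, the sketch below combines per-matcher similarity scores for one column pair into a single score using a weighted average. It is a minimal example under stated assumptions, not the method used in the thesis: the matcher names, the scores, the weights, and the function `combine_scores` are all hypothetical.

```python
from typing import Dict


def combine_scores(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted average of per-matcher similarity scores.

    `scores` maps a (hypothetical) matcher name to its similarity score in [0, 1]
    for one column pair; `weights` maps the same names to non-negative weights.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        raise ValueError("at least one matcher weight must be non-zero")
    weighted_sum = sum(weights[name] * score for name, score in scores.items())
    return weighted_sum / total_weight


# Hypothetical scores from three matchers for a single column pair.
scores = {"column_name": 0.9, "data_type": 1.0, "distribution": 0.4}
# Hypothetical weights expressing how much each matcher is trusted.
weights = {"column_name": 0.5, "data_type": 0.2, "distribution": 0.3}

combined = combine_scores(scores, weights)
print(f"combined relevance score: {combined:.2f}")
```

Normalizing by the total weight keeps the combined score in the same [0, 1] range as the individual scores, so results stay comparable when weights are changed.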

Files

Thesis_Wang_Hao_Wang.pdf
(pdf | 3.82 MB)
License info not available