Generating labeled datasets for schema matching

Master Thesis (2023)
Author(s)

K. Chronas (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A. Katsifodimos – Mentor (TU Delft - Web Information Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2023
Language
English
Graduation Date
22-03-2023
Awarding Institution
Delft University of Technology
Programme
Computer Science | Software Technology
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Schema matching is a fundamental task in data integration and semantic web applications. However, generating labeled data for schema matching is challenging and calls for an approach that is both efficient and effective. This thesis addresses the challenge by investigating schema matching techniques and crowdsourcing solutions. We developed Crowdie, a prototype crowdsourcing platform for schema matching. The platform uses a novel pre-filtering algorithm to reduce the number of possible correspondences, which improves the platform’s efficiency and minimizes the cost of crowdsourcing. Additionally, we designed a simple yet effective task interface to help ensure high-quality labeled data. Our findings demonstrate that crowdsourcing is a viable way to generate labeled data for schema matching. Overall, this work contributes to reducing the search space in schema matching and to developing crowdsourcing solutions for the task.
