Amalur

Data Integration Meets Machine Learning

Conference Paper (2023)

Author(s)

Rihan Hai (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Christos Koutras (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Andra Ionescu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Ziyu Li (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Wenbo Sun (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Jessie van Schijndel (Student TU Delft)

Yan Kang (WeBank)

Asterios Katsifodimos (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Research Group

Web Information Systems

DOI related publication

https://doi.org/10.1109/ICDE55515.2023.00301 Final published version

To reference this document use

https://resolver.tudelft.nl/uuid:1bf43af1-cdc8-4d26-8449-1aaf77db8a4f

More Info

Publication Year

2023

Language

English

Pages (from-to)

3729-3739

ISBN (print)

979-8-3503-2228-6

ISBN (electronic)

979-8-3503-2227-9

Event

39th IEEE International Conference on Data Engineering, ICDE 2023 (2023-04-03 - 2023-04-07), Anaheim, United States

Collections

Institutional Repository

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Machine learning (ML) training data is often scattered across disparate collections of datasets, called data silos. This fragmentation poses a major challenge for data-intensive ML applications: integrating and transforming data residing in different sources demands a lot of manual work and computational resources. Because of data privacy and security constraints, data often cannot leave the premises of data silos, so model training should proceed in a decentralized manner. In this work, we present a vision of how to bridge traditional data integration (DI) techniques with the requirements of modern machine learning. We explore the possibilities of using metadata obtained from data integration processes to improve the effectiveness and efficiency of ML models. In this direction, we analyze two common use cases over data silos: feature augmentation and federated learning. Bringing data integration and machine learning together, we highlight new research opportunities in systems, representations, factorized learning, and federated learning.

Files

Amalur_Data_Integration_Meets_... (pdf)

(pdf | 1.57 Mb)

- Embargo expired on 26-01-2024

License info not available