Amalur

The Convergence of Data Integration and Machine Learning

Journal Article (2024)

Author(s)

Ziyu Li (TU Delft - Web Information Systems)

Wenbo Sun (TU Delft - Web Information Systems)

D. Zhan (TU Delft - Web Information Systems)

Yan Kang (WeBank)

Lydia Chen (University of Neuchâtel, TU Delft - Data-Intensive Systems)

A. Bozzon (TU Delft - Human-Centred Artificial Intelligence)

R. Hai (TU Delft - Web Information Systems)

Research Group

Web Information Systems

DOI related publication

https://doi.org/10.1109/TKDE.2024.3357389

Machine learning, Federated learning, Data integration, Data privacy, Training, Metadata, Training data, Soft sensors

To reference this document use:

https://resolver.tudelft.nl/uuid:88dde3a2-cdc8-4384-b845-501045c6def4

Publication Year

2024

Language

English

Issue number

12

Volume number

36

Pages (from-to)

7353-7367

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Machine learning (ML) training data is often scattered across disparate collections of datasets, called data silos. This fragmentation poses a major challenge for data-intensive ML applications: integrating and transforming data that resides in different sources demands substantial manual work and computational resources. Because of data privacy constraints, data often cannot leave the premises of data silos; hence, model training should proceed in a decentralized manner. In this work, we present a vision of bridging traditional data integration (DI) techniques with the requirements of modern machine learning systems. We explore the possibilities of utilizing metadata obtained from data integration processes to improve the effectiveness, efficiency, and privacy of ML models. To this end, we analyze ML training and inference over data silos. Bringing data integration and machine learning together, we highlight new research opportunities from the aspects of systems, representations, factorized learning, and federated learning.

Files

Amalur_The_Convergence_of_Data... (pdf)

(pdf | 2.11 Mb)