Title
Amalur: Data Integration Meets Machine Learning
Author
Hai, R. (TU Delft Web Information Systems)
Koutras, C. (TU Delft Web Information Systems)
Ionescu, A. (TU Delft Web Information Systems)
Li, Z. (TU Delft Web Information Systems)
Sun, W. (TU Delft Web Information Systems)
van Schijndel, Jessie (Student TU Delft)
Kang, Yan (WeBank)
Katsifodimos, A. (TU Delft Web Information Systems)
Date
2023
Abstract
Machine learning (ML) training data is often scattered across disparate collections of datasets, called data silos. This fragmentation poses a major challenge for data-intensive ML applications: integrating and transforming data residing in different sources demand a lot of manual work and computational resources. With data privacy and security constraints, data often cannot leave the premises of data silos, hence model training should proceed in a decentralized manner. In this work, we present a vision of how to bridge the traditional data integration (DI) techniques with the requirements of modern machine learning. We explore the possibilities of utilizing metadata obtained from data integration processes for improving the effectiveness and efficiency of ML models. Towards this direction, we analyze two common use cases over data silos, feature augmentation and federated learning. Bringing data integration and machine learning together, we highlight new research opportunities from the aspects of systems, representations, factorized learning and federated learning.
To reference this document use:
http://resolver.tudelft.nl/uuid:1bf43af1-cdc8-4d26-8449-1aaf77db8a4f
DOI
https://doi.org/10.1109/ICDE55515.2023.00301
Publisher
IEEE, Piscataway
Embargo date
2024-01-26
ISBN
979-8-3503-2228-6
Source
Proceedings of the 2023 IEEE 39th International Conference on Data Engineering, ICDE 2023
Event
39th IEEE International Conference on Data Engineering, ICDE 2023, 2023-04-03 → 2023-04-07, Anaheim, United States
Series
Proceedings - International Conference on Data Engineering, 1084-4627, 2023-April
Bibliographical note
Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care. Otherwise, as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
Part of collection
Institutional Repository
Document type
conference paper
Rights
© 2023 R. Hai, C. Koutras, A. Ionescu, Z. Li, W. Sun, Jessie van Schijndel, Yan Kang, A. Katsifodimos