- document
Hai, R. (author), Koutras, C. (author), Ionescu, A. (author), Li, Z. (author), Sun, W. (author), van Schijndel, Jessie (author), Kang, Yan (author), Katsifodimos, A. (author)
Machine learning (ML) training data is often scattered across disparate collections of datasets, called data silos. This fragmentation poses a major challenge for data-intensive ML applications: integrating and transforming data residing in different sources demands a lot of manual work and computational resources. With data privacy and...
conference paper 2023
- document
Hai, R. (author), Koutras, C. (author), Quix, Christoph (author), Jarke, Matthias (author)
Data lakes are becoming increasingly prevalent for Big Data management and data analytics. In contrast to traditional 'schema-on-write' approaches such as data warehouses, data lakes are repositories storing raw data in its original formats and providing a common access interface. Despite the strong interest raised from both academia and...
journal article 2023
- document
Hai, R. (author), Koutras, C. (author), Ionescu, A. (author), Katsifodimos, A. (author)
Data science workflows often require extracting, preparing and integrating data from multiple data sources. This is a cumbersome and slow process: most of the time, data scientists prepare data in a data processing system or a data lake, and export it as a table, in order for it to be consumed by a Machine Learning (ML) algorithm. Recent...
abstract 2022