Hai, R. (author), Koutras, C. (author), Ionescu, A. (author), Li, Z. (author), Sun, W. (author), van Schijndel, Jessie (author), Kang, Yan (author), Katsifodimos, A. (author)
Machine learning (ML) training data is often scattered across disparate collections of datasets, called data silos. This fragmentation poses a major challenge for data-intensive ML applications: integrating and transforming data residing in different sources demands a lot of manual work and computational resources. With data privacy and...
conference paper 2023
Hai, R. (author), Koutras, C. (author), Quix, Christoph (author), Jarke, Matthias (author)
Data lakes are becoming increasingly prevalent for Big Data management and data analytics. In contrast to traditional 'schema-on-write' approaches such as data warehouses, data lakes are repositories storing raw data in its original formats and providing a common access interface. Despite the strong interest raised from both academia and...
journal article 2023
Hai, R. (author), Koutras, C. (author), Ionescu, A. (author), Katsifodimos, A. (author)
Data science workflows often require extracting, preparing, and integrating data from multiple data sources. This is a cumbersome and slow process: most of the time, data scientists prepare data in a data processing system or a data lake and export it as a table so that it can be consumed by a Machine Learning (ML) algorithm. Recent...
abstract 2022