Amalur

Next-generation Data Integration in Data Lakes

Abstract (2022)

Author(s)

Rihan Hai (TU Delft - Web Information Systems)

Christos Koutras (TU Delft - Web Information Systems)

Andra Ionescu (TU Delft - Web Information Systems)

Asterios Katsifodimos (TU Delft - Web Information Systems)

Research Group

Web Information Systems

To reference this document use:

https://resolver.tudelft.nl/uuid:db61d32d-4167-4ed2-9527-321f75d18281

Publication Year

2022

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Data science workflows often require extracting, preparing, and integrating data from multiple data sources. This is a cumbersome and slow process: most of the time, data scientists prepare data in a data processing system or a data lake and export it as a table so that it can be consumed by a Machine Learning (ML) algorithm. Recent advances in factorized ML allow us to push down certain linear algebra (LA) operators, executing them closer to the data sources. In this work, we revisit classic data integration (DI) systems and examine how they fit into modern data lakes that are meant to support LA as a first-class citizen.
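
To make the idea of pushing an LA operator down to the sources concrete, here is a minimal sketch. It is not taken from the paper and does not represent Amalur's implementation. It assumes two hypothetical source tables, S and R, linked by a foreign-key column fk, and compares a matrix-vector product over the materialized join with the same product computed per source table. All names and values are illustrative.

```python
def matvec(rows, w):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(x * wi for x, wi in zip(row, w)) for row in rows]


def materialized_matvec(S, fk, R, w):
    """Baseline: build the joined table T = [S | R[fk]] first, then compute T @ w."""
    T = [s_row + R[k] for s_row, k in zip(S, fk)]
    return matvec(T, w)


def factorized_matvec(S, fk, R, w):
    """Push the product down to each source table; the join T is never built."""
    d_s = len(S[0])
    w_s, w_r = w[:d_s], w[d_s:]
    s_part = matvec(S, w_s)  # one partial result per row of S
    r_part = matvec(R, w_r)  # one partial result per row of R, computed once
    # Each S row reuses the partial result of the R row it references.
    return [sp + r_part[k] for sp, k in zip(s_part, fk)]


# Toy data (hypothetical): 4 rows in S, each referencing one of 2 rows in R.
S = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
fk = [0, 1, 0, 1]
R = [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]]
w = [1.0, 0.5, 2.0, 1.0, 0.5]

# Both strategies agree on this toy input.
assert factorized_matvec(S, fk, R, w) == materialized_matvec(S, fk, R, w)
```

In this sketch the factorized version multiplies each row of R once, no matter how many rows of S reference it, which is where the savings would come from when R is small relative to the join result.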

Files

A85_hai.pdf

(pdf | 0.404 Mb)