Muses

Distributed data migration system for polystores

Conference Paper (2019)
Author(s)

Abdulrahman Kaitoua (DFKI GmbH)

Tilmann Rabl (DFKI GmbH)

Asterios Katsifodimos (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Volker Markl (DFKI GmbH)

Research Group
Web Information Systems
DOI
https://doi.org/10.1109/ICDE.2019.00152 (final published version)
Publication Year
2019
Language
English
Article number
8731469
Pages (from-to)
1602-1605
ISBN (print)
978-1-5386-7475-8
ISBN (electronic)
978-1-5386-7474-1
Event
35th IEEE International Conference on Data Engineering, ICDE 2019 (2019-04-08 - 2019-04-11), Macau, China

Abstract

Large datasets can originate from various sources and are stored in heterogeneous formats, schemas, and locations. Typical data science tasks need to combine those datasets to increase their value and extract knowledge. This is done in various data processing systems with diverse execution engines. To take advantage of each execution engine's characteristics and APIs, data scientists need to migrate and transform their datasets, at a very high cost in computation and manual labor. Data migration is challenging for two main reasons: i) execution engines expect specific types/shapes of the data as input; ii) there are various physical representations of the data (e.g., partitions). Therefore, migrating data efficiently requires knowledge of system internals and assumptions. In this paper, we present Muses, a distributed, high-performance data migration engine that can efficiently forward, transform, repartition, and broadcast data between instances of distributed engines. Muses does not require any changes to the underlying execution engines. In an experimental evaluation, we show that migrating data from one execution engine to another (in order to take advantage of faster, native operations) can increase a pipeline's performance by 30%.
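To make the four migration operations named above concrete, here is a minimal single-process sketch. It is not Muses' implementation or API, which the abstract does not describe. The function names, the model of a dataset as a list of partitions (one per engine instance), and the use of hash partitioning for `repartition` are all assumptions made for illustration.

```python
from typing import Any, Callable, Hashable, List

# Hypothetical model (not from the paper): a dataset is a list of partitions,
# one per engine instance, and each partition is a list of records.
Partition = List[Any]


def forward(source: List[Partition]) -> List[Partition]:
    """Send each source partition unchanged to one target instance (1:1)."""
    return [list(p) for p in source]


def transform(source: List[Partition], fn: Callable[[Any], Any]) -> List[Partition]:
    """Convert every record, e.g. into the shape the target engine expects."""
    return [[fn(r) for r in p] for p in source]


def repartition(
    source: List[Partition], num_targets: int, key: Callable[[Any], Hashable]
) -> List[Partition]:
    """Assumed hash partitioning: records with equal keys land on the same target."""
    targets: List[Partition] = [[] for _ in range(num_targets)]
    for p in source:
        for r in p:
            # Python's % with a positive modulus is non-negative, so the index is valid.
            targets[hash(key(r)) % num_targets].append(r)
    return targets


def broadcast(source: List[Partition], num_targets: int) -> List[Partition]:
    """Give every target instance a full copy of the whole dataset."""
    everything = [r for p in source for r in p]
    return [list(everything) for _ in range(num_targets)]


if __name__ == "__main__":
    src = [[(1, "a"), (2, "b")], [(3, "c"), (4, "d")]]  # two source instances
    print(repartition(src, 3, key=lambda r: r[0]))
    # [[(3, 'c')], [(1, 'a'), (4, 'd')], [(2, 'b')]]
    print(broadcast(src, 2))
```

Forwarding keeps the source's partitioning as is. Repartitioning changes the partition count and co-locates equal keys, which a target engine running a keyed operation such as a join would typically need. Broadcasting costs more network volume, but in exchange every target instance sees the full dataset. Python's built-in `hash` stands in here for whatever partitioning function a real engine uses. Integer keys hash deterministically, but string hashes vary between interpreter runs.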