Is Big Data Performance Reproducible in Modern Cloud Networks?

Conference Paper (2020)
Author(s)

Alexandru Uta (Vrije Universiteit Amsterdam)

Alexandru Custura (Vrije Universiteit Amsterdam)

Dmitry Duplyakin (University of Utah)

Ivo Jimenez (UC Santa Cruz)

Jan S. Rellermeyer (TU Delft - Data-Intensive Systems)

Carlos Maltzahn (UC Santa Cruz)

Robert Ricci (University of Utah)

Alexandru Iosup (Vrije Universiteit Amsterdam)

Research Group
Data-Intensive Systems
Copyright
© 2020 Alexandru Uta, Alexandru Custura, Dmitry Duplyakin, Ivo Jimenez, Jan S. Rellermeyer, Carlos Maltzahn, Robert Ricci, Alexandru Iosup
Publication Year
2020
Language
English
Pages (from-to)
513-527
ISBN (electronic)
978-1-939133-13-7
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward, or distribute the text or any part of it without the consent of the author(s) and/or copyright holder(s), unless the work is covered by an open content license such as Creative Commons.

Abstract

Performance variability has been acknowledged as a problem for over a decade by cloud practitioners and performance engineers. Yet, our survey of top systems conferences reveals that the research community regularly disregards variability when running experiments in the cloud. Focusing on networks, we assess the impact of variability on cloud-based big-data workloads by gathering traces from mainstream commercial clouds and private research clouds. Our dataset consists of millions of datapoints gathered while transferring over 9 petabytes on cloud providers' networks. We characterize the network variability present in our data and show that, even though commercial cloud providers implement mechanisms for quality-of-service enforcement, variability still occurs, and is even exacerbated by such mechanisms and service provider policies. We show how big-data workloads suffer from significant slowdowns and lack predictability and replicability, even when state-of-the-art experimentation techniques are used. We provide guidelines to reduce the volatility of big-data performance, making experiments more repeatable.
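To illustrate the kind of characterization the abstract describes, the minimal Python sketch below computes the coefficient of variation (CV) of throughput samples, a standard dimensionless measure of variability. This is not the paper's actual analysis pipeline; the sample values and variable names are hypothetical, chosen only to show how a stable link and a highly variable link would be distinguished.

```python
import statistics

def coefficient_of_variation(samples):
    """Relative dispersion: sample stdev / mean. Being dimensionless,
    it allows comparing variability across links whose nominal
    bandwidths differ."""
    return statistics.stdev(samples) / statistics.mean(samples)

# Hypothetical per-transfer throughput samples (Gbit/s) from two links.
stable_link = [9.4, 9.5, 9.3, 9.5, 9.4, 9.6]
variable_link = [9.5, 4.1, 9.6, 2.8, 9.4, 5.0]

for name, trace in [("stable", stable_link), ("variable", variable_link)]:
    print(f"{name}: CV = {coefficient_of_variation(trace):.2f}")
```

A low CV (close to 0) indicates consistent throughput; a high CV signals the kind of variability that, per the abstract, undermines the predictability and replicability of big-data experiments.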

Files

Nsdi20_paper_uta.pdf
(pdf | 1.16 MB)
License info not available