Workload Characterization and Modeling, and the Design and Evaluation of Cache Policies for Big Data Storage Workloads in the Cloud

Master Thesis (2018)
Author(s)

Sacheendra Talluri (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Alex Iosup – Mentor

Jan Rellermeyer – Coach

Fernando A. Kuipers – Coach

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2018 Sacheendra Talluri
Publication Year
2018
Language
English
Graduation Date
07-12-2018
Awarding Institution
Delft University of Technology
Programme
Computer Science | Software Technology
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The proliferation of big data processing platforms has already led to radically different system designs, such as MapReduce and the newer Spark. Understanding the workloads of such systems enables tuning and could foster new designs. However, whereas MapReduce workloads have been characterized extensively, relatively little is publicly known about the characteristics of Spark workloads in representative environments. In this work, we focus on understanding the behavior and cache performance of the storage subsystem used for Spark workloads in the cloud. First, we statistically characterize its usage. Second, we design a generative model to address the scarcity of workload traces. Third, we design a cache policy that puts the insights from our characterization to work. Finally, we evaluate the performance of different cache policies for big data workloads via simulation.

Files

Report.pdf
(pdf | 13.6 MB)
License info not available