Workload Characterization and Modeling, and the Design and Evaluation of Cache Policies for Big Data Storage Workloads in the Cloud
Sacheendra Talluri (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Abstract
The proliferation of big-data processing platforms has already led to radically different system designs, such as MapReduce and the newer Spark. Understanding the workloads of such systems enables tuning and could foster new designs. However, whereas MapReduce workloads have been characterized extensively, relatively little public knowledge exists about the characteristics of Spark workloads in representative environments. In this work, we focus on understanding the behavior and cache performance of the storage sub-system used for Spark workloads in the cloud. First, we statistically characterize the usage of the storage sub-system. Second, we design a generative model to address the scarcity of workload traces. Third, we design a cache policy that puts the insights from our characterization to work. Finally, we evaluate the performance of different cache policies for big data workloads via simulation.
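To illustrate the kind of trace-driven evaluation the abstract refers to, the following is a minimal sketch, not the simulator or the cache policy developed in the thesis. It replays a hypothetical, skewed block-access trace against two textbook policies (LRU and LFU) and reports hit ratios; the trace parameters and block counts are illustrative assumptions only.

```python
import random
from collections import OrderedDict, defaultdict

def simulate_lru(trace, capacity):
    """Replay a block-access trace against an LRU cache; return the hit ratio."""
    cache = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)          # refresh recency on a hit
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)     # evict the least recently used block
            cache[block] = True
    return hits / len(trace)

def simulate_lfu(trace, capacity):
    """Replay the same trace against a simple LFU cache; return the hit ratio."""
    cache = set()
    freq = defaultdict(int)
    hits = 0
    for block in trace:
        freq[block] += 1
        if block in cache:
            hits += 1
        else:
            if len(cache) >= capacity:
                victim = min(cache, key=lambda b: freq[b])  # evict the least frequently used block
                cache.remove(victim)
            cache.add(block)
    return hits / len(trace)

if __name__ == "__main__":
    random.seed(42)
    # Hypothetical skewed access stream: a small set of hot blocks dominates,
    # a pattern often reported for big data storage workloads.
    hot = list(range(100))
    cold = list(range(100, 10_000))
    trace = [random.choice(hot) if random.random() < 0.8 else random.choice(cold)
             for _ in range(50_000)]
    for cap in (50, 200, 1000):
        print(f"capacity={cap:5d}  "
              f"LRU hit ratio={simulate_lru(trace, cap):.3f}  "
              f"LFU hit ratio={simulate_lfu(trace, cap):.3f}")
```

The same structure extends to real traces or to traces drawn from a generative workload model: only the source of `trace` and the set of policy functions need to change.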