A workload model for MapReduce

Abstract

MapReduce is a parallel programming model used by Cloud service providers for data mining. To enhance existing MapReduce systems and to develop new ones, we need to evaluate the performance of these systems. To this end, we introduce the Cloud Workloads Archive Toolbox. This toolbox facilitates the analysis of MapReduce workload traces, the generation of realistic synthetic workloads, and the evaluation of MapReduce systems in simulation. We present an overview and analysis of real-world MapReduce workload traces, propose a model for MapReduce workloads, describe the development of the toolbox, and present an experiment in which we use the toolbox to evaluate two MapReduce schedulers.
