Tyrex

Size-Based Resource Allocation in MapReduce Frameworks


Abstract

Many large-scale data analytics infrastructures are employed for a wide variety of jobs, ranging from short interactive queries to large data analysis jobs that may take hours or even days to complete. As a consequence, data-processing frameworks like MapReduce may face workloads consisting of jobs with heavy-tailed processing requirements. With such workloads, short jobs may experience slowdowns that are an order of magnitude larger than those of large jobs, while users may expect slowdowns that are more in proportion to the job sizes. To address this problem of large job slowdown variability in MapReduce frameworks, we design a scheduling system called TYREX that is inspired by the well-known TAGS task assignment policy in distributed-server systems. In particular, TYREX partitions the resources of a MapReduce framework, allowing any job running in any partition to read data stored on any machine, imposes runtime limits in the partitions, and successively executes parts of jobs in a work-conserving way in these partitions until they can run to completion. We develop a statistical model for dynamically setting the runtime limits that achieves near-optimal job slowdown performance, and we empirically evaluate TYREX on a cluster system with workloads consisting of both synthetic and real-world benchmarks. We find that TYREX cuts job slowdown variability in half while preserving the median job slowdown when compared to state-of-the-art MapReduce schedulers such as FIFO and FAIR. Furthermore, TYREX reduces the job slowdown at the 95th percentile by more than 50% when compared to FIFO and by 20-40% when compared to FAIR.
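The TAGS-like policy the abstract describes can be sketched in a few lines: a job runs in partition i for at most that partition's runtime limit; if unfinished, its remaining work moves on to partition i+1, with progress preserved (work-conserving) so earlier partitions' work is not redone, and the last partition runs jobs to completion. The sketch below is illustrative only, with hypothetical names and abstract "time units" of work; it is not the TYREX implementation.

```python
def partition_runtimes(job_size, limits):
    """Illustrative sketch of a TAGS-like, work-conserving policy
    (hypothetical helper; not from TYREX). Returns the time a job of
    `job_size` work units spends in each partition it visits.

    `limits` holds the runtime limit of every partition except the
    last, which has no limit and runs jobs to completion.
    """
    remaining = job_size
    runtimes = []
    for limit in limits:
        run = min(remaining, limit)  # run until done or until the limit
        runtimes.append(run)
        remaining -= run             # work-conserving: progress is kept
        if remaining == 0:
            return runtimes
    runtimes.append(remaining)       # last partition: run to completion
    return runtimes
```

Under this policy, a short job (size 1 with limits [2, 4]) finishes in the first partition, while a large job (size 10) traverses all partitions; short jobs thus never queue behind large ones, which is the mechanism behind the reduced slowdown variability.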