Achieving Performance Balance among Spark Frameworks with Two-Level Schedulers

Conference Paper (2018)

Author(s)

Aleksandra Kuzmanovska (Eindhoven University of Technology)

Hans van den Bogert (Student TU Delft)

Rudolf Mak (Eindhoven University of Technology)

D.H.J. Epema (TU Delft - Data-Intensive Systems)

Research Group

Data-Intensive Systems

DOI related publication

https://doi.org/10.1109/CCGRID.2018.00028

Keywords

Spark, Data processing framework, DRF, Job slowdown, Koala-F, Mesos, Performance balance, Resource allocation policy, Two-level schedulers

To reference this document use:

https://resolver.tudelft.nl/uuid:12c4c0d9-9eb1-4b82-9fd1-152d1bb9f43b

Publication Year

2018

Language

English

Pages (from-to)

133-142

ISBN (electronic)

9781538658154

Abstract

When multiple data-processing frameworks with time-varying workloads are simultaneously present in a single cluster or data-center, an apparent goal is to have them experience equal performance, expressed in whatever performance metrics are applicable. In modern data-center environments, Two-Level Schedulers (TLSs) that leave the scheduling of individual jobs to the schedulers within the data-processing frameworks are typically used for managing the resources of data-processing frameworks. Two such TLSs with opposite designs are Mesos and Koala-F. Mesos employs fine-grained resource allocation and aims at Dominant Resource Fairness (DRF) among framework instances by offering resources to them for the duration of a single task. In contrast, Koala-F aims at performance fairness among framework instances by employing dynamic coarse-grained resource allocation of sets of complete nodes based on performance feedback from individual instances. The goal of this paper is to explore the trade-offs between these two TLS designs when trying to achieve performance balance among frameworks. We select Apache Spark as a representative of data-processing frameworks, and perform experiments on a modest-sized cluster, using jobs chosen from commonly used data-processing benchmarks. Our results reveal that achieving performance balance among framework instances is a challenge for both TLS designs, despite their opposite design choices. Moreover, we exhibit design flaws in the DRF allocation policy that prevent Mesos from achieving performance balance. Finally, to remedy these flaws, we propose a feedback controller for Mesos that dynamically adapts framework weights, as used in Weighted DRF (W-DRF), based on their performance.
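To make the allocation idea in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation and not the Mesos allocator. It shows how a weighted dominant share can be computed and how a simple proportional feedback rule might adjust framework weights from observed job slowdown. All names (`dominant_share`, `next_framework`, `update_weights`), the `gain` and `min_weight` parameters, and the choice of a proportional update are assumptions made for illustration only; the record does not describe the controller's actual form.

```python
from typing import Dict

Resources = Dict[str, float]  # e.g. {"cpu": 4.0, "mem": 16.0}


def dominant_share(allocation: Resources, capacity: Resources) -> float:
    """Largest fraction of any single cluster resource held by a framework."""
    return max(allocation.get(r, 0.0) / capacity[r] for r in capacity)


def next_framework(
    allocations: Dict[str, Resources],
    capacity: Resources,
    weights: Dict[str, float],
) -> str:
    """Weighted DRF choice: offer resources to the framework whose
    dominant share divided by its weight is currently the smallest."""
    return min(
        allocations,
        key=lambda f: dominant_share(allocations[f], capacity) / weights[f],
    )


def update_weights(
    weights: Dict[str, float],
    slowdowns: Dict[str, float],
    gain: float = 0.5,        # hypothetical controller gain
    min_weight: float = 0.1,  # hypothetical lower bound to keep weights positive
) -> Dict[str, float]:
    """Assumed proportional feedback rule: frameworks whose slowdown is
    above the mean get a larger weight, those below get a smaller one."""
    mean = sum(slowdowns.values()) / len(slowdowns)
    return {
        f: max(min_weight, weights[f] * (1.0 + gain * (slowdowns[f] - mean) / mean))
        for f in weights
    }
```

With equal weights, a framework holding a 0.2 dominant share would be served before one holding 0.4. If the second framework reports a higher slowdown, repeated calls to `update_weights` raise its weight until the weighted comparison favors it instead.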
