Improving Response Time and Fairness with Efficient Scheduling and Partitioning in Multi-User Apache Spark Environments

Master Thesis (2025)
Author(s)

D. Kažemaks (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Jérémie Decouchant – Mentor (TU Delft - Data-Intensive Systems)

Burcu Kulahcioglu Ozkan – Mentor (TU Delft - Software Engineering)

Laurens Versluis – Mentor (ASML)

C. Lofi – Graduation committee member (TU Delft - Web Information Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
08-07-2025
Awarding Institution
Delft University of Technology
Programme
Electrical Engineering | Embedded Systems
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Apache Spark is a popular batch processing framework that is integrated into the ASML data analytics platform. However, when multiple users share the same resources, fairness and efficiency problems arise in the scheduler: some users may receive more resources than others, or the scheduler slows down all jobs equally. Apache Spark provides a built-in fair scheduler that attempts to mitigate these issues, but it cannot account for changing user environments and has been shown to slow down overall job execution. Previous literature has proposed improvements to the Spark fair scheduler; however, these do not account for user contexts to ensure fair resource distribution among users, nor for the task skews that are present in the Spark ecosystem.


In this work, we present the User Weighted Fair Queuing (UWFQ) scheduler, which minimizes the response time of jobs in Apache Spark while ensuring that users are allocated a fair share of resources, which they in turn distribute equally among their jobs. This is done by simulating a virtual fair scheduler and assigning the highest priority to the jobs that would complete earliest in that virtual scheduler. The algorithm extends Cluster Fair Queuing, designed by C. Chen et al., with an added fairness layer that entitles each user to a fair share of resources. To address the task skew present in Spark, we introduce runtime partitioning, which avoids skew more effectively by splitting skewed tasks into more partitions. We implement our scheduler in the Apache Spark framework and show that UWFQ with runtime partitioning can decrease the average response time of small jobs by up to 70% in certain workloads, while still providing fair resource allocation in critical scenarios.
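The core idea of prioritizing jobs by their completion order in a simulated fair scheduler can be illustrated with a minimal sketch in the style of classic weighted fair queuing. This is not the thesis implementation: the `Job` class, the `size` estimate, and the per-user weights are all hypothetical stand-ins. Each user's jobs advance a per-user virtual clock (so a user's own jobs share that user's allocation), and jobs are then ranked by earliest virtual finish time.

```python
from dataclasses import dataclass

@dataclass
class Job:
    user: str
    size: float  # hypothetical estimate of remaining work

def rank_jobs(jobs, user_weights):
    """Rank jobs by their finish time under an idealized fair scheduler.

    Each user owns a virtual clock; a job's virtual finish time is the
    user's current clock plus the job's size scaled by the user's weight.
    Jobs that would finish earliest in this virtual schedule run first.
    """
    user_clock = {}  # per-user virtual time
    ranked = []
    for job in jobs:
        start = user_clock.get(job.user, 0.0)
        finish = start + job.size / user_weights.get(job.user, 1.0)
        user_clock[job.user] = finish  # the user's next job queues behind
        ranked.append((finish, job))
    ranked.sort(key=lambda pair: pair[0])
    return [job for _, job in ranked]
```

For example, with weights `{"a": 2.0, "b": 1.0}`, a job of size 4 from user `a` (virtual finish 2.0) is ranked ahead of a job of size 3 from user `b` (virtual finish 3.0), even though it is larger, because `a` holds twice the share.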

Files

Report.pdf
(pdf | 82.1 Mb)
- Embargo expired in 01-08-2025
License info not available