An experimental evaluation of auto-scaling techniques for distributed stream processing systems

Master Thesis (2023)
Authors

J.B. Kanis (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Supervisors

George Siachamis (TU Delft - Web Information Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science, Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Job Kanis
More Info
expand_more
Publication Year
2023
Language
English
Copyright
© 2023 Job Kanis
Graduation Date
28-03-2023
Awarding Institution
Delft University of Technology
Programme
Computer Science
Faculty
Electrical Engineering, Mathematics and Computer Science, Electrical Engineering, Mathematics and Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The introduction of cloud hosting has made it possible to elastically provision distributed stream processing systems (SPEs). By dynamically scaling the different operators of the system, resource consumption can be minimised while meeting the system service-level objectives. In the literature, many different auto-scaling techniques are proposed that make scaling decisions based on the current state of the system. However, these techniques are poorly evaluated and are rarely compared with each other. This makes it difficult to determine the state-of-the-art for auto-scaling techniques targeting SPEs, which slows down its development. In this paper, we design and implement a modular framework to evaluate the performance of state-of-the-art auto-scalers targeting SPEs. We implement state-of-the-art auto-scalers Dhalion, DS2, and Varga et al., using Kubernetes horizontal pod auto-scaler as baseline. We perform an end-to-end experimental evaluation of the auto-scalers and investigate their performance when run on different queries and workload patterns. Furthermore, we investigate the convergence time of the auto-scalers and evaluate their scaling accuracy. The results emphasise the difficulty of capturing the complex relationships of different operators and the struggle to balance resource efficiency and the performance of the system. Moreover, it shows the inherent weakness of reactive auto-scalers to react slowly to changing workloads and reveals the importance of considering the current health of the system when issuing scaling actions.

Files

License info not available