Benchmarking checkpointing algorithms in Stream Processing Engines

Master Thesis (2023)
Author(s)

G. Wiemers (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A Katsifodimos – Mentor (TU Delft - Web Information Systems)

George Siachamis – Mentor (TU Delft - Web Information Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Gianni Wiemers
More Info
expand_more
Publication Year
2023
Language
English
Copyright
© 2023 Gianni Wiemers
Graduation Date
24-08-2023
Awarding Institution
Delft University of Technology
Programme
['Computer Science | Web Information Systems']
Faculty
Electrical Engineering, Mathematics and Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The use of data streams has increased a lot over the last two decades or so. and
With this increase comes the need for fast and consistent fault recovery. Rollback
recovery mechanisms from traditional distributed systems have been adapted successfully for stream engines. These mechanisms can be categorized into one of three different categories; uncoordinated, coordinated and communication induced protocols. While most well-known stream engines implement a variant of the coordinated Chandy-Lamport algorithm, there is no practical comparison available that actually confirms whether this is the optimal solution for data streams specifically. Compared to traditional distributed processing solutions, stream processing has a higher need for low latencies due to the continuously generated input. This paper aims to create more insight into the advantages and disadvantages of these solutions by implementing a checkpointing algorithm for each of these categories. These are then benchmarked using various workloads and evaluated using a number of metrics such as latency, throughput, recovery times and network overhead. From these results, it can be concluded that a coordinated approach indeed outperforms uncoordinated solutions across all of these metrics, most likely due to the need for message logging in both the uncoordinated and communication induced scenarios. Additionally the benchmarks indicate that the overhead of the communication induced approach does not outweigh its benefits, due to the rarity of the occurrence of the so-called domino effect.

Files

License info not available