Scotty: General and Efficient Open-source Window Aggregation for Stream Processing Systems

Journal Article (2021)
Author(s)

Jonas Traub (Technical University of Berlin)

Philipp M. Grulich (Technical University of Berlin)

Alejandro Rodríguez Cuéllar (Galapago Agroconsultores S.A.S)

Sebastian Breß (Technical University of Berlin)

Asterios Katsifodimos (TU Delft - Web Information Systems)

Tilmann Rabl (University of Potsdam)

Volker Markl (Technical University of Berlin)

Research Group
Web Information Systems
DOI related publication
https://doi.org/10.1145/3433675
Publication Year
2021
Language
English
Issue number
1
Volume number
46

Abstract

Window aggregation is a core operation in data stream processing. Existing aggregation techniques focus on reducing latency, eliminating redundant computations, or minimizing memory usage. However, each technique operates under different assumptions with respect to workload characteristics, such as properties of aggregation functions (e.g., invertible, associative), window types (e.g., sliding, sessions), windowing measures (e.g., time- or count-based), and stream (dis)order. In this article, we present Scotty, an efficient and general open-source operator for sliding-window aggregation in stream processing systems, such as Apache Flink, Apache Beam, Apache Samza, Apache Kafka, Apache Spark, and Apache Storm. One can easily extend Scotty with user-defined aggregation functions and window types. Scotty implements the concept of general stream slicing and derives workload characteristics from aggregation queries to improve performance without sacrificing its general applicability. We provide an in-depth view of the algorithms of the general stream slicing approach. Our experiments show that Scotty outperforms alternative solutions.
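
To make the idea of stream slicing concrete, the following is a minimal Python sketch and not Scotty's actual API or algorithm. It assumes an in-order stream, time-based sliding windows whose size is a multiple of the slide, non-negative integer timestamps, and an associative aggregation function. The function name `sliced_sliding_aggregate` and its parameters are hypothetical. Each event is folded into exactly one non-overlapping slice, and overlapping windows are then assembled from the shared slice aggregates instead of re-reading the events.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def sliced_sliding_aggregate(
    events: Iterable[Tuple[int, float]],
    window_size: int,
    slide: int,
    combine: Callable[[float, float], float] = lambda a, b: a + b,
) -> List[Tuple[int, int, float]]:
    """Aggregate (timestamp, value) events over sliding windows via slices.

    Illustrative only: assumes window_size is a multiple of slide, timestamps
    are non-negative integers, and `combine` is associative.
    Returns (window_start, window_end, aggregate) for windows starting at >= 0.
    """
    if window_size % slide != 0:
        raise ValueError("this sketch assumes window_size is a multiple of slide")

    # Phase 1: fold every event into the single slice that contains it.
    slices: Dict[int, float] = {}
    for ts, value in events:
        idx = ts // slide
        slices[idx] = combine(slices[idx], value) if idx in slices else value

    if not slices:
        return []

    # Phase 2: build each window from the partial aggregates of its slices.
    slices_per_window = window_size // slide
    first, last = min(slices), max(slices)
    results: List[Tuple[int, int, float]] = []
    for start_idx in range(max(0, first - slices_per_window + 1), last + 1):
        partials = [
            slices[i]
            for i in range(start_idx, start_idx + slices_per_window)
            if i in slices
        ]
        if not partials:
            continue  # no events fell into this window
        agg = partials[0]
        for partial in partials[1:]:
            agg = combine(agg, partial)
        start = start_idx * slide
        results.append((start, start + window_size, agg))
    return results


if __name__ == "__main__":
    stream = [(0, 1.0), (1, 2.0), (2, 3.0), (5, 4.0)]
    for window in sliced_sliding_aggregate(stream, window_size=4, slide=2):
        print(window)
```

With a window of 4 time units and a slide of 2, each event is aggregated once into a slice of length 2, and every window combines two slice aggregates. Handling out-of-order events, session windows, count-based measures, or non-associative functions requires the additional machinery the article describes and is outside this sketch.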
