Dynamic block sizing for data stream processing systems
Robert Birke (Zurich Lab)
Evangelia Kalyvianaki (City University London)
Walter Binder (University of Lugano)
Martin Schmatz (Zurich Lab)
Y. Chen (Zurich Lab)
More Info
expand_more
Abstract
Real-time processing of big data is becoming one of the core operations in various areas, such as social networks and anomaly detection. Thanks to the rich information of the data, multiple queries can be executed to analyse the data and discover a variety of business values. It is very typical that a cluster infrastructure running for example a Spark Streaming data stream processing system would execute multiple queries simultaneously. To enable multiple queries being answered from the same data concurrently, it is important to effectively allocate the CPU-cores of the underlying infrastructure to the queries, meanwhile adhering to the latency constraints of the individual queries. In this paper, we consider the problem of allocating CPU-cores in a Spark Streaming infrastructure in the context of two types of queries, namely primary and optional, that are associated with high-and low-priority analysis, respectively. We develop a controller, iBLOC, that adjusts the block sizes of streaming jobs on the fly and the parallelism level of jobs, according to the input data rates and the query priorities. Our evaluation shows that we can achieve significant CPU-core savings from the primary query type such that multiple queries can run together without impairing their latency constraints, in comparison to a static block-sizing scheme.
No files available
Metadata only record. There are no files for this record.