Dynamic block sizing for data stream processing systems

Conference Paper (2016)

Author(s)

Robert Birke (Zurich Lab)

Evangelia Kalyvianaki (City University London)

Walter Binder (University of Lugano)

Martin Schmatz (Zurich Lab)

Lydia Y. Chen (Zurich Lab)

Affiliation

External organisation

DOI related publication

https://doi.org/10.1109/IC2EW.2016.9 Final published version

To reference this document use

https://resolver.tudelft.nl/uuid:fc7a3931-5586-4d04-9a77-ab50b1340bcf

Publication Year

2016

Language

English

Article number

7527851

Pages (from-to)

216-222

ISBN (electronic)

9781509019618

Event

2016 IEEE International Conference on Cloud Engineering Workshops, IC2EW 2016 (2016-04-04 - 2016-04-08), Berlin, Germany

Downloads counter

131

Abstract

Real-time processing of big data is becoming a core operation in areas such as social networks and anomaly detection. Because the data is rich in information, multiple queries can be executed over it to extract different kinds of business value. A cluster running a data stream processing system such as Spark Streaming typically executes multiple queries simultaneously. To answer multiple queries over the same data concurrently, the CPU cores of the underlying infrastructure must be allocated effectively to the queries while adhering to the latency constraints of each query. In this paper, we consider the problem of allocating CPU cores in a Spark Streaming infrastructure for two types of queries, primary and optional, which are associated with high- and low-priority analysis, respectively. We develop a controller, iBLOC, that adjusts the block sizes and the parallelism level of streaming jobs on the fly, according to the input data rates and the query priorities. Our evaluation shows that, compared with a static block-sizing scheme, iBLOC achieves significant CPU-core savings from the primary query type, so that multiple queries can run together without violating their latency constraints.
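
As background for why block sizing affects CPU-core usage: in receiver-based Spark Streaming, each receiver cuts its input into blocks every `spark.streaming.blockInterval` (200 ms by default), and the number of tasks per receiver per batch is approximately the batch interval divided by the block interval. The sketch below only illustrates that relationship. It is not the iBLOC controller; the function names and the core-matching heuristic are hypothetical and assumed for illustration, and the 50 ms floor follows the minimum block interval recommended in the Spark documentation.

```python
import math


def blocks_per_batch(batch_interval_ms: int, block_interval_ms: int) -> int:
    """Approximate number of blocks (and hence tasks) one receiver yields per batch."""
    if batch_interval_ms <= 0 or block_interval_ms <= 0:
        raise ValueError("intervals must be positive")
    return math.ceil(batch_interval_ms / block_interval_ms)


def block_interval_for_cores(batch_interval_ms: int,
                             target_cores: int,
                             min_block_ms: int = 50) -> int:
    """Hypothetical heuristic (not from the paper): choose a block interval
    so that one batch produces roughly `target_cores` tasks, never going
    below `min_block_ms` to limit task-launch overhead."""
    if batch_interval_ms <= 0 or target_cores <= 0:
        raise ValueError("batch interval and target cores must be positive")
    return max(min_block_ms, batch_interval_ms // target_cores)


if __name__ == "__main__":
    batch_ms = 2000
    # With the default 200 ms block interval, a 2 s batch yields about 10 tasks.
    print(blocks_per_batch(batch_ms, 200))
    # Restricting the query to about 4 cores suggests larger (500 ms) blocks.
    print(block_interval_for_cores(batch_ms, 4))
```

Under these assumptions, larger blocks mean fewer tasks per batch and therefore fewer cores occupied by one query, at the cost of less parallelism within that query.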