AccStream

Accuracy-Aware Overload Management for Stream Processing Systems

Conference Paper (2017)
Author(s)

Haiyang Sun (University of Lugano)

Robert Birke (Zurich Lab)

Walter Binder (University of Lugano)

Mathias Bjorkqvist (Zurich Lab)

Lydia Y. Chen (Zurich Lab)

DOI related publication
https://doi.org/10.1109/ICAC.2017.37 Final published version
Publication Year
2017
Language
English
Article number
8005326
Pages (from-to)
39-48
ISBN (electronic)
9781538617618

Abstract

With the rapid growth of social media and the Internet of Things, real-time processing of big data has become a core operation in many business areas. It is of paramount importance that big-data analyses are executed in a timely manner and with specified accuracy guarantees. However, real-world workloads are highly bursty, with skewed content, and often make it hard to meet latency and accuracy requirements simultaneously. In this paper we propose AccStream, which selectively samples and processes data tuples and blocks on emerging batch streaming platforms, with a special focus on aggregation analyses, e.g., counts and top-k. AccStream dynamically learns the latency model of analysis jobs via an online probing technique and employs sampling theory to determine the minimum amount of data needed to fulfill given accuracy targets. A unique feature of AccStream that ensures strong latency-accuracy fulfillment, even when the two targets conflict, is hybrid windowing: it trades off data freshness by combining tumbling and rolling windows. We evaluate a prototype of AccStream on Spark Streaming, analyzing Twitter data. Our extensive results confirm that AccStream achieves its latency and accuracy targets under a wide range of conditions, i.e., slowly and rapidly changing load intensities and varying content skewness, even when facing conflicting latency and accuracy targets. All in all, the effectiveness of AccStream in delivering timely, accurate, and (partially) fresh streaming analytics lies in shedding the right amount of input data at the right time and place.
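As a rough illustration of how a sampling-theory accuracy bound and a probed latency model could be combined to decide how much input to keep per batch, here is a minimal sketch. It assumes a normal-approximation sample-size bound for estimating a count with a relative error at a given confidence, and a linear latency model (latency = a + b * tuples) fitted by least squares to probe measurements. The function names (`min_sample_size`, `fit_latency_model`, `max_tuples`, `plan_batch`) and the linear model form are hypothetical and are not taken from the paper. The sketch also does not model hybrid windowing; it only flags the case where the two targets conflict.

```python
import math
from statistics import NormalDist


def min_sample_size(population: int, p_est: float, rel_err: float, confidence: float) -> int:
    """Smallest uniform sample whose count estimate is within rel_err of the
    true count at the given confidence (normal approximation with finite
    population correction). p_est is an assumed estimate, in (0, 1), of the
    fraction of tuples matching the aggregated key."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    abs_err = rel_err * p_est  # relative error on the count -> absolute error on the proportion
    n0 = z * z * p_est * (1 - p_est) / (abs_err * abs_err)
    n = n0 / (1 + (n0 - 1) / population)  # finite population correction
    return min(population, math.ceil(n))


def fit_latency_model(probes):
    """Least-squares fit of latency = a + b * n from (n, latency) probe pairs.
    Assumes at least two distinct probe sizes."""
    xs = [n for n, _ in probes]
    ys = [t for _, t in probes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b


def max_tuples(a: float, b: float, latency_target: float) -> int:
    """Largest tuple count whose predicted latency meets the target (assumes b > 0)."""
    if latency_target <= a:
        return 0
    return int((latency_target - a) / b)


def plan_batch(population, p_est, rel_err, confidence, probes, latency_target):
    """Return (tuples_to_process, conflict).

    Without a conflict, only the minimum sample needed for the accuracy target
    is processed and the rest is shed. With a conflict, the batch is capped by
    the latency model and the accuracy target cannot be guaranteed."""
    need = min_sample_size(population, p_est, rel_err, confidence)
    a, b = fit_latency_model(probes)
    cap = min(population, max_tuples(a, b, latency_target))
    if need <= cap:
        return need, False
    return cap, True
```

For example, with probes that follow 0.2 s plus 10 µs per tuple, `plan_batch(1_000_000, 0.1, 0.05, 0.95, probes, 1.0)` asks for roughly 13,600 sampled tuples and reports no conflict. Tightening the latency target to 0.25 s caps the batch at about 5,000 tuples and reports a conflict, which is the situation the abstract says AccStream resolves by trading off freshness through hybrid windowing.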