S-Query: Opening the Black Box of Internal Stream Processor State

More Info
expand_more

Abstract

At the moment we are witnessing the maturation of distributed streaming dataflow systems whose use-cases have departed from the mere analysis of streaming windows and complex-event processing, as they now extend to cloud applications, workflows and even e-commerce. The state of streaming operators has been so far hidden from external applications. In this thesis it is argued that exposing this internal state to outside applications by making it queryable, opens the road for novel use-cases. To this end, we introduce S-Query: a system and reference architecture where the state of stream operators can be queried - either live or through snapshots, achieving different isolation levels. It is shown how this can be implemented in an existing open-source stream processor and how queryable state can affect the performance of such a system. Our experimental evaluation suggests that snapshot configuration adds only up to 5ms in latency in the 99.99th percentile and no increase in 0-90th percentiles.