State Migration in Stream Processing Systems

Abstract

In recent years, interest in serverless computing has grown tremendously. The most common form of serverless computing, Function-as-a-Service (FaaS), runs simple functions in the data centers of large public cloud providers, which are responsible for the operational and deployment aspects. Non-trivial function implementations require state to perform the desired business logic. Current FaaS implementations that keep state in an externalized database cannot achieve the low latency that some services require. Previous work investigated Stateful Function-as-a-Service (SFaaS) using stream processing systems as a runtime. State migration as a result of schema evolution in SFaaS remains an open challenge.

This thesis investigates common schema evolution practices and their applicability to stream processing systems. Based on this investigation, it demonstrates a schema-driven approach to state migration in stream processing systems. The approach shows that a view on both the source and the target state schema can also yield implicit transformations for schema compatibility.
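As a rough illustration of the schema-driven idea, the following minimal sketch derives a record-level migration function by comparing a source schema with a target schema. It is not the implementation from the thesis: the dictionary-based schema representation, the `WIDENINGS` table, and the `derive_migration` function are all hypothetical names chosen for this example, and real state in a stream processing system would be serialized rather than held as Python dictionaries.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical schema representation: field name -> Python type.
Schema = Dict[str, type]
Record = Dict[str, Any]

# Assumed table of "safe" implicit conversions, keyed by (source type, target type).
WIDENINGS: Dict[Tuple[type, type], Callable[[Any], Any]] = {
    (int, float): float,
    (int, str): str,
}


def derive_migration(
    source: Schema, target: Schema, defaults: Dict[str, Any]
) -> Callable[[Record], Record]:
    """Build a function that rewrites one state record from source to target."""
    steps: Dict[str, Callable[[Record], Any]] = {}
    for name, target_type in target.items():
        if name not in source:
            # Added field: it can only be filled from an explicit default.
            if name not in defaults:
                raise ValueError(f"added field {name!r} needs a default value")
            steps[name] = lambda record, v=defaults[name]: v
        elif source[name] is target_type:
            # Unchanged field: copy the value as-is.
            steps[name] = lambda record, n=name: record[n]
        elif (source[name], target_type) in WIDENINGS:
            # Changed type with a known widening: convert implicitly.
            convert = WIDENINGS[(source[name], target_type)]
            steps[name] = lambda record, n=name, c=convert: c(record[n])
        else:
            raise ValueError(f"no implicit conversion for field {name!r}")

    def migrate(record: Record) -> Record:
        # Fields present only in the source schema are dropped here,
        # because no step is generated for them.
        return {name: step(record) for name, step in steps.items()}

    return migrate


old_schema = {"id": int, "price": int, "note": str}
new_schema = {"id": int, "price": float, "currency": str}
migrate = derive_migration(old_schema, new_schema, defaults={"currency": "USD"})
migrated = migrate({"id": 1, "price": 10, "note": "x"})
```

In this sketch, the widening of `price` and the removal of `note` follow from the two schemas alone, while the added `currency` field still needs user input in the form of a default.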

The approach is demonstrated using a modified version of Apache Flink and evaluated on common evolution scenarios and on hypothesized changes to real-world queries from the NEXmark benchmark.