Anyone Can Cloud: Democratizing Cloud Application Programming


Abstract

The cloud is widely adopted as a flexible, on-demand computing infrastructure. In recent years, a promising new cloud paradigm has emerged: serverless computing. Serverless computing promises a pay-as-you-go model and offers features such as autoscaling and high availability. Nevertheless, developing scalable cloud applications remains a painstaking task. Current programming models for the cloud mix operational code with business logic, so developers spend a significant amount of time on tasks other than implementing the intended functionality. Moreover, developers must handle distributed-systems concerns such as consistency, communication, and persistence. Modern dataflow systems, such as Apache Flink and Google Dataflow, address these concerns but suffer from the same problem: they lack an intuitive programming interface for general-purpose applications. Designing a developer-friendly programming interface for implementing scalable cloud applications with strong guarantees remains an open problem.

In this thesis, we solve this problem by presenting an intuitive programming interface for scalable cloud applications in which developers focus primarily on business logic. Following a set of easy-to-follow code conventions, programmers author stateful entities, a programming abstraction embedded in Python. We present a compiler pipeline named StateFlow that analyzes the abstract syntax tree of a Python application and rewrites it into an intermediate representation based on stateful dataflow graphs. In addition, we present a set of building blocks that allow this intermediate representation to execute on a target runtime system or cloud provider without tight integration. Supported runtime systems include Apache Flink and Apache Beam, AWS Lambda, Flink's Statefun, and Cloudburst, each providing a different set of guarantees. Finally, we introduce a client-side programming interface and an HTTP server integration for interacting with the deployed application.
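To make the idea of compiling a stateful entity more concrete, the sketch below is a hypothetical illustration, not StateFlow's actual API or code conventions. It assumes an entity is written as a plain Python class whose constructor declares its state, and it uses Python's standard `ast` module to extract that state, the entity's methods, and any calls on objects other than `self`. Such calls cross entity boundaries, which is the kind of information a dataflow compiler would need in order to turn them into messages between operators. The class `Account`, the function `analyze_entity`, and the shape of the returned dictionary are all assumptions made for this example.

```python
import ast
import textwrap

# Hypothetical stateful entity: plain Python business logic only,
# with no persistence, messaging, or runtime-specific API calls.
SOURCE = textwrap.dedent('''
class Account:
    def __init__(self, username: str):
        self.username: str = username
        self.balance: int = 0

    def deposit(self, amount: int) -> int:
        self.balance += amount
        return self.balance

    def transfer(self, amount: int, other: "Account") -> bool:
        if self.balance < amount:
            return False
        self.balance -= amount
        other.deposit(amount)
        return True
''')


def analyze_entity(source: str) -> dict:
    """Collect state attributes and methods (with cross-entity calls) per class."""
    tree = ast.parse(source)
    entities = {}
    for cls in (n for n in tree.body if isinstance(n, ast.ClassDef)):
        state, methods = [], {}
        for fn in (n for n in cls.body if isinstance(n, ast.FunctionDef)):
            if fn.name == "__init__":
                # Attributes assigned on `self` in the constructor form the entity's state.
                for node in ast.walk(fn):
                    if isinstance(node, ast.AnnAssign):
                        targets = [node.target]
                    elif isinstance(node, ast.Assign):
                        targets = node.targets
                    else:
                        continue
                    for t in targets:
                        if (isinstance(t, ast.Attribute)
                                and isinstance(t.value, ast.Name)
                                and t.value.id == "self"):
                            state.append(t.attr)
            else:
                # Calls on names other than `self` (e.g. other.deposit(...)) cross
                # entity boundaries and would have to become messages in a dataflow graph.
                methods[fn.name] = sorted({
                    f"{call.func.value.id}.{call.func.attr}"
                    for call in ast.walk(fn)
                    if isinstance(call, ast.Call)
                    and isinstance(call.func, ast.Attribute)
                    and isinstance(call.func.value, ast.Name)
                    and call.func.value.id != "self"
                })
        entities[cls.name] = {"state": state, "methods": methods}
    return entities


print(analyze_entity(SOURCE))
# {'Account': {'state': ['username', 'balance'], 'methods': {'deposit': [], 'transfer': ['other.deposit']}}}
```

In this sketch, `transfer` is the only method that reaches into another entity, so it is the one a compiler would have to split around the remote call; `deposit` touches only local state.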

We demonstrate that executing applications with StateFlow typically incurs less than 1% overhead. Furthermore, in a performance benchmark, we identify limitations of current dataflow systems in executing cloud applications at scale. Finally, we compare the expressiveness of StateFlow's programming abstraction with native runtime implementations. We show that StateFlow lets developers write universal code that does not mix business logic with operational logic or with the runtime's API, and that it prevents vendor lock-in by allowing them to switch between runtimes by changing fewer than ten lines of code.