Building scalable and consistent cloud applications is notoriously difficult due to the challenges of state management and execution consistency in distributed environments. Functions-as-a-Service (FaaS) platforms offer flexible scalability, but their weak execution guarantees force engineers to mix business logic with infrastructure concerns, scattering error-handling code, retry mechanisms, and consistency checks throughout their applications. At the same time, dataflow systems like Apache Flink offer exactly-once semantics, but their functional APIs often conflict with the imperative, object-oriented style preferred by mainstream developers.
This work aims to address this disconnect, arguing that modern transactional applications, from e-commerce to payment systems to business workflows, naturally form stateful dataflow graphs. By allowing developers to write familiar imperative code that executes on dataflow systems with strong consistency guarantees, we could eliminate the need to handle many infrastructure concerns explicitly.
To this end, we introduce Cascade, a compiler pipeline and intermediate representation that bridges the gap by translating imperative Python code into stateful, parallelizable dataflow graphs. Cascade extends prior work by providing a representation that is both expressive and optimizable, and we demonstrate optimizations including parallel execution via data dependency analysis and dynamic value prefetching. Our results show significant performance gains with these optimizations, all while maintaining the strong execution guarantees of the underlying execution target. Finally, we offer avenues for future research by discussing further optimization possibilities and extensions within our proposed framework.
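As a rough illustration of what parallel execution via data dependency analysis can look like, the following minimal sketch groups straight-line Python statements into stages whose members share no data dependency. It is not Cascade's actual algorithm or intermediate representation: the function `parallel_stages`, the example snippet, and the names inside it are all hypothetical, and the sketch assumes that calls have no side effects, which a stateful system cannot assume in general.

```python
import ast

def parallel_stages(source):
    """Group top-level statements into stages.

    Statements in the same stage have no data dependency on each other,
    so they could in principle run in parallel. Assumption: calls are
    side-effect free; only variable reads and writes are tracked.
    """
    tree = ast.parse(source)
    last_write = {}  # variable name -> stage of its most recent write
    last_read = {}   # variable name -> latest stage in which it was read
    stages = []
    for stmt in tree.body:
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        reads = {n.id for n in names if isinstance(n.ctx, ast.Load)}
        writes = {n.id for n in names if isinstance(n.ctx, ast.Store)}

        stage = 0
        # Read-after-write and write-after-write: run after the last writer.
        for name in reads | writes:
            if name in last_write:
                stage = max(stage, last_write[name] + 1)
        # Write-after-read: run after every earlier reader of the variable.
        for name in writes:
            if name in last_read:
                stage = max(stage, last_read[name] + 1)

        while len(stages) <= stage:
            stages.append([])
        stages[stage].append(ast.unparse(stmt))

        for name in writes:
            last_write[name] = stage
        for name in reads:
            last_read[name] = max(last_read.get(name, -1), stage)
    return stages

# Hypothetical checkout logic: the two lookups are independent of each other.
EXAMPLE = """
price = get_price(item)
stock = get_stock(item)
total = price * qty
ok = stock >= qty
"""

stages = parallel_stages(EXAMPLE)
for i, group in enumerate(stages):
    print(i, group)
```

In this example the two lookups land in the first stage and the two computations that consume them land in the second, which is the kind of structure a dataflow target could exploit. A real compiler would also need to handle control flow, method calls on stateful entities, and aliasing, all of which this sketch ignores.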