G.C. Christodoulou

3 records found

Stateful Functions-as-a-Service (SFaaS) platforms, such as Styx, are emerging as powerful abstractions for building distributed, serverless cloud applications. By combining the capabilities of FaaS with strong transactional guarantees, they enable complex, stateful workflows without requiring developers to manage infrastructure. However, they lack built-in support for analytical queries across distributed function state. This thesis addresses that gap by proposing H-Styx, whose hybrid architecture extends Styx with a snapshot-based Query Engine, enabling near-real-time OLAP queries over global state while maintaining performance isolation for transactions. The Query Engine integrates into the Styx architecture by consuming periodic snapshots transmitted via a loosely coupled, asynchronous interface. It ingests partitioned state from the MinIO object store into the DuckDB columnar database, supports incremental delta loads, and delivers results over a Kafka-based interface, achieving scalable, low-latency analytical querying with robust fault tolerance.
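
The idea of an incremental delta load can be illustrated with a minimal sketch. It is not the H-Styx implementation: plain Python dictionaries stand in for the analytical table, and the delta format (a key mapped to a new value, or to None for a deletion) and the names apply_delta and load_snapshots are assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional

# Hypothetical delta format: key -> new value, or None when the key was deleted.
Delta = Dict[str, Optional[Any]]


def apply_delta(table: Dict[str, Any], delta: Delta) -> Dict[str, Any]:
    """Return a copy of the table with the delta's upserts and deletes applied."""
    merged = dict(table)
    for key, value in delta.items():
        if value is None:
            merged.pop(key, None)  # deletion marker
        else:
            merged[key] = value    # insert or overwrite
    return merged


def load_snapshots(full: Dict[str, Any], deltas: List[Delta]) -> Dict[str, Any]:
    """Start from one full snapshot, then apply deltas in arrival order."""
    table = dict(full)
    for delta in deltas:
        table = apply_delta(table, delta)
    return table
```

Only the changed keys travel with each delta, so the cost of a refresh tracks the amount of change rather than the size of the full state.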

Empirical evaluation demonstrates that H-Styx preserves transactional throughput and latency under hybrid workloads, while significantly outperforming a baseline HTAP architecture (Postgres with Streaming Replication) on analytical throughput and providing superior workload isolation. These results validate the feasibility of supporting hybrid transactional and analytical processing in SFaaS environments. Overall, H-Styx bridges a crucial capability gap in SFaaS, enabling more powerful data-driven applications in distributed, event-driven architectures. ...
Building scalable and consistent cloud applications is notoriously difficult due to the challenges of state management and execution consistency in distributed environments. Functions-as-a-Service (FaaS) platforms offer flexible scalability, but their weak execution guarantees force engineers to mix business logic with infrastructure concerns, adding error-handling code, retry mechanisms, and consistency checks throughout their applications. At the same time, dataflow systems like Apache Flink offer exactly-once semantics, but their functional APIs often conflict with the imperative, object-oriented style preferred by mainstream developers.

This work aims to address this disconnect, arguing that modern transactional applications, from e-commerce to payment systems to business workflows, naturally form stateful dataflow graphs. By allowing developers to write familiar imperative code that executes on dataflow systems with strong consistency guarantees, we could eliminate the need to handle many infrastructure concerns explicitly.

To this end, we introduce Cascade, a compiler pipeline and intermediate representation that bridges the gap by translating imperative Python code into stateful, parallelizable dataflow graphs. Cascade extends prior work by providing a representation that is both expressive and optimizable, and we demonstrate optimizations including parallel execution via data dependency analysis and dynamic value prefetching. Our results show significant performance gains with these optimizations, all while maintaining the strong execution guarantees of the underlying execution target. Finally, we offer avenues for future research by discussing further optimization possibilities and extensions within our proposed framework. ...
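
Parallel execution via data dependency analysis can be sketched in a few lines. This is a simplified illustration and not Cascade's intermediate representation: the Stmt class, its reads and writes sets, and the function names are hypothetical. Statements that share no read-after-write, write-after-read, or write-after-write dependency land in the same stage and could run concurrently.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Stmt:
    """Hypothetical statement summary: the variables it reads and writes."""
    name: str
    reads: FrozenSet[str] = frozenset()
    writes: FrozenSet[str] = frozenset()


def depends_on(later: Stmt, earlier: Stmt) -> bool:
    """True if 'later' must run after 'earlier' (RAW, WAR, or WAW hazard)."""
    return bool(
        (later.reads & earlier.writes)
        or (later.writes & earlier.reads)
        or (later.writes & earlier.writes)
    )


def parallel_stages(stmts: List[Stmt]) -> List[List[str]]:
    """Group statements, given in program order, into stages of independent work."""
    levels: List[int] = []
    for i, stmt in enumerate(stmts):
        dep_levels = [levels[j] for j in range(i) if depends_on(stmt, stmts[j])]
        # One stage after the latest dependency; stage 0 if there is none.
        levels.append(1 + max(dep_levels, default=-1))
    stages: List[List[str]] = [[] for _ in range(max(levels, default=-1) + 1)]
    for stmt, level in zip(stmts, levels):
        stages[level].append(stmt.name)
    return stages
```

For example, two independent lookups that write user and item form one stage, and a checkout statement that reads both is placed in the next stage.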

Performance Evaluation and Insights

Distributed systems are vital for handling large-scale data and rely on geo-distributed databases to ensure low latency and high availability. Traditional benchmarks, such as TPC-C and YCSB-T, are not designed to handle the complexities of geo-distributed environments and do not allow configuring multi-home transaction ratios or dynamic data access patterns. To fill this gap, we implement a benchmark based on the MovR workload and assess its performance on the Detock, Janus, SLOG, and Calvin geo-distributed database systems. Key insights from the experiments are that network conditions act as a major bottleneck and that high concurrency leads to unsustainable latency spikes, which severely limit scalability. ...
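
A configurable multi-home transaction ratio can be pictured with a small generator. This sketch is not the benchmark described above and does not model the MovR schema: the function name, its parameters, and the representation of a transaction as a tuple of home regions are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple


def generate_transactions(
    n: int,
    regions: Sequence[str],
    multi_home_ratio: float,
    seed: int = 0,
) -> List[Tuple[str, ...]]:
    """Return n transactions, each a tuple of the home regions it touches.

    With probability multi_home_ratio a transaction spans two distinct
    regions (multi-home); otherwise it touches a single region.
    """
    if not 0.0 <= multi_home_ratio <= 1.0:
        raise ValueError("multi_home_ratio must be in [0, 1]")
    if len(regions) < 2:
        raise ValueError("need at least two regions for multi-home transactions")
    rng = random.Random(seed)  # seeded so that runs are repeatable
    txns: List[Tuple[str, ...]] = []
    for _ in range(n):
        if rng.random() < multi_home_ratio:
            txns.append(tuple(rng.sample(list(regions), 2)))
        else:
            txns.append((rng.choice(list(regions)),))
    return txns
```

Sweeping multi_home_ratio from 0 to 1 with a fixed seed gives a repeatable way to vary how much cross-region coordination a workload demands.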