K. Psarakis

19 records found

Asymmetric Dependencies for Fast Geo-Distributed Operations

Conference paper (2026) - Jonathan Arns, Harald Ng, K. Psarakis, A Katsifodimos, Paris Carbone
Low-latency geo-distributed applications currently face the barrier of cross-site coordination for ensuring state consistency. Existing mixed-consistency models leverage the existence of strongly- and weakly-consistent operations in a given application, to avoid coordination whenever possible. However, existing approaches are rather pessimistic, coordinating more than is necessary. In this paper, we introduce Semi-Linearizability (SL): a consistency model that executes application operations with linearizability guarantees only when strictly necessary, avoiding over-coordination. Specifically, we propose novel operation semantics that can encode ordering relationships between application operations and map them to coordination primitives. Our proposed semantics can be used to reason over latent, asymmetric dependencies between different operations and optimize their execution. We show how SL enables a new class of safe, uncoordinated operations that previous models would otherwise execute under globally strict order, while offering substantial performance gains without violating application invariants. To demonstrate the advantages of SL, we implemented DeMon, a system that achieves four orders of magnitude lower latency on the most frequent operation in the widely used RUBiS benchmark compared to state-of-the-art systems. ...

Towards Serverless Transactional Functions

Journal article (2026) - Kyriakos Psarakis, George Christodoulou, George Siachamis, Marios Fragkoulis, Asterios Katsifodimos
Developing stateful cloud applications, such as low-latency workflows and microservices with strict consistency requirements, remains arduous for programmers. The Stateful Functions-as-a-Service (SFaaS) paradigm aims to serve these use cases. However, existing approaches provide weak transactional guarantees or perform expensive external state accesses requiring inefficient transactional protocols that increase execution latency. In this paper, we present Styx, a novel dataflow-based SFaaS runtime that executes serializable transactions consisting of stateful functions that form arbitrary call-graphs with exactly-once guarantees. Styx extends a deterministic transactional protocol by contributing: i) a function acknowledgment scheme to determine transaction boundaries required in SFaaS workloads, ii) a function-execution caching mechanism, and iii) an early-commit reply mechanism that substantially reduces transaction execution latency. In addition, Styx’s elasticity supports state migration for load balancing using scale-up and scale-down operations when workloads introduce uneven overhead among workers. Experiments with the YCSB, TPC-C, and Deathstar benchmarks show that Styx outperforms state-of-the-art approaches by achieving at least one order of magnitude higher throughput while exhibiting near-linear scalability and low latency. Moreover, state migration experiments with YCSB and TPC-C show that Styx’s approach to state migration outperforms the baseline, a stop and restart migration approach tailored to Styx, by adapting swiftly to workload changes while maintaining low latency. ...

Transactional Stateful Functions on Streaming Dataflows

Doctoral thesis (2026) - K. Psarakis, G.J.P.M. Houben, A. Katsifodimos
Web applications power almost every aspect of our digitalized society, from entertainment to web shopping, vacation planning and booking, online games, communication, work, and social interaction. However, building scalable and consistent web applications in modern cloud environments requires extensive and diverse expertise in multiple domains, such as cloud computing, software development, distributed and database systems, and domain knowledge. These requirements make the development of such applications possible only by a few highly talented individuals that only large corporations can hire. In this thesis, we aim at democratizing the development and maintenance of such cloud applications by identifying and addressing three key challenges: (i) programmability of cloud applications; (ii) high-performance serializable transactions with fault tolerance guarantees; and (iii) serverless semantics. To address those, we created Stateflow, a high-level, object-oriented, easy-to-use programming model that operates alongside Styx, a novel deterministic dataflow engine that provides high-performance serializable transactions and serverless semantics.

While investigating the challenge of democratizing scalable cloud applications, we discovered that they closely resemble the principles behind the streaming dataflow execution model. In Chapter 1, we highlight the similarities of streaming dataflow processing and the current state-of-the-art event-driven microservice architectures and lay a path towards the ideal cloud application runtime. To validate our hypothesis, we created T-Statefun, presented in Chapter 2, by adapting an existing dataflow system to support transactional cloud applications. At the time, the best candidate appeared to be Apache Flink Statefun, a stateful function as a service system (SFaaS), to which we added transactional support with coordinator functions. With T-Statefun, we showed that a dataflow system can support transactional cloud applications through an SFaaS API. Furthermore, its development helped us identify two significant issues: (i) it was challenging to program, especially after the addition of the coordinator functions; and (ii) due to the disaggregation of state and processing and an inefficient transactional protocol, T-Statefun was lacking in performance.

In this thesis, to address the programmability issue, in Chapter 3 we introduce Stateflow, a user-friendly programming model where software developers code in the well-established object-oriented programming style with zero boilerplate code, and Stateflow transforms it into an intermediate representation based on stateful dataflow graphs. While experimenting with Stateflow, we verified the inefficiencies detected in Chapter 2 regarding messaging and state, or the lack of transactional support in the rest of Stateflow’s supported backends. Thus, in Chapter 4, we present all the details behind the design of Styx, a distributed streaming dataflow system that supports multi-partition deterministic transactions with serializable isolation guarantees through a high-level, standard Python programming model that obviates transaction failure management. Our design choices and novel algorithms allow Styx to outperform the state-of-the-art systems by at least one order of magnitude in all tested workloads regarding throughput.

Styx demonstrates that it is possible to build a high-performance SFaaS system that provides transactional and fault-tolerance guarantees while offering an intuitive programming model with minimal boilerplate. Building on this foundation, we extend Styx with the ability to dynamically and efficiently adapt to varying workloads. To enable this, Chapter 5 explores how Styx can migrate state transactionally, a necessary capability for elasticity, given that Styx maintains application state in memory.

We conclude this thesis by summarizing the key findings and reflecting on the contributions, critically examining the limitations of the proposed methods, and considering their broader ethical and societal implications. Moreover, based on the insights we gained from creating the Stateflow programming model and the Styx runtime, we lay out the new challenges and future directions in the field. ...
Traditional monolithic applications are migrated to the cloud, typically using a microservice-like architecture. Although this migration leads to significant benefits such as scalability and development agility, it also leaves behind the transactional guarantees that database systems have provided to monolithic applications for decades. In the cloud era, developers build transactional and fault-tolerant distributed applications by explicitly programming transaction protocols at the application level.
In this paper, we argue that the principles behind the streaming dataflow execution model and deterministic transactional protocols provide a powerful and suitable substrate for executing transactional cloud applications. To this end, we introduce Styx, a transactional application runtime based on streaming dataflows that enables an object-oriented programming model for scalable, fault-tolerant cloud applications with serializable guarantees. ...
Developing and deploying transactional cloud applications such as banking and e-commerce systems is a daunting task for developers. The reason for this difficulty is twofold. First, developing such applications shifts the developers’ focus from the application logic to considerations of distributed transactions, fault-tolerance, consistency, and scalability. Second, deploying such applications involves multiple systems, such as databases, load balancers, or containerized services, impeding efficient resource management. This demonstration presents Styx, a scalable application runtime that allows developers to build scalable and transactional cloud applications with minimal effort. It supports serializability and exactly-once guarantees and focuses on the ease of development and deployment, as well as Styx’s fault-tolerance mechanisms. ...
Developing stateful cloud applications, such as low-latency workflows and microservices with strict consistency requirements, remains arduous for programmers. The Stateful Functions-as-a-Service (SFaaS) paradigm aims to serve these use cases. However, existing approaches provide weak transactional guarantees or perform expensive external state accesses requiring inefficient transactional protocols that increase execution latency. In this paper, we present Styx, a novel dataflow-based SFaaS runtime that executes serializable transactions consisting of stateful functions that form arbitrary call-graphs with exactly-once guarantees. Styx extends a deterministic transactional protocol by contributing: i) a function acknowledgment scheme to determine transaction boundaries required in SFaaS workloads, ii) a function-execution caching mechanism, and iii) an early commit-reply mechanism that substantially reduces transaction execution latency. Experiments with the YCSB, TPC-C, and Deathstar benchmarks show that Styx outperforms state-of-the-art approaches by achieving at least one order of magnitude higher throughput while exhibiting near-linear scalability and low latency. ...
Other (2025) - R.N. Laigner, G.C. Christodoulou, K. Psarakis, A Katsifodimos, Yongluan Zhou
Transactional cloud applications such as payment, booking, reservation systems, and complex business workflows are currently being rewritten for deployment in the cloud. This migration to the cloud is happening mainly for reasons of cost and scalability. Over the years, application developers have used different migration approaches, such as microservice frameworks, actors, and stateful dataflow systems. The migration to the cloud has brought back data management challenges traditionally handled by database management systems. Those challenges include ensuring state consistency, maintaining durability, and managing the application lifecycle. At the same time, the shift to a distributed computing infrastructure introduced new issues, such as message delivery, task scheduling, containerization, and (auto)scaling. Although the data management community has made progress in developing analytical and transactional database systems, transactional cloud applications have received little attention in database research. This tutorial aims to highlight recent trends in the area and discusses open research challenges for the data management community. ...
While the concept of large-scale stream processing is very popular nowadays, efficient dynamic allocation of resources is still an open issue in the area. The database research community has yet to evaluate different autoscaling techniques for stream processing engines under a robust benchmarking setting and evaluation framework. As a result, no conclusions can be made about the current solutions and problems that remain unsolved. Therefore, we address this issue with a principled evaluation approach.

This paper evaluates the state-of-the-art control-based solutions in the autoscaling area with diverse, dynamic workloads, applying specific metrics. We investigate different aspects of the autoscaling problem, such as performance and convergence. Our experiments reveal that current control-based autoscaling techniques fail to account for the lag cost generated by rescaling or underprovisioning and cannot efficiently handle practical scenarios of intensely dynamic workloads. Unexpectedly, we discovered that an autoscaling method not tailored for streaming can outperform others in certain scenarios. ...
Conference paper (2024) - K. Psarakis, W.D. Zorgdrager, M. Fragkoulis, Guido Salvaneschi, A Katsifodimos
Although the cloud has reached a state of robustness, the burden of using its resources falls on the shoulders of programmers who struggle to keep up with ever-growing cloud infrastructure services and abstractions. As a result, state management, scaling, operation, and failure management of scalable cloud applications require disproportionately more effort than developing the applications' actual business logic. Our vision aims to raise the abstraction level for programming scalable cloud applications by compiling stateful entities – a programming model enabling imperative transactional programs authored in Python – into stateful streaming dataflows. We propose a compiler pipeline that analyzes the abstract syntax tree of stateful entities and transforms them into an intermediate representation based on stateful dataflow graphs. It then compiles that intermediate representation into different dataflow engines, leveraging their exactly-once message processing guarantees to prevent state or failure management primitives from "leaking" into the level of the programming model. Preliminary experiments with a proof of concept implementation show that despite program transformation and translation to dataflows, stateful entities can perform at sub-100ms latency even for transactional workloads. ...
Conference paper (2024) - G. Siachamis, K. Psarakis, M. Fragkoulis, A. van Deursen, Paris Carbone, A Katsifodimos
Stream processing in the last decade has seen broad adoption in both commercial and research settings. One key element for this success is the ability of modern stream processors to handle failures while ensuring exactly-once processing guarantees. At the moment of writing, virtually all stream processors that guarantee exactly-once processing implement a variant of Apache Flink's coordinated checkpoints - an extension of the original Chandy-Lamport checkpoints from 1985. However, the reasons behind this prevalence of the coordinated approach remain anecdotal, as reported by practitioners of the stream processing community. At the same time, common checkpointing approaches, such as the uncoordinated and the communication-induced ones, remain largely unexplored. This paper is the first to address this gap by i) shedding light on why practitioners have favored the coordinated approach and ii) investigating whether there are viable alternatives. To this end, we implement three checkpointing approaches that we surveyed and adapted for the distinct needs of streaming dataflows. Our analysis shows that the coordinated approach outperforms the uncoordinated and communication-induced protocols under uniformly distributed workloads. To our surprise, however, the uncoordinated approach is not only competitive with the coordinated one in uniformly distributed workloads, but it also outperforms the coordinated approach in skewed workloads. We conclude that rather than blindly employing coordinated checkpointing, research should focus on optimizing the very promising uncoordinated approach, as it can address issues with skew and support prevalent cyclic queries. We believe that our findings can trigger further research into checkpointing mechanisms. ...
Conference paper (2023) - Andra Ionescu, Kostas Patroumpas, Kyriakos Psarakis, Georgios Chatzigeorgakidis, Diego Collarana, Kai Barenscher, Dimitrios Skoutas, Asterios Katsifodimos, Spiros Athanasiou
The increasing need for data trading across businesses nowadays has created a demand for data marketplaces. However, despite the intentions of both data providers and consumers, today’s data marketplaces remain mere data catalogs. We believe that marketplaces of the future require a set of value-added services, such as advanced search and discovery, that have been proposed in the database research community for years, but are not yet put into practice. With this paper, we report on the effort to engineer and develop an open-source modular data market platform to enable both entrepreneurs and researchers to set up and experiment with data marketplaces. To this end, we implemented and extended existing methods for data profiling, dataset search & discovery, and data recommendation. These methods are available as open-source libraries. In this paper, we report on how those tools were assembled to build topio.market, a real-world web platform for trading geospatial data that is currently in a beta phase. ...
Abstract (2023) - K. Psarakis, W.D. Zorgdrager, M. Fragkoulis, Guido Salvaneschi, A Katsifodimos
While there are multiple approaches for distributed application programming (e.g., Bloom [2], Hilda [14], Cloudburst [12], AWS Lambda, Azure Durable Functions, and Orleans [3, 4]), in practice developers mainly use libraries of popular general purpose languages such as Spring Boot in Java, and Flask in Python. None of these approaches offers message processing guarantees, failing to support exactly-once processing: the ability of a system to reflect the changes of a message to the state exactly one time. Instead, all of the above approaches offer at-most- or at-least-once processing semantics. Programmers then have to “pollute” their business logic with consistency checks, state rollbacks, timeouts, retries, and idempotency [8, 9]. We argue that no matter how we approach cloud programming, unless an execution engine offers exactly-once processing guarantees, we will never remove the burden of distributed systems aspects from programmers. In short, exactly-once processing should be assumed at the level of the programming model. To the best of our knowledge, the only systems able to guarantee exactly-once message processing [5, 11] at the time of writing, are batch [1, 7, 15] and streaming [6, 10, 13] dataflow systems. However, their programming model follows the paradigm of functional dataflow APIs which are cumbersome to use, and require training, and heavy rewrites of the typical imperative code that developers prefer to use for expressing application logic. For these reasons, we believe that the dataflow model should be used as low-level IR for the modeling and execution of distributed applications, but not as a programmer-facing model. Technically, one of the main challenges in adopting a dataflow-based intermediate representation, is that the dataflow model is essentially functional, with immutable values being propagated across operators that typically do not share a global state. 
Hence, adopting a dataflow-based IR entails translating (arbitrary) imperative code into the functional style. Compiler research has systematically explored only the opposite direction: to compile code in functional programming languages into a representation that is executable on imperative architectures – like virtually all modern microprocessors. Yet, the translation from imperative to functional or dataflow programming remains largely unexplored. To this end, we report on Stateful Entities, a prototypical programming model (exemplified in Figure 1), compiler pipeline, and IR that compiles imperative, transactional object-oriented applications into distributed dataflow graphs and executes them on existing dataflow systems. The proposed system presented in this paper can be found at: https://github.com/delftdata/stateflow. Our preliminary experiments showed that the translation of imperative programs into dataflow graphs yields very promising performance results, with less than 50ms latency. ...
Conference paper (2023) - G. Siachamis, K. Psarakis, M. Fragkoulis, Odysseas Papapetrou, A. van Deursen, A Katsifodimos
How can we perform similarity joins of multi-dimensional streams in a distributed fashion, achieving low latency? Can we adaptively repartition those streams in order to retain high performance under concept drifts? Current approaches to similarity joins are either restricted to single-node deployments or focus on set-similarity joins, failing to cover the ubiquitous case of metric-space similarity joins. In this paper, we propose the first adaptive distributed streaming similarity join approach that gracefully scales with variable velocity and distribution of multi-dimensional data streams. Our approach can adaptively rebalance the load of nodes in the case of concept drifts, allowing for similarity computations in the general metric space. We implement our approach on top of Apache Flink and evaluate its data partitioning and load balancing schemes on a set of synthetic datasets in terms of latency, comparisons ratio, and data duplication ratio ...
In this work, we evaluate autoscaling solutions for stream processing engines. Although autoscaling has become a mainstream subject of research in the last decade, the database research community has yet to evaluate different autoscaling techniques under a proper benchmarking setting and evaluation framework. As a result, every newly proposed autoscaling solution only performs a shallow performance evaluation and comparison against existing solutions. In this paper, we evaluate autoscaling solutions by employing two streaming queries and a dynamic workload that follows a cosine pattern. Our experiments reveal that current autoscaling techniques fail to account for the lag generated by rescaling or underprovisioning and cannot efficiently handle practical scenarios of intensely dynamic workloads. ...
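As an illustration of what a cosine-shaped dynamic workload could look like, the following is a minimal sketch and not the generator used in the paper. The function name `cosine_rate` and all parameter values (base rate, amplitude, period) are assumptions chosen for the example.

```python
import math


def cosine_rate(t_seconds, base_rate=1000.0, amplitude=500.0, period_seconds=600.0):
    """Target input rate (events/second) at time t for a cosine-shaped workload.

    All default values are hypothetical; the abstract does not state them.
    """
    phase = 2 * math.pi * t_seconds / period_seconds
    return base_rate + amplitude * math.cos(phase)


# Sample the target rate once per minute over one full period.
rates = [round(cosine_rate(t)) for t in range(0, 601, 60)]
```

Under these assumed parameters the rate starts at its peak, drops to its minimum halfway through the period, and returns to the peak, which forces an autoscaler to scale both down and up within one cycle.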
Conference paper (2023) - Andra Ionescu, Alexandra Alexandridou, Kyriakos Psarakis, Kostas Patroumpas, Georgios Chatzigeorgakidis, Dimitrios Skoutas, Spiros Athanasiou, Rihan Hai, Asterios Katsifodimos
The increasing need for data trading has created a high demand for data marketplaces. These marketplaces require a set of value-added services, such as advanced search and discovery, that have been proposed in the database research community for years, but are yet to be put into practice. In this paper, we propose to demonstrate the Topio Marketplace, an open-source data market platform that facilitates the search, exploration, discovery and augmentation of data assets. To support filtering, searching and discovery of data assets, we developed methods to extract and visualise a variety of metadata, as well as methods to discover related assets and mechanisms to augment them. This paper aims at presenting these methods with a real deployment of the Topio marketplace, comprising hundreds of open and proprietary datasets. ...
Journal article (2022) - Martijn de Heus, Kyriakos Psarakis, Marios Fragkoulis, Asterios Katsifodimos
Serverless computing is currently the fastest-growing cloud services segment. The most prominent serverless offering is Function-as-a-Service (FaaS), where users write functions and the cloud automates deployment, maintenance, and scalability. Although FaaS is a good fit for executing stateless functions, it does not adequately support stateful constructs like microservices and scalable, low-latency cloud applications. Recently, there have been multiple attempts to add first-class support for state in FaaS systems, such as Microsoft Orleans, Azure Durable Functions, or Beldi. These approaches execute business code inside stateless functions, handing over their state to an external database. In contrast, approaches such as Apache Flink's StateFun follow a different design: a dataflow system such as Apache Flink handles all state management, messaging, and checkpointing by executing a stateful dataflow graph providing exactly-once state processing guarantees. This design relieves programmers from having to “pollute” their business logic with distributed systems error checking, management, and mitigation. Although programmers can easily develop applications without worrying about messaging and state management, executing transactions across stateful functions remains an open problem. In this paper, we introduce a programming model and implementation for transaction orchestration of stateful serverless functions. Our programming model supports serializable distributed transactions with two-phase commit, as well as eventually consistent workflows with Sagas. We design and implement our programming model on Apache Flink StateFun to leverage Flink's exactly-once processing and state management guarantees. Our experiments show that the approach of building transactional systems on top of dataflow graphs can achieve very high throughput, but with latency overhead due to the checkpointing mechanism that guarantees exactly-once processing. We compare our approach to Beldi, which implements two-phase commit on AWS Lambda functions backed by DynamoDB for state management, as well as to an implementation of a system that uses CockroachDB as its backend. ...
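To make the Saga style of workflow mentioned in this abstract more concrete, the following is a minimal, single-process sketch of the general Saga pattern: each step is paired with a compensating action, and a failure triggers the compensations of all completed steps in reverse order. It is not the paper's StateFun-based programming model; `run_saga`, `SagaFailed`, and the in-memory account example are hypothetical names introduced only for illustration.

```python
class SagaFailed(Exception):
    """Raised after a failed saga has been compensated (hypothetical name)."""


def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action raises, the compensations of all previously completed
    steps run in reverse order, then SagaFailed is raised.
    """
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception as exc:
            for undo in reversed(completed):
                undo()
            raise SagaFailed(str(exc)) from exc
        completed.append(compensate)


# Hypothetical in-memory state standing in for two stateful functions.
balances = {"alice": 100, "bob": 0}


def debit(name, amount):
    if balances[name] < amount:
        raise ValueError("insufficient funds")
    balances[name] -= amount


def credit(name, amount):
    balances[name] += amount


def failing_step():
    raise RuntimeError("downstream service unavailable")


try:
    run_saga([
        (lambda: debit("alice", 30), lambda: credit("alice", 30)),
        (lambda: credit("bob", 30), lambda: debit("bob", 30)),
        (failing_step, lambda: None),
    ])
except SagaFailed:
    pass  # both completed steps have been compensated at this point
```

Unlike two-phase commit, intermediate states (for example, the debit before the credit) are visible while the saga runs, which is why Sagas trade isolation for weaker, eventually consistent guarantees.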

Matching Tabular Data at Scale

Capturing relationships among heterogeneous datasets in large data lakes - traditionally termed schema matching - is one of the most challenging problems that corporations and institutions face nowadays. Discovering and integrating datasets heavily relies on the effectiveness of the schema matching methods in use. However, despite the wealth of research, evaluation of schema matching methods is still a daunting task: there is a lack of openly-available datasets with ground truth, reference method implementations, and comprehensible GUIs that would facilitate development of both novel state-of-the-art schema matching techniques and novel data discovery methods. Our recently proposed Valentine is the first system to offer an open-source experiment suite to organize, execute and orchestrate large-scale matching experiments. In this demonstration we present its functionalities and enhancements: i) a scalable system, with a user-centric GUI, that enables the fabrication of datasets and the evaluation of matching methods on schema matching scenarios tailored to the scope of tabular dataset discovery, ii) a scalable holistic matching system that can receive tabular datasets from heterogeneous sources and provide similarity scores among their columns, in order to facilitate modern procedures in data lakes, such as dataset discovery. ...
Data scientists today search large data lakes to discover and integrate datasets. In order to bring together disparate data sources, dataset discovery methods rely on some form of schema matching: the process of establishing correspondences between datasets. Traditionally, schema matching has been used to find matching pairs of columns between a source and a target schema. However, the use of schema matching in dataset discovery methods differs from its original use. Nowadays schema matching serves as a building block for indicating and ranking inter-dataset relationships. Surprisingly, although a discovery method’s success relies highly on the quality of the underlying matching algorithms, the latest discovery methods employ existing schema matching algorithms in an ad-hoc fashion due to the lack of openly-available datasets with ground truth, reference method implementations, and evaluation metrics. In this paper, we aim to rectify the problem of evaluating the effectiveness and efficiency of schema matching methods for the specific needs of dataset discovery. To this end, we propose Valentine, an extensible open-source experiment suite to execute and organize large-scale automated matching experiments on tabular data. Valentine includes implementations of seminal schema matching methods that we either implemented from scratch (due to absence of open source code) or imported from open repositories. The contributions of Valentine are: i) the definition of four schema matching scenarios as encountered in dataset discovery methods, ii) a principled dataset fabrication process tailored to the scope of dataset discovery methods and iii) the most comprehensive evaluation of schema matching techniques to date, offering insight on the strengths and weaknesses of existing techniques, that can serve as a guide for employing schema matching in future dataset discovery methods. ...
Serverless computing is currently the fastest-growing cloud services segment. The most prominent serverless offering is Function-as-a-Service (FaaS), where users write functions and the cloud automates deployment, maintenance, and scalability. Although FaaS is a good fit for executing stateless functions, it does not adequately support stateful constructs like microservices and scalable, low-latency cloud applications, mainly because it lacks proper state management support and the ability to perform function-to-function calls. Most importantly, executing transactions across stateful functions remains an open problem. In this paper, we introduce a programming model and implementation for transaction orchestration of stateful serverless functions. Our programming model supports serializable distributed transactions with two-phase commit, as well as relaxed transactional guarantees with Sagas. We design and implement our programming model on Apache Flink StateFun. We choose to build our solution on top of StateFun in order to leverage Flink's exactly-once processing and state management guarantees. We base our evaluation on the YCSB benchmark, which we extended with transactional operations and adapted for the SFaaS programming model. Our experiments show that our transactional orchestration adds 10% overhead to the original system and that Sagas can achieve up to 34% more transactions per second than two-phase commit transactions at a sub-200ms latency. ...