D.G. Sprokholt | TU Delft Repository

Cage

Hardware-Accelerated Safe WebAssembly

Conference paper (2025) - Martin Fink, Dimitrios Stavrakakis, Dennis Sprokholt, Soham Chakraborty, Jan Erik Ekberg, Pramod Bhatotia

WebAssembly (WASM) is an immensely versatile and increasingly popular compilation target. It executes applications written in several languages (e.g., C/C++) with near-native performance in various domains (e.g., mobile, edge, cloud). Despite WASM’s sandboxing feature, which isolates applications from other instances and the host platform, WASM does not inherently provide any memory safety guarantees for applications written in low-level, unsafe languages. To this end, we propose Cage, a hardware-accelerated toolchain for WASM that supports unmodified applications compiled to WASM and utilizes diverse Arm hardware features aiming to enrich the memory safety properties of WASM. Precisely, Cage leverages Arm’s Memory Tagging Extension (MTE) to (i) provide spatial and temporal memory safety for heap and stack allocations and (ii) improve the performance of WASM’s sandboxing mechanism. Cage further employs Arm’s Pointer Authentication (PAC) to prevent leaked pointers from being reused by other WASM instances, thus enhancing WASM’s security properties. We implement our system based on 64-bit WASM. We provide a WASM compiler and runtime with support for Arm’s MTE and PAC. On top of that, Cage’s LLVM-based compiler toolchain transforms unmodified applications to provide spatial and temporal memory safety for stack and heap allocations and prevent function pointer reuse. Our evaluation on real hardware shows that Cage incurs minimal runtime (< 5.8 %) and memory (< 5.3 %) overheads and can improve the performance of WASM’s sandboxing mechanism, achieving a speedup of over 5.1 %, while offering efficient memory safety guarantees. ...

Correct Translation between Weak Memory Model Architectures

Doctoral thesis (2025) - D.G. Sprokholt, K.G. Langendoen, S.S. Chakraborty

This dissertation is about translating concurrent programs between computer architectures. Legacy programs—built-for and tested-on x86—behave differently on newer architectures, such as Arm and RISC-V. Particularly, weak memory behaviors emerge when two micro-architectural features interact: (i) concurrency, where multiple CPU cores simultaneously execute parts of a program, and (ii) out-of-order execution, where a CPU core reorders instructions to increase throughput. Programs can non-deterministically show one of various weak memory behaviors, meaning it could behave differently when executing again. Those behaviors differ between architectures. When migrating programs from x86 to Arm or RISC-V, the same program could non-deterministically show behaviors never observed on x86.

In part I, we look at binary translators, which are software systems that translate compiled binary programs between architectures. We study the translation process of three such real-world systems, identify errors in their translation of concurrency primitives, and fix them. We propose mathematically-rigorous weak memory models for these translators. We then define mapping schemes to translate concurrency primitives one-by-one from x86 to Arm and RISC-V. With the formal semantics, we prove those mapping schemes correct in the Agda proof assistant.

In part II, we study the common structure of our weak memory proofs written in Agda. As those proofs are often large, complex, and rigid, we identify their common structures for which we identify domain-specific abstractions. We implement those abstractions in our novel Agda proof framework Burrow to greatly simplify writing future weak memory proofs.

In part III, we use dynamic analysis to identify weak behaviors that were never seen on x86 but could appear on Arm. Our analysis simulates the program’s execution with the formal weak memory semantics of x86 and Arm. This analysis identifies only the new behaviors the program shows in practice. After finding any new behavior on Arm, we judiciously modify the program to eliminate only that behavior. ...

This dissertation is about translating concurrent programs between computer architectures. Legacy programs—built-for and tested-on x86—behave differently on newer architectures, such as Arm and RISC-V. Particularly, weak memory behaviors emerge when two micro-architectural features interact: (i) concurrency, where multiple CPU cores simultaneously execute parts of a program, and (ii) out-of-order execution, where a CPU core reorders instructions to increase throughput. Programs can non-deterministically show one of various weak memory behaviors, meaning it could behave differently when executing again. Those behaviors differ between architectures. When migrating programs from x86 to Arm or RISC-V, the same program could non-deterministically show behaviors never observed on x86.

In part I, we look at binary translators, which are software systems that translate compiled binary programs between architectures. We study the translation process of three such real-world systems, identify errors in their translation of concurrency primitives, and fix them. We propose mathematically-rigorous weak memory models for these translators. We then define mapping schemes to translate concurrency primitives one-by-one from x86 to Arm and RISC-V. With the formal semantics, we prove those mapping schemes correct in the Agda proof assistant.

In part II, we study the common structure of our weak memory proofs written in Agda. As those proofs are often large, complex, and rigid, we identify their common structures for which we identify domain-specific abstractions. We implement those abstractions in our novel Agda proof framework Burrow to greatly simplify writing future weak memory proofs.

In part III, we use dynamic analysis to identify weak behaviors that were never seen on x86 but could appear on Arm. Our analysis simulates the program’s execution with the formal weak memory semantics of x86 and Arm. This analysis identifies only the new behaviors the program shows in practice. After finding any new behavior on Arm, we judiciously modify the program to eliminate only that behavior.

Lasagne

A static binary translator for weak memory model architectures

Conference paper (2022) - Rodrigo C.O. Rocha, Dennis Sprokholt, Martin Fink, Redha Gouicem, Tom Spink, Soham Chakraborty, Pramod Bhatotia

The emergence of new architectures create a recurring challenge to ensure that existing programs still work on them. Manually porting legacy code is often impractical. Static binary translation (SBT) is a process where a program's binary is automatically translated from one architecture to another, while preserving their original semantics. However, these SBT tools have limited support to various advanced architectural features. Importantly, they are currently unable to translate concurrent binaries. The main challenge arises from the mismatches of the memory consistency model specified by the different architectures, especially when porting existing binaries to a weak memory model architecture. In this paper, we propose Lasagne, an end-to-end static binary translator with precise translation rules between x86 and Arm concurrency semantics. First, we propose a concurrency model for Lasagne's intermediate representation (IR) and formally proved mappings between the IR and the two architectures. The memory ordering is preserved by introducing fences in the translated code. Finally, we propose optimizations focused on raising the level of abstraction of memory address calculations and reducing the number of fences. Our evaluation shows that Lasagne reduces the number of fences by up to about 65%, with an average reduction of 45.5%, significantly reducing their runtime overhead. ...

Risotto

A Dynamic Binary Translator for Weak Memory Model Architectures

Conference paper (2022) - Redha Gouicem, Dennis Sprokholt, Jasper Ruehl, Rodrigo C.O. Rocha, Tom Spink, Soham Chakraborty, Pramod Bhatotia

Dynamic Binary Translation (DBT) is a powerful approach to support cross-architecture emulation of unmodified binaries. However, DBT systems face correctness and performance challenges, when emulating concurrent binaries from strong to weak memory consistency architectures. As a matter of fact, we report several translation errors in QEMU, when emulating x86 binaries on Arm hosts. To address these challenges, we propose an end-to-end approach that provides correct and efficient emulation for weak memory model architectures. Our contributions are twofold: we formalize QEMU's intermediate representation's memory model, and use it to propose formally verified mapping schemes to bridge the strong-on-weak memory consistency mismatch. Secondly, we implement these verified mappings in Risotto, a QEMU-based DBT system that optimizes memory fence placement while ensuring correctness. Risotto further enhances the emulation performance via cross-architecture dynamic linking of native shared libraries, and fast and correct translation of compare-and-swap operations. We evaluate Risotto using multi-threaded benchmark suites and real-world applications, and show that Risotto improves the emulation performance by 6.7% on average over "erroneous"QEMU, while ensuring correctness. ...