C3

CXL Coherence Controllers for Heterogeneous Architectures

Conference Paper (2026)
Author(s)

Anatole Lefort (Technische Universität München)

David Schall (Technische Universität München)

Nicolò Carpentieri (Technische Universität München)

Julian Pritzi (Technische Universität München)

Soham Chakraborty (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Nicolai Oswald (NVIDIA Corporation)

Pramod Bhatotia (Technische Universität München)

Research Group
Programming Languages
DOI related publication
https://doi.org/10.1109/HPCA68181.2026.11408469 Final published version
Publication Year
2026
Language
English
Publisher
IEEE
ISBN (electronic)
9798331593025
Event
32nd IEEE International Symposium on High-Performance Computer Architecture, HPCA 2026 (2026-01-31 - 2026-02-04), Sydney, Australia
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

We introduce C3, a systematic methodology for designing Compute Express Link (CXL) coherence controllers, to overcome interoperability challenges that arise from the mismatch of coherence protocols and memory consistency models in heterogeneous CXL-connected systems. Crucially, CXL lacks a unified heterogeneous computing interface, which can lead to unpredictable and inconsistent behavior when multiple heterogeneous devices decide to share cache-coherent CXL memory. C3 acts as a pivotal interface between diverse heterogeneous compute units, bridging the semantic differences without necessitating disruptive changes to existing system architectures. Our approach hinges on two key principles: delegating memory operations across coherence domains and enforcing atomicity at domain boundaries, thereby preserving the native memory consistency model semantics of each unit. We implement C3 as a generic gem5 model and validate its correctness through exhaustive litmus testing. We also show that C3 incurs minimal performance overhead compared to unified native coherence protocols.

Files

Taverne

File under embargo until 07-09-2026