MXITA: Design and Implementation of Microscaling Integer Accelerator for Neural Networks

An exploration of multidimensional systolic arrays

Master Thesis (2025)
Author(s)

L.O. Hu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

G. Gaydadjiev – Mentor (TU Delft - Computer Engineering)

G. Islamoglu – Mentor (ETH Zürich)

P. Wiese – Mentor (ETH Zürich)

L. Benini – Mentor (ETH Zürich)

J.S.S.M. Wong – Graduation committee member (TU Delft - Computer Engineering)

C. Frenkel – Graduation committee member (TU Delft - Electronic Instrumentation)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
23-09-2025
Awarding Institution
Delft University of Technology
Programme
Electrical Engineering | Embedded Systems
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The rapid growth of deep learning models, particularly Transformers, has far outpaced hardware scaling, increasing pressure on memory and compute efficiency. While INT8 quantization reduces memory requirements, it often sacrifices accuracy. Microscaling (MX) formats, such as MXINT8, address this trade-off by grouping INT8 values with a shared exponent, achieving FP32-level accuracy with up to 4 times memory savings. However, efficient execution of mixed integer–floating-point operations requires specialized hardware. Prior MX accelerators based on systolic arrays are limited by underutilized processing elements or the overhead of FP32 peripheries.
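
To make the format concrete, below is a minimal sketch of MXINT8-style block quantization: each block of 32 FP32 values is stored as one shared power-of-two exponent plus 32 INT8 values. The block size, rounding, and exponent-selection rule are illustrative assumptions in the spirit of the OCP Microscaling specification, not the thesis's exact encoding. At 8 element bits plus 8 shared-exponent bits per 32 values (8.25 bits/value versus 32 for FP32), it also illustrates the roughly fourfold memory saving cited above.

```python
import numpy as np

BLOCK = 32  # MX block size (32 in the OCP Microscaling spec; assumed here)

def mxint8_quantize(x):
    """Quantize one FP32 block to a shared power-of-two exponent + INT8."""
    assert x.size == BLOCK
    amax = float(np.abs(x).max())
    # Pick the shared exponent so the largest magnitude maps near INT8 range.
    shared_exp = int(np.ceil(np.log2(amax))) - 7 if amax > 0.0 else 0
    q = np.clip(np.round(x * 2.0 ** -shared_exp), -128, 127).astype(np.int8)
    return shared_exp, q

def mxint8_dequantize(shared_exp, q):
    """Reconstruct approximate FP32 values from the block encoding."""
    return q.astype(np.float32) * np.float32(2.0 ** shared_exp)

# Storage: 32 x 8 element bits + 8 shared-exponent bits = 8.25 bits/value,
# versus 32 bits/value in FP32 -- the ~4x saving mentioned above.
block = np.random.randn(BLOCK).astype(np.float32)
e, q = mxint8_quantize(block)
err = np.max(np.abs(block - mxint8_dequantize(e, q)))
print(f"shared exponent: {e}, max abs error: {err:.4f}")
```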

This work presents MXITA, a multidimensional systolic array accelerator for MX matrix multiplications in neural network workloads. The architecture is parameterized over (M, N, P, Q), enabling trade-offs between supported MX block sizes and FP32 periphery reuse while sustaining high throughput. MXITA was designed, implemented, and integrated into the Snitch cluster, and verified for functional correctness at both the module and system level.
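
As a rough functional reference for what such an accelerator computes (not MXITA's microarchitecture or its (M, N, P, Q) tiling, which the thesis defines), the sketch below multiplies two MX-quantized matrices: exact integer dot products within each shared-exponent block, followed by power-of-two scaling and accumulation in FP32. This split between integer block compute and FP32 scaling mirrors the integer-core/FP32-periphery division discussed above; all names, shapes, and the block size are hypothetical.

```python
import numpy as np

BLOCK = 32  # shared-exponent block size (an assumption; MX commonly uses 32)

def mx_matmul(a_int, a_exp, b_int, b_exp):
    """Functional reference for C = A @ B on MX-quantized operands.

    a_int: (M, K) int8 mantissas; a_exp: (M, K // BLOCK) shared exponents,
           one per row-wise block of BLOCK values along K.
    b_int: (K, N) int8 mantissas; b_exp: (K // BLOCK, N) shared exponents,
           one per column-wise block of BLOCK values along K.
    """
    M, K = a_int.shape
    K2, N = b_int.shape
    assert K == K2 and K % BLOCK == 0
    c = np.zeros((M, N), dtype=np.float32)
    for kb in range(K // BLOCK):
        s = slice(kb * BLOCK, (kb + 1) * BLOCK)
        # Integer tile: exact INT8 x INT8 dot products in a wide accumulator.
        part = a_int[:, s].astype(np.int32) @ b_int[s, :].astype(np.int32)
        # FP32 periphery: apply both shared power-of-two scales, accumulate.
        scale = np.exp2(a_exp[:, kb:kb + 1] + b_exp[kb:kb + 1, :])
        c += part.astype(np.float32) * scale.astype(np.float32)
    return c
```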

Synthesis in GF22 technology demonstrates that MXITA achieves higher area efficiency than prior state-of-the-art MX accelerators by amortizing FP32 hardware across compute tiles and reducing periphery overhead. These results highlight the potential of multidimensional systolic arrays as scalable and efficient hardware for MX quantized deep learning workloads.

Files

Li_Ou_Hu_MSc_Thesis_TUD.pdf
(pdf | 11.7 MB)