MXITA: Design and Implementation of Microscaling Integer Accelerator for Neural Networks

An exploration of multidimensional systolic arrays

Master Thesis (2025)
Author(s)

L.O. Hu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

G. Gaydadjiev – Mentor (TU Delft - Computer Engineering)

G. Islamoglu – Mentor (ETH Zürich)

P. Wiese – Mentor (ETH Zürich)

L. Benini – Mentor (ETH Zürich)

J.S.S.M. Wong – Graduation committee member (TU Delft - Computer Engineering)

C. Frenkel – Graduation committee member (TU Delft - Electronic Instrumentation)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
23-09-2025
Awarding Institution
Delft University of Technology
Programme
Electrical Engineering | Embedded Systems
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The rapid growth of deep learning models, particularly Transformers, has far outpaced hardware scaling, increasing pressure on memory and compute efficiency. While INT8 quantization reduces memory requirements, it often sacrifices accuracy. Microscaling (MX) formats, such as MXINT8, address this trade-off by grouping INT8 values with a shared exponent, achieving FP32-level accuracy with up to 4 times memory savings. However, efficient execution of mixed integer–floating-point operations requires specialized hardware. Prior MX accelerators based on systolic arrays are limited by underutilized processing elements or the overhead of FP32 peripheries.
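
To make the format concrete, below is a minimal sketch of MXINT8-style block quantization: each block of 32 FP32 values is stored as one shared power-of-two exponent plus 32 INT8 values. The block size, rounding, and exponent-selection rule are illustrative assumptions in the spirit of the OCP Microscaling specification, not the thesis's exact encoding. At 8 element bits plus 8 shared-exponent bits per 32 values (8.25 bits/value versus 32 for FP32), it also illustrates the roughly fourfold memory saving cited above.

```python
import numpy as np

BLOCK = 32  # MX block size (32 in the OCP Microscaling spec; assumed here)

def mxint8_quantize(x):
    """Quantize one FP32 block to a shared power-of-two exponent + INT8."""
    assert x.size == BLOCK
    amax = float(np.abs(x).max())
    # Pick the shared exponent so the largest magnitude maps near INT8 range.
    shared_exp = int(np.ceil(np.log2(amax))) - 7 if amax > 0.0 else 0
    q = np.clip(np.round(x * 2.0 ** -shared_exp), -128, 127).astype(np.int8)
    return shared_exp, q

def mxint8_dequantize(shared_exp, q):
    """Reconstruct approximate FP32 values from the block encoding."""
    return q.astype(np.float32) * np.float32(2.0 ** shared_exp)

# Storage: 32 x 8 element bits + 8 shared-exponent bits = 8.25 bits/value,
# versus 32 bits/value in FP32 -- the ~4x saving mentioned above.
block = np.random.randn(BLOCK).astype(np.float32)
e, q = mxint8_quantize(block)
err = np.max(np.abs(block - mxint8_dequantize(e, q)))
print(f"shared exponent: {e}, max abs error: {err:.4f}")
```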

This work presents MXITA, a multidimensional systolic array accelerator for MX matrix multiplications in neural network workloads. The architecture is parameterized over (M, N, P, Q), enabling trade-offs between supported MX block sizes and FP32 periphery reuse while sustaining high throughput. MXITA was designed, implemented, and integrated into the Snitch cluster, and verified for functional correctness at both the module and system level.
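
As a rough functional reference for what such an accelerator computes (not MXITA's microarchitecture or its (M, N, P, Q) tiling, which the thesis defines), the sketch below multiplies two MX-quantized matrices: exact integer dot products within each shared-exponent block, followed by power-of-two scaling and accumulation in FP32. This split between integer block compute and FP32 scaling mirrors the integer-core/FP32-periphery division discussed above; all names, shapes, and the block size are hypothetical.

```python
import numpy as np

BLOCK = 32  # shared-exponent block size (an assumption; MX commonly uses 32)

def mx_matmul(a_int, a_exp, b_int, b_exp):
    """Functional reference for C = A @ B on MX-quantized operands.

    a_int: (M, K) int8 mantissas; a_exp: (M, K // BLOCK) shared exponents,
           one per row-wise block of BLOCK values along K.
    b_int: (K, N) int8 mantissas; b_exp: (K // BLOCK, N) shared exponents,
           one per column-wise block of BLOCK values along K.
    """
    M, K = a_int.shape
    K2, N = b_int.shape
    assert K == K2 and K % BLOCK == 0
    c = np.zeros((M, N), dtype=np.float32)
    for kb in range(K // BLOCK):
        s = slice(kb * BLOCK, (kb + 1) * BLOCK)
        # Integer tile: exact INT8 x INT8 dot products in a wide accumulator.
        part = a_int[:, s].astype(np.int32) @ b_int[s, :].astype(np.int32)
        # FP32 periphery: apply both shared power-of-two scales, accumulate.
        scale = np.exp2(a_exp[:, kb:kb + 1] + b_exp[kb:kb + 1, :])
        c += part.astype(np.float32) * scale.astype(np.float32)
    return c
```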

Synthesis in GF22 technology demonstrates that MXITA achieves higher area efficiency than prior state-of-the-art MX accelerators by amortizing FP32 hardware across compute tiles and reducing periphery overhead. These results highlight the potential of multidimensional systolic arrays as scalable and efficient hardware for MX quantized deep learning workloads.

Files

Li_Ou_Hu_MSc_Thesis_TUD.pdf
(pdf | 11.7 MB)