L. Benini

The rapid growth of deep learning models, particularly Transformers, has far outpaced hardware scaling, putting increasing pressure on memory and compute efficiency. While INT8 quantization reduces memory requirements, it often sacrifices accuracy. Microscaling (MX) formats, such as MXIN ...