L.O. Hu

2 records found

The rapid growth of deep learning models, particularly Transformers, has far outpaced hardware scaling, increasing pressure on memory and compute efficiency. While INT8 quantization reduces memory requirements, it often sacrifices accuracy. Microscaling (MX) formats, such as MXIN ...
In recent decades, much progress has been made in the field of AI, and many algorithms now reach very high accuracies. Unfortunately, many of these algorithms are quite resource-intensive, making them impractical on low-cost devices.
The aim of th ...