
Y. Li

3 records found

Parameter-Efficient Fine-Tuning (PEFT) methods for Transformers are designed for floating-point weights. When applied to extremely low-bit models (e.g., ternary {-1, 0, 1}), they convert the base weights to floating point (dequantization) to add the update and then quantize again, wh ...
Model compression techniques are crucial for reducing the deployment cost of large neural networks. Among these, depth pruning (removing layers/blocks) and width pruning (removing sections within layers) are essential for reducing memory footprint and inference latency. While var ...
Binary Neural Networks (BNNs) are compact and efficient because they use binary weights instead of real-valued weights. Current BNNs train with latent real-valued weights, and several training hyper-parameters are inherited from real-valued networks. The interpretation of seve ...