Low-Rank Ternary Adapters for Fine-Tuning Transformers
A.D. Manolache (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Yunqiang Li – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Jan van Gemert – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
A. Anand – Graduation committee member (TU Delft - Web Information Systems)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Parameter-Efficient Fine-Tuning (PEFT) methods for Transformers are designed for floating-point weights. When applied to extremely low-bit models (e.g., ternary {-1, 0, 1}), they convert the base weights to floating point (dequantization) to add the update and then quantize again, which can diminish the benefits of aggressive quantization. We introduce a multiplicative ternary adapter that enables in-domain fine-tuning by applying an element-wise ternary mask to the base ternary Transformer weights, avoiding any dequantization to floating point and allowing direct merging back into the model. Constructed as the Kronecker product of two small trainable matrices and applied via a Hadamard product, the adapter preserves the ternary domain and merges with zero inference overhead. On a ternarized Llama-3.2-1B model, our method recovers substantial accuracy and surpasses stronger 2-bit baselines on most tasks, while retaining the efficiency advantages of ternary weights.
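The sketch below is a minimal PyTorch illustration of the construction described in the abstract: an element-wise ternary mask is built from the Kronecker product of two small matrices and applied to a frozen ternary weight matrix via a Hadamard product. The shapes, the `ternarize` threshold rule, and the random initialisation are illustrative assumptions, not the thesis's actual implementation.

```python
import torch

torch.manual_seed(0)

# Assumed shapes: a frozen ternary base weight W of size (out_features, in_features),
# and two small factors A, B whose Kronecker product matches W's shape.
out_features, in_features = 64, 128
a, b = 8, 16  # assumed block sizes; out_features % a == 0, in_features % b == 0

# Frozen ternary base weights (stand-in for one layer of a ternarized Transformer).
W = torch.randint(-1, 2, (out_features, in_features)).float()

# Small trainable factors; kron(A, B) has shape (out_features, in_features).
A = torch.randn(out_features // a, in_features // b, requires_grad=True)
B = torch.randn(a, b, requires_grad=True)

def ternarize(x, threshold=0.5):
    """Map real values to {-1, 0, 1} (a simple assumed ternarization rule)."""
    return torch.sign(x) * (x.abs() > threshold).float()

# Multiplicative adapter: an element-wise (Hadamard) ternary mask from kron(A, B).
mask = ternarize(torch.kron(A, B))   # same shape as W, values in {-1, 0, 1}
W_adapted = W * mask                 # Hadamard product keeps weights ternary

print(W_adapted.unique())            # subset of {-1., 0., 1.}
```

This sketch only shows the forward construction and merge; because the hard `ternarize` step has zero gradient almost everywhere, actual training of A and B would presumably rely on something like a straight-through estimator, which is not shown here. After training, `W * mask` can replace the base weights directly, which is why the merged model incurs no inference overhead.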
Files
File under embargo until 09-09-2026