Low-Rank Ternary Adapters for Fine-Tuning Transformers
A.D. Manolache (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Yunqiang Li – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Jan van Gemert – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
A. Anand – Graduation committee member (TU Delft - Web Information Systems)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Parameter-Efficient Fine-Tuning (PEFT) methods for Transformers are designed for floating-point weights. When applied to extremely low-bit models (e.g., ternary {-1, 0, 1}), they convert the base weights to floating point (dequantization) to add the update and then quantize again, which can diminish the benefits of aggressive quantization. We introduce a multiplicative ternary adapter that enables in-domain fine-tuning by applying an element-wise ternary mask to the base ternary Transformer weights, avoiding any dequantization to floating point and allowing direct merging back into the model. Constructed as the Kronecker product of two small trainable matrices and applied via a Hadamard product, the adapter preserves the ternary domain and merges with zero inference overhead. On a ternarized Llama-3.2-1B model, our method recovers substantial accuracy and surpasses stronger 2-bit baselines on most tasks, while retaining the efficiency advantages of ternary weights.
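The sketch below is a minimal PyTorch illustration of the construction described in the abstract: an element-wise ternary mask is built from the Kronecker product of two small matrices and applied to a frozen ternary weight matrix via a Hadamard product. The shapes, the `ternarize` threshold rule, and the random initialisation are illustrative assumptions, not the thesis's actual implementation.

```python
import torch

torch.manual_seed(0)

# Assumed shapes: a frozen ternary base weight W of size (out_features, in_features),
# and two small factors A, B whose Kronecker product matches W's shape.
out_features, in_features = 64, 128
a, b = 8, 16  # assumed block sizes; out_features % a == 0, in_features % b == 0

# Frozen ternary base weights (stand-in for one layer of a ternarized Transformer).
W = torch.randint(-1, 2, (out_features, in_features)).float()

# Small trainable factors; kron(A, B) has shape (out_features, in_features).
A = torch.randn(out_features // a, in_features // b, requires_grad=True)
B = torch.randn(a, b, requires_grad=True)

def ternarize(x, threshold=0.5):
    """Map real values to {-1, 0, 1} (a simple assumed ternarization rule)."""
    return torch.sign(x) * (x.abs() > threshold).float()

# Multiplicative adapter: an element-wise (Hadamard) ternary mask from kron(A, B).
mask = ternarize(torch.kron(A, B))   # same shape as W, values in {-1, 0, 1}
W_adapted = W * mask                 # Hadamard product keeps weights ternary

print(W_adapted.unique())            # subset of {-1., 0., 1.}
```

This sketch only shows the forward construction and merge; because the hard `ternarize` step has zero gradient almost everywhere, actual training of A and B would presumably rely on something like a straight-through estimator, which is not shown here. After training, `W * mask` can replace the base weights directly, which is why the merged model incurs no inference overhead.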
Files
File under embargo until 09-09-2026