Transformer Modules

Transferable & Parameter-Efficient LLM Fine-Tuning


Abstract

With the growing popularity of Large Language Models (LLMs), fine-tuning them has become increasingly computationally expensive. Parameter-Efficient Fine-Tuning (PEFT) methods such as LoRA and Adapters, introduced by Microsoft and Google respectively, aim to reduce the number of trainable parameters, and the current state-of-the-art combines both methods as LoRA Adapters. This paper introduces Transformer Modules as a PEFT method. These modules consist of Modular Transformer Blocks (MTBs) inserted into a frozen pre-trained model, achieving competitive performance while significantly reducing computation costs. Compared to the current state-of-the-art on GPT-2, BERT, and T5, Transformer Modules further reduced compute time by 39.7% and training memory by 72.7%, at a performance cost of 4.5±2.51% on the GLUE benchmark. Additionally, the paper presents the Transformer Bridge, a continuous vector transformer designed to transfer Transformer Modules across different models. This could enable cross-model fine-tuning, allowing model-agnostic modules, such as an ethics or medical module, to be used across various LLMs without retraining or access to the original dataset. Although the current implementation of the Transformer Bridge did not fully succeed in mapping embedding spaces, analysis of the results suggests that further refinement using traditional model distillation techniques could lead to success in future iterations.
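
The abstract gives no implementation details, but the core mechanism, a small trainable transformer block inserted into an otherwise frozen pre-trained model, can be illustrated with a short PyTorch sketch. Everything below (the block design, the residual connection, the hook-based insertion, the layer index, and the hyperparameters) is an assumption made for illustration, not the paper's actual code.

# Minimal sketch of the Transformer-Module idea: a small trainable
# transformer block inserted between the frozen layers of a pre-trained
# model. Names, dimensions, and insertion point are illustrative only.
import torch
import torch.nn as nn
from transformers import GPT2Model

class ModularTransformerBlock(nn.Module):
    """Hypothetical MTB: one small transformer encoder layer with a residual."""
    def __init__(self, hidden_size: int, num_heads: int = 4):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=num_heads,
            dim_feedforward=2 * hidden_size, batch_first=True)

    def forward(self, hidden_states):
        # Residual connection keeps the frozen model's representation
        # intact when the module's contribution is small.
        return hidden_states + self.block(hidden_states)

model = GPT2Model.from_pretrained("gpt2")
for param in model.parameters():       # freeze every pre-trained weight
    param.requires_grad = False

mtb = ModularTransformerBlock(model.config.hidden_size)

# Insert the MTB after one frozen layer via a forward hook
# (assumed insertion mechanism; layer index 6 is arbitrary).
def insert_mtb(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    new_hidden = mtb(hidden)
    if isinstance(output, tuple):
        return (new_hidden,) + output[1:]
    return new_hidden

model.h[6].register_forward_hook(insert_mtb)

# Only the MTB's parameters are trainable.
optimizer = torch.optim.AdamW(mtb.parameters(), lr=1e-4)

Because only the MTB receives gradients, gradient buffers and optimizer state scale with the module rather than with the full model, which is where PEFT methods obtain their memory savings.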

Files

Transformer_Modules_Jahson_ODw... (.pdf)

File under embargo until 30-06-2025