A Comparative Study of Fine-Tuning Pipelines for Integrating Large Language Models in Multimodal Data Analysis
C. Grîu (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Kubilay Atasu – Mentor (TU Delft - Data-Intensive Systems)
T.A. Akyıldız – Mentor (TU Delft - Data-Intensive Systems)
Burcu Külahçıoğlu Özkan – Graduation committee member (TU Delft - Software Engineering)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
While large language models (LLMs) are proficient at processing textual information, integrating them with other models presents significant challenges.
This study evaluates the effectiveness of various configurations for integrating an LLM with models capable of handling multimodal data.
We explore the advantages of using pre-trained LLMs for generating text embeddings and the benefits of fine-tuning LLMs for specific tasks. Our investigation covers several fine-tuning strategies, including Low-Rank Adaptation (LoRA), prompt tuning, and full fine-tuning, applied to both smaller and larger language models. Additionally, we analyze different training setups, including sequential and cascaded training of the LLM and downstream architectures. Our comparative analysis evaluates the performance and cost-effectiveness of these methods. The findings indicate that while full fine-tuning achieves the best results, LoRA offers a practical balance between computational efficiency and model performance. We also observe that larger LLMs deliver higher performance at a correspondingly higher cost.
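To make the trade-off behind LoRA concrete, the sketch below computes how many parameters a rank-r adapter trains compared with full fine-tuning of the same weight matrix. This is an illustrative calculation under assumed dimensions (a hypothetical 4096×4096 projection layer), not code from the study itself.

```python
# Minimal sketch of LoRA's parameter savings (illustrative only).
# LoRA freezes a weight matrix W (d_out x d_in) and learns a low-rank
# update delta_W = B @ A, with B of shape (d_out, r) and A of shape
# (r, d_in), where r << min(d_out, d_in).

def lora_trainable_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters of the low-rank adapter B @ A."""
    return d_out * r + r * d_in

def full_ft_trainable_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when fine-tuning W directly."""
    return d_out * d_in

# Hypothetical example: one 4096 x 4096 projection with a rank-8 adapter.
full = full_ft_trainable_params(4096, 4096)   # 16,777,216 parameters
lora = lora_trainable_params(4096, 4096, 8)   # 65,536 parameters
print(f"LoRA trains {lora / full:.3%} of the full matrix's parameters")
```

At rank 8 the adapter trains well under one percent of the matrix's parameters, which is why LoRA can approach full fine-tuning quality at a fraction of the compute and memory cost.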