A Comparative Study of Fine-Tuning Pipelines for Integrating Large Language Models in Multimodal Data Analysis
C. Grîu (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Kubilay Atasu – Mentor (TU Delft - Data-Intensive Systems)
T.A. Akyıldız – Mentor (TU Delft - Data-Intensive Systems)
Burcu Külahçıoğlu Özkan – Graduation committee member (TU Delft - Software Engineering)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
While large language models (LLMs) are proficient at processing textual information, integrating them with other models presents significant challenges.
This study evaluates the effectiveness of various configurations for integrating an LLM with models capable of handling multimodal data.
We explore the advantages of using pre-trained LLMs for generating text embeddings and the benefits of fine-tuning LLMs for specific tasks. Our investigation covers several fine-tuning strategies, including Low-Rank Adaptation (LoRA), prompt tuning, and full fine-tuning, applied to both smaller and larger language models. Additionally, we analyze different training setups, including sequential and cascaded training of the LLM and downstream architectures. Our comparative analysis evaluates the performance and cost-effectiveness of these methods. The findings indicate that while full fine-tuning achieves the best results, LoRA offers a practical balance between computational efficiency and model performance. We also observe that larger LLMs deliver higher performance at a correspondingly higher cost.
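To make the trade-off behind LoRA concrete, the sketch below computes how many parameters a rank-r adapter trains compared with full fine-tuning of the same weight matrix. This is an illustrative calculation under assumed dimensions (a hypothetical 4096×4096 projection layer), not code from the study itself.

```python
# Minimal sketch of LoRA's parameter savings (illustrative only).
# LoRA freezes a weight matrix W (d_out x d_in) and learns a low-rank
# update delta_W = B @ A, with B of shape (d_out, r) and A of shape
# (r, d_in), where r << min(d_out, d_in).

def lora_trainable_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters of the low-rank adapter B @ A."""
    return d_out * r + r * d_in

def full_ft_trainable_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when fine-tuning W directly."""
    return d_out * d_in

# Hypothetical example: one 4096 x 4096 projection with a rank-8 adapter.
full = full_ft_trainable_params(4096, 4096)   # 16,777,216 parameters
lora = lora_trainable_params(4096, 4096, 8)   # 65,536 parameters
print(f"LoRA trains {lora / full:.3%} of the full matrix's parameters")
```

At rank 8 the adapter trains well under one percent of the matrix's parameters, which is why LoRA can approach full fine-tuning quality at a fraction of the compute and memory cost.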