T.A. Akyıldız | TU Delft Repository

How to improve the performance of the fused architecture consisting of a tabular transformer and a graph neural network used for representation learning for multimodal data?

Bachelor thesis (2024) - D.D. Drashkov, Kubilay Atasu, T.A. Akyıldız, B. Özkan

Tabular data is abundant, largely because it is convenient to store, and there is high demand for extracting useful information from it. Transformers are machine learning models that can find patterns in data, learn from them, and improve their predictive abilities based on that learning; tabular transformers adapt this architecture to tabular data. To try to increase the predictive performance of tabular transformers, we combined them with graph neural networks (GNNs), machine learning models that operate on graph data by learning from its nodes and edges. A graph representation of the dataset is created and fed into the GNN. The fused architecture combines the transformer and the GNN into a single, more complex model. The aim is to improve the model's ability to predict values in the table, or to predict whether an edge in the graph exists, i.e. whether a transaction between two users exists. We built the architecture with FT-Transformer as the tabular transformer and GINe as the GNN; the next step is to modify this architecture by using different models and different ways of using these layers, for example by varying how many copies of each layer are used. This has the potential to yield a versatile model that can be used for different kinds of datasets. We observed a notable improvement in performance when using a different GNN, PNA. Without a GNN, the ResNet model performs similarly to or slightly better than FT-Transformer. In the fused model, GraphSAGE underperforms significantly because of its limited ability to capture simple graph structures. ...
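For readers unfamiliar with GINe, the sketch below shows one edge-aware, GIN-style message-passing step on a toy transaction graph in which accounts are nodes and transactions are directed edges carrying feature vectors. It follows the GINE update h_v' = MLP((1 + eps) * h_v + sum over incoming edges u->v of ReLU(h_u + e_uv)), the form used for example by PyTorch Geometric's `GINEConv`. The toy graph, the identity placeholder for the MLP, and the names `gine_layer` and `identity` are hypothetical and for illustration only; this is not the thesis's implementation.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def identity(v: Vector) -> Vector:
    # Hypothetical placeholder for the learned MLP of a real GINE layer.
    return list(v)


def gine_layer(
    node_feats: Dict[str, Vector],
    edges: List[Tuple[str, str, Vector]],  # (source, target, edge features)
    eps: float = 0.0,
    mlp: Callable[[Vector], Vector] = identity,
) -> Dict[str, Vector]:
    """One GINE-style update: h_v' = mlp((1 + eps) * h_v + sum_{u->v} relu(h_u + e_uv))."""
    # Start each node's aggregate from its own (scaled) features.
    agg = {v: [(1.0 + eps) * x for x in h] for v, h in node_feats.items()}
    for src, dst, edge_feat in edges:
        # The message combines the sender's current state with the transaction's features.
        agg[dst] = add(agg[dst], relu(add(node_feats[src], edge_feat)))
    return {v: mlp(h) for v, h in agg.items()}


# Toy graph: edge feature vectors are assumed to have the same width as node features.
accounts = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.5, 0.5]}
transactions = [("A", "B", [0.2, -0.5]), ("C", "B", [-1.0, 0.3]), ("B", "C", [0.0, 0.0])]
updated = gine_layer(accounts, transactions)
```

Because each message adds the edge features before the nonlinearity, transaction attributes directly shape what a node receives from its neighbours. In a fused model, the input vectors would presumably come from the tabular transformer rather than being written by hand.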

Applying Fine-Tuning methods to FTTransformer in Anti Money Laundering applications

Bachelor thesis (2024) - V.P. de Graaff, Kubilay Atasu, T.A. Akyıldız

This research investigates the effectiveness of combining the Feature Tokenizer Transformer (FTTransformer) [6] with graph neural networks (GNNs) for anti-money laundering (AML) applications. We explore several fine-tuning techniques, including LoRA [9] and vanilla fine-tuning, on our baseline FTT architecture. Using the IBM AML dataset [1], we compare the performance of different models and fine-tuning approaches. Our results indicate that FTT alone does not outperform GNNs, and that careful configuration is required when working with multimodal datasets. This work contributes to the development of more efficient and accurate methods for detecting financial fraud patterns. ...
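LoRA [9] freezes a pretrained weight matrix W and learns only a low-rank update, so a linear layer computes y = Wx + (alpha / r) * B A x, with A of shape r x d_in and B of shape d_out x r. The pure-Python sketch below illustrates that forward pass. The class name `LoRALinear`, the toy sizes, and the initialisation (small random A, zero B, the usual choice, which makes the adapted layer start out identical to the frozen one) are illustrative assumptions, not the configuration used in the thesis.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update B @ A, scaled by alpha / r."""

    def __init__(self, weight: Matrix, r: int, alpha: float, seed: int = 0) -> None:
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight  # frozen pretrained weights, never updated
        self.scale = alpha / r
        # A: r x d_in with small random values; B: d_out x r of zeros, so B @ A starts at 0.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self) -> int:
        # Only A and B are trained: r * (d_in + d_out) values instead of d_in * d_out.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


W = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]]  # d_out = 2, d_in = 3
layer = LoRALinear(W, r=1, alpha=2.0)
```

The saving grows with layer size: for a 768 x 768 layer with r = 8, LoRA trains 8 * (768 + 768) = 12,288 values instead of 589,824.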

A Comparative Study of Fine-Tuning Pipelines for Integrating Large Language Models in Multimodal Data Analysis

Bachelor thesis (2024) - C. Grîu, Kubilay Atasu, T.A. Akyıldız, Burcu Özkan

While large language models (LLMs) are proficient in processing textual information, integrating them with other models presents significant challenges.
This study evaluates the effectiveness of various configurations for integrating an LLM with models capable of handling multimodal data.

We explore the advantages of using pre-trained LLMs for generating text embeddings and the benefits of fine-tuning LLMs for specific tasks. Our investigation includes various fine-tuning strategies, such as Low-Rank Adaptation (LoRA), prompt tuning, and full fine-tuning, applied to both smaller and larger language models. Additionally, we analyze different training setups, including sequential and cascaded training of LLMs and downstream architectures. Our comparative analysis evaluates the performance and cost-effectiveness of these methods. The findings indicate that while full fine-tuning achieves the best results, LoRA offers a practical balance between computational efficiency and model performance. We also highlight that larger LLMs incur higher costs but also deliver better performance. ...
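Of the fine-tuning strategies compared, prompt tuning is perhaps the least familiar: the language model stays frozen and only a small matrix of "soft prompt" vectors, prepended to the token embeddings, is trained. The sketch below shows that prepending step and the resulting count of trainable values. The function names `prepend_soft_prompt` and `prompt_tuning_params`, as well as the prompt length and embedding width in the usage note, are hypothetical and chosen only to illustrate the idea.

```python
from typing import List

Vector = List[float]


def prepend_soft_prompt(soft_prompt: List[Vector], token_embeddings: List[Vector]) -> List[Vector]:
    """Prompt tuning: trainable prompt vectors are placed in front of the frozen token embeddings."""
    width = len(token_embeddings[0])
    if any(len(p) != width for p in soft_prompt):
        raise ValueError("soft prompt vectors must have the same width as the token embeddings")
    # Copy rows so the caller's lists are not aliased; only soft_prompt would be trained.
    return [list(p) for p in soft_prompt] + [list(t) for t in token_embeddings]


def prompt_tuning_params(n_prompt: int, d_model: int) -> int:
    # The only trainable values are the n_prompt x d_model soft prompt entries.
    return n_prompt * d_model
```

With, say, 20 prompt vectors of width 768, only 15,360 values are trained, independent of the size of the frozen model, which is why prompt tuning sits at the cheap end of the cost spectrum.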

Self-Supervised Representation Learning for Relational Multimodal Data

Should we combine multiple pretext tasks?

Bachelor thesis (2024) - I. Mc Auliffe, Kubilay Atasu, T.A. Akyıldız, B. Özkan

Deep learning models can use pretext tasks to learn representations from unlabelled datasets. Although there have been several works on representation learning and pre-training, to the best of our knowledge, combining pretext tasks in a multi-task setting for relational multimodal data has not been explored before. In this work, we implemented four pretext tasks on top of a framework for handling relational multimodal data and evaluated them on two datasets. We first identified the best-performing masking strategy for the pretext tasks that require masking. Then, we compared different combinations of the pretext tasks using self-supervised metrics as a proxy for the quality of the learned representation. The results reveal that replacing masked values with samples from the column's empirical distribution yields 4.6% and 4% higher accuracy on the two datasets, respectively, than replacing them with a fixed value. We also found that different combinations of pretext tasks, even with different numbers of tasks, converge to only marginally different values, and that MoCo further reduces this difference. Our findings imply that the number of pretext tasks can be scaled efficiently, allowing a more diverse representation to be learned. ...
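The two masking strategies compared in this abstract can be made concrete with a small sketch: a masked cell is replaced either by a fixed placeholder value or by a value drawn from the same column's empirical distribution, i.e. sampled uniformly from the column's observed entries. The function name `mask_column`, the masking probability, and the list-per-column representation are assumptions for illustration; the thesis's implementation may differ.

```python
import random
from typing import List, Optional, Sequence, Tuple


def mask_column(
    values: Sequence[float],
    mask_prob: float,
    strategy: str = "empirical",
    fixed_value: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], List[bool]]:
    """Corrupt one column for a masking pretext task; returns (corrupted values, mask flags)."""
    rng = rng or random.Random()
    corrupted: List[float] = []
    mask: List[bool] = []
    for v in values:
        is_masked = rng.random() < mask_prob
        if not is_masked:
            corrupted.append(v)
        elif strategy == "empirical":
            # Sample a replacement from the observed values of the same column.
            corrupted.append(rng.choice(values))
        elif strategy == "fixed":
            corrupted.append(fixed_value)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        mask.append(is_masked)
    return corrupted, mask
```

The mask flags record which cells the pretext task should reconstruct; with the empirical strategy a replaced cell always holds a value that actually occurs in the column, whereas the fixed strategy inserts the same placeholder everywhere.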