Kubilay Atasu | TU Delft Repository

Adversarial Robustness of Multigraph Neural Networks

Master thesis (2026) - D. Heijmans, Kubilay Atasu, H.Ç. Bilgi, R. Wang, J.A. Pouwelse, Z. Erkin

Detecting money laundering in financial transaction data is a task where graph neural networks (GNNs) have shown strong potential. Such data is naturally represented as a directed multigraph, since two accounts, each represented as a node, may exchange many separate payments, each forming a distinct edge with its own amount, currency, and timestamp. Preserving these parallel edges, rather than collapsing them into a single connection, retains the fine-grained structure that allows for distinguishing laundering behaviour from ordinary activity. Yet these models also introduce a new vulnerability, as an adversary could manipulate the transaction graph to alter the neighbourhood of a suspicious account such that the GNN misclassifies it as benign. Existing adversarial robustness research operates on the adjacency matrix, which records at most one edge per node pair and therefore cannot represent the parallel transactions between two accounts that this task depends on. Multigraph GNNs therefore lack both a framework for evaluating robustness under structural perturbations and defences against such perturbations.

This thesis extends adversarial robustness analysis to multigraph GNNs through three contributions. First, it reformulates GNN message passing and attack optimisation over the incidence matrix instead of the adjacency matrix, yielding the first gradient-based structural attack that retains multi-edge structure. Second, it introduces unnoticeability loss terms that constrain perturbations to maintain the graph's statistical fingerprint, including the frequency of characteristic patterns such as short transaction cycles, keeping the attack statistically plausible and unnoticeable at the macro level. Third, it scales the framework to large networks with projected randomised block coordinate descent. On the IBM synthetic anti-money laundering dataset, learned attacks substantially reduce detection accuracy compared to non-learnable perturbations, and adversarial training recovers robustness, showing that multigraph GNNs are both vulnerable to structural manipulation and defensible against it. ...
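
The difference between the two matrix views can be illustrated with a minimal sketch. The toy transactions, account names, and helper functions below are hypothetical and are not taken from the thesis; the signed node-by-edge incidence convention is one common choice and is assumed here.

```python
# Hypothetical toy transactions: (source account, destination account, amount).
transactions = [
    ("A", "B", 100.0),
    ("A", "B", 250.0),  # parallel edge: same accounts, separate payment
    ("B", "C", 90.0),
    ("C", "A", 80.0),
]
nodes = sorted({n for s, d, _ in transactions for n in (s, d)})
index = {name: i for i, name in enumerate(nodes)}


def adjacency_matrix(edges):
    """Binary adjacency: at most one entry per ordered node pair."""
    adj = [[0] * len(nodes) for _ in nodes]
    for src, dst, _ in edges:
        adj[index[src]][index[dst]] = 1
    return adj


def incidence_matrix(edges):
    """Signed node-by-edge incidence: one column per transaction,
    -1 in the source row and +1 in the destination row."""
    inc = [[0] * len(edges) for _ in nodes]
    for col, (src, dst, _) in enumerate(edges):
        inc[index[src]][col] = -1
        inc[index[dst]][col] = 1
    return inc


adj = adjacency_matrix(transactions)
inc = incidence_matrix(transactions)
edges_in_adjacency = sum(sum(row) for row in adj)  # parallel payments collapse
edges_in_incidence = len(inc[0])                   # every payment keeps a column
```

In this sketch the adjacency view merges the two A-to-B payments into one entry, while the incidence view keeps a separate column for each, which is the property a multi-edge perturbation would need.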

Layer-Wise Exchange for Subgraph Federated Learning

An Application to Financial Crime Detection

Master thesis (2026) - S. Ceydeli, Kubilay Atasu, Rui Wang, Zeki Erkin, Burcu Özkan

Subgraph pattern detection aims to uncover complex interaction structures, for example those associated with money laundering in financial transaction networks. State-of-the-art graph neural network (GNN) solutions, however, assume centralized access to the entire graph. When the graph is instead distributed across multiple financial institutions, each client computes node representations using only its own subgraph, so client-local GNN computations diverge from those of a centralized model. We formalize this divergence as the structural observability problem, in which subgraph patterns crossing partition boundaries become locally unidentifiable. This divergence manifests as both a forward gap in the node representations and a backward gap in the training-time adjoint signal. To close both gaps, we propose a per-step, layer-wise exchange framework with two complementary components: a forward exchange that synchronizes node representations at every layer of the forward pass, and a backward exchange that synchronizes the corresponding gradient signals at every layer of the backward pass; neither component exposes raw features or labels. Under an extended subgraph assumption and shared model parameters across clients, we prove that the forward exchange recovers the representations a centralized GNN would compute over the full graph (representation equivalence) and the backward exchange makes the per-client parameter gradients sum to the exact centralized gradient (gradient equivalence). Together, forward and backward exchange make federated training equivalent to centralized training. Experiments on synthetic directed multigraphs with cycle, biclique, and scatter-gather patterns show that forward exchange and federated parameter aggregation are complementary rather than interchangeable, and that their combination recovers most of the gap to centralized performance. 
This recovery depends on per-step freshness, with stale per-epoch exchange leaving a measurable residual. Adding backward exchange yields further improvements, with the largest gains achieved when the cross-client connectivity is densest. ...
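
The forward gap and its closure can be sketched on a toy example. The code below is an illustrative simplification, not the thesis's framework: it assumes a scalar feature per node, a parameter-free sum-aggregation layer, two hypothetical clients, and an exchange that shares all current representations instead of only boundary ones.

```python
def layer(h, edges, targets):
    """One sum-aggregation step: each target adds its in-neighbours' values."""
    out = {v: h[v] for v in targets}
    for u, v in edges:
        if v in out:
            out[v] += h[u]
    return out


# Hypothetical directed cycle that crosses the partition boundary twice.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
features = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
clients = [{0, 1}, {2, 3}]


def centralized(h, num_layers):
    for _ in range(num_layers):
        h = layer(h, edges, set(h))
    return h


def federated(h, num_layers, exchange):
    local = [{v: h[v] for v in owned} for owned in clients]
    for _ in range(num_layers):
        # Forward exchange: fresh representations are shared before every layer.
        shared = {v: x for part in local for v, x in part.items()}
        new_local = []
        for owned, part in zip(clients, local):
            visible = shared if exchange else part
            usable = [(u, v) for u, v in edges if v in owned and u in visible]
            inputs = {**part, **{u: visible[u] for u, _ in usable}}
            new_local.append(layer(inputs, usable, owned))
        local = new_local
    return {v: x for part in local for v, x in part.items()}


central = centralized(features, 2)
with_exchange = federated(features, 2, exchange=True)
without_exchange = federated(features, 2, exchange=False)
```

Under these assumptions the per-layer exchange reproduces the centralized representations, while purely local computation drops every message that crosses the partition boundary.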

Time Series Foundation Models for Operational Support in Geothermal Systems

Bridging the Gap between Advanced AI and Energy Infrastructure

Master thesis (2026) - Z.M. Alam, Kubilay Atasu, Pejman Shoeibi Omrani, A. Anand, Jérémie Decouchant

Geothermal energy plays an increasingly important role in decarbonizing heating, cooling, and power production. As geothermal systems operate under extreme temperatures, pressures, and subsurface uncertainties, maintaining reliable operation is critical to sustaining a continuous energy supply and reducing the total cost of ownership. Ensuring the safe and efficient operation of geothermal plants therefore requires continuous monitoring of complex, multivariate sensor streams to detect equipment degradation and anticipate operational failures before they occur. This often relies on separate specialized physics-based and machine learning models for each task, with sparse labels and inter-site variability limiting generalization.

In this work, we explore the application of state-of-the-art Time Series Foundation Models (TSFMs) as a unified alternative for both forecasting and anomaly detection in geothermal operations. We present a geothermal-specific benchmark for time series modeling and conduct a systematic comparative evaluation of conventional machine and deep learning baselines against pretrained TSFMs under zero-shot conditions. The results demonstrate that, in forecasting tasks, covariate-aware TSFMs, particularly Chronos, consistently outperform all trained baselines, achieving 22–35% lower RMSE across horizons. For anomaly detection, we evaluate multiple detection strategies and find that performance is influenced more strongly by the choice of detection strategy and the availability of labeled fault data than by forecasting accuracy alone, with TSFM embeddings consistently encoding system information and enabling effective anomaly detection under labeled conditions.

These findings establish TSFMs as a promising foundation for intelligent condition monitoring in geothermal and broader industrial time series applications, while highlighting the importance of explicit covariate modeling for this class of systems. ...

Understanding Memorization in Large Language Models

What controls memorization rate? From entropy to conditional entropy or conditioning structure

Master thesis (2026) - R. Alvarez Lucendo, Kubilay Atasu, J.C. van Gemert, Jérémie Decouchant, Madhur Panwar

Large language models (LLMs) can reproduce passages from their training data verbatim, raising privacy and copyright concerns. Prior work attributes memorization to factors such as model size, sequence entropy, context length, and repetition, but these findings lack a unified explanation. This thesis proposes a disambiguation complexity framework: memorization speed is governed not by the information content of a sequence, but by the difficulty of identifying it, specifically by the complexity of the minimal conditioning structure the model must extract from context to uniquely determine the correct continuation.

We demonstrate a counterintuitive regime in which random token sequences are memorized faster than structured natural language, contradicting standard explanations. We formalize a hierarchy of conditioning levels and introduce K-arity, a scalar complexity measure counting the number of prefix tokens jointly required to make a continuation deterministic. Through controlled experiments on synthetic datasets, we show that conditioning level and K-arity are predictive of memorization behavior. Attention analysis reveals that disambiguating cues are most clearly visible in early attention patterns. Natural language experiments show that, in text rich with redundant linguistic cues, isolated manipulations of conditioning complexity do not produce detectable differences, highlighting the gap between synthetic and naturalistic settings. This single principle connects input representation, entropy, identifying tokens, and context length within a common theoretical lens. ...
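
One possible reading of K-arity can be sketched as follows. The function and the toy datasets are hypothetical and are not the thesis's formal definition: the sketch searches for the smallest set of prefix positions whose joint values determine the continuation across a small dataset.

```python
from itertools import combinations


def k_arity(examples):
    """Smallest number of prefix positions that jointly determine the
    continuation for every (prefix, continuation) pair, or None if even
    the full prefix is ambiguous. Illustrative reading only."""
    length = len(examples[0][0])
    for k in range(length + 1):
        for positions in combinations(range(length), k):
            seen = {}
            consistent = True
            for prefix, continuation in examples:
                key = tuple(prefix[p] for p in positions)
                if seen.setdefault(key, continuation) != continuation:
                    consistent = False
                    break
            if consistent:
                return k
    return None


# Continuation copies the first prefix token: one token suffices.
copy_first = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 1), 1)]
# Continuation is the XOR of both prefix tokens: both are required jointly.
xor_both = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

arity_copy = k_arity(copy_first)
arity_xor = k_arity(xor_both)
```

The exhaustive search is exponential in prefix length, so it only serves to make the counting idea concrete on tiny synthetic data.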

Relational Deep Learning with Graph Transformers: Exploring Local and Global Message Passing

Bachelor thesis (2025) - I. Cuñado, H.Ç. Bilgi, Kubilay Atasu, T. Höllt

Graph Transformers have played a key role in the latest graph learning developments. However, their application and performance in Relational Deep Learning (RDL), which has huge potential to remove inefficient data pre-processing pipelines, remain largely unexplored. For this reason, we present adaptations to two well-known Graph Transformer models: a relation-aware local message passing variant (FraudGT) that computes separate attention matrices for each edge and node type; and a simplified global-attention version that ignores heterogeneity (Graphormer). Our analysis demonstrates that local relation-aware attention achieves state-of-the-art results on node classification and regression tasks when evaluated against RelBench tasks, a set of comprehensive RDL benchmarks. We show how local message passing is computationally cheaper, faster, more efficient and more accurate than global attention. Our code is available at https://github.com/ignaciocunado/gt-rdl. ...

A new way of cooperative cycle detection against financial crime

Decentralised cycle detection using cross-institutional transactions

Bachelor thesis (2025) - Z. Beijer, Z. Erkin, L.E. Touwen, Kubilay Atasu

Money laundering is the act of masking the origin of illegal funds in order to inject them into the economy in a seemingly legal manner. Adversaries use money laundering to stay undetected when spending money obtained through theft, fraud, or other criminal activities. These laundering processes often span multiple institutions or countries. To combat this, anti-money laundering systems (AMLs) have evolved from simple rule-based approaches to machine learning methods that analyse money transfer graphs. However, many money laundering operations still go undetected, particularly because these systems assume that all transaction data is centrally accessible. In practice, institutions cannot share their data with others because of privacy regulations and concerns, which restricts AMLs deployed in a decentralised setting. This paper presents the first step towards an algorithm that allows two institutions to detect cycles between them without either institution exposing its own subgraph to the other. The method uses a depth-first search to find associated border vertices, then applies data reduction techniques to minimize the data shared between institutions. These border vertices are then compared to infer the presence of a cycle. While not yet deployable in real-world settings, the algorithm demonstrates improved communication and computational complexity over existing solutions and lays the groundwork for future privacy-preserving AML tools. ...

Graph Learning on Financial Tabular Data

Cascade and Interleaved architectures using GNNs and Transformers

Bachelor thesis (2025) - S. Enachioiu, Kubilay Atasu, H.Ç. Bilgi, T. Höllt

Detecting money-laundering activity in financial transactions is challenging due to the multigraph nature of the problem as well as the intricate fraud patterns that exist. In this work we introduce two architectures, Cascade and Interleaved. These architectures combine the expressive power of local message passing (MP) from Graph Neural Networks (GNNs) with that of global message passing from Transformers. Both models leverage the Principal Neighborhood Aggregation (PNA) GNN for capturing rich local structure. We also incorporate the MEGA two-stage aggregation scheme to distinguish transactions that have the same source and destination accounts from other transactions. We further enhance our architectures with PEARL, a learnable positional encoding framework that has a reduced overhead compared to other techniques. We evaluate our models on the IBM transactions for Anti-Money Laundering (AML) synthetic datasets. We achieve significant improvements compared to the PNA baseline and come close to state-of-the-art (SOTA) results while requiring less feature engineering on the input graphs. We also show that the application of learnable positional encodings in financial fraud detection tasks is promising. ...

Graph Learning on Tabular Data: Think Global And Local

Full Fusion and Interleaved architectures on IBM’s Anti-Money Laundering Data

Bachelor thesis (2025) - A. Stefan, Kubilay Atasu, H.Ç. Bilgi, T. Höllt

As financial fraud becomes increasingly sophisticated, traditional detection methods struggle to uncover the complex relational patterns underlying illicit behavior. This paper investigates the effectiveness of combining Graph Neural Networks (GNNs) and Transformers for fraud detection on relational data transformed into graph structures. Focusing on the IBM Anti-Money Laundering (AML) dataset, two hybrid architectures are proposed: Interleaved, which alternates between GNNs and Transformers to exploit local and global information sequentially, and Full-Fusion, which fuses parallel GNN and Transformer representations at both feature and decision levels. The results show that integrating Transformers significantly boosts performance over standalone GNN baselines, with improvements up to 10% in the F1 score in small-scale datasets. It is also demonstrated that gating-based fusion strategies enhance model stability and accuracy, and further, that PEARL-based positional encodings do not result in any conclusive improvement of the models. These findings highlight the value of combining local message passing and global attention mechanisms for structured financial anomaly detection, and pave the way for more robust, adaptable graph-based solutions in fraud analytics and more. ...

The Impact of Realistic Laundering Subgraph Perturbations on Graph Neural Network Based Anti-Money Laundering Systems

Trustworthy Financial Crime Analytics

Bachelor thesis (2025) - T.J. Clark, Kubilay Atasu, Z. Erkin, M. Khosla

As financial institutions adopt more sophisticated Anti-Money Laundering (AML) techniques, such as the deployment of Graph Neural Networks (GNNs) to detect patterns, laundering behavior is likely to evolve. In this paper, we present a novel perturbation framework that models laundering as an evasion-based, restricted black-box process. Our tool systematically alters labeled laundering subgraphs through a set of parameterized graph actions (intermediary injection, merging, and splitting) designed to simulate realistic laundering adaptations. We apply our framework to one of the AMLWorld synthetic transaction datasets to generate multiple perturbed versions defined by a set of parameterized preset configuration files. We then evaluate the impact of these perturbations on two MEGA-GNN variants of the current state-of-the-art in temporal multigraph-compatible GNN architectures. Our results show that realistic structural perturbations can impact performance and serve as a valuable tool to evaluate model adaptability and robustness. Our work aims to contribute to a deeper understanding of the evolutionary dynamics between AML systems and laundering behavior. ...

Secure computation of fan-in and fan-out degree of nodes using additive homomorphic encryption

Bachelor thesis (2025) - D.E. Floroiu, Z. Erkin, Kubilay Atasu, M. Khosla

There is an increasing need for financial institutions to be able to detect illicit activities such as money laundering. While these institutions currently rely on graph-based analytics or machine learning algorithms for such detection, inter-bank collaboration is hindered by privacy concerns and regulations. In this paper, we introduce a new protocol for computing simple fundamental graph features (specifically fan-in and fan-out degrees) directly on encrypted transaction data using additively homomorphic encryption schemes, in particular the Paillier cryptosystem. Our algorithm allows a semi-trusted third party to perform computations without accessing plaintext data, enabling privacy-preserving collaboration between banks. Throughout the paper, we detail the protocol design, analyze its complexity, security and correctness, and demonstrate how it reduces the gap between utility and privacy. While the protocol currently supports only basic graph metrics and assumes a common normalized currency, it offers a scalable and practical foundation for future privacy-preserving financial crime analytics. ...
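
The additive property that such a protocol relies on can be sketched with a toy Paillier implementation. This is not the thesis's protocol: the tiny primes, the bank transaction lists, the function names, and the choice of who holds the private key are all assumptions made for illustration, and the parameters are far too small to be secure. The sketch requires Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import random


def keygen(p=1789, q=1861):
    """Toy key generation with small primes; insecure, for illustration only."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """c = (1 + n)^m * r^n mod n^2, using (1 + n)^m = 1 + m*n mod n^2."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (1 + m * n) % n2 * pow(r, n, n2) % n2


def decrypt(n, private_key, c):
    lam, mu = private_key
    return (pow(c, lam, n * n) - 1) // n * mu % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % (n * n)


# Hypothetical per-bank transaction lists: (source account, destination account).
bank_a = [("a1", "x"), ("a2", "x"), ("a1", "y")]
bank_b = [("b1", "x"), ("b2", "a1")]

n, private_key = keygen()


def encrypted_degree(account, endpoint):
    """endpoint 0 counts outgoing edges (fan-out), 1 counts incoming (fan-in).
    Each bank encrypts its local count; the aggregator only sees ciphertexts."""
    total = encrypt(n, 0)
    for bank in (bank_a, bank_b):
        local_count = sum(1 for edge in bank if edge[endpoint] == account)
        total = add_encrypted(n, total, encrypt(n, local_count))
    return total


fan_in_x = decrypt(n, private_key, encrypted_degree("x", 1))
fan_out_a1 = decrypt(n, private_key, encrypted_degree("a1", 0))
```

In this sketch the party that combines ciphertexts never needs the private key, which mirrors the role the abstract assigns to a semi-trusted third party.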

Collaborative Detection of Malicious Clients for Financial Institutions using Multi-Party Computation

Trustworthy Financial Crime Analytics

Bachelor thesis (2025) - L.H. de Hoop, Z. Erkin, Kubilay Atasu, L.E. Touwen, M. Khosla

Financial institutions bear a large responsibility for detecting and preventing financial crime. However, demand for dedicated tools to aid in financial crime detection exceeds supply. The combination of regulatory restrictions on sharing client information between financial institutions and a lack of dedicated detection tools results in a flawed system that allows criminals to evade detection and easily continue their activities by moving between institutions. This paper answers the question: How can privacy-preserving data sharing methods enable collaborative detection of malicious clients among financial institutions? Multi-Party Private Set Intersection (MPSI) allows multiple parties to intersect their respective datasets without revealing to the other parties any data that is not in the intersection. A special case of MPSI is Threshold Multi-Party Private Set Intersection (T-MPSI), where, given a threshold T, an item is only included if T or more parties hold that item. This paper implements a new version, Flagged Threshold Private Set Intersection (FT-MPSI), that adds a label to each item, where the label indicates whether the client has been flagged as malicious, that is, accused or convicted of financial crime. To be included in the intersection, the item must now also be identified by at least one party as malicious. The final result of the intersection is revealed to the computing party and can be shared with the parties holding the original items, while no other information is leaked. The runtime performance of the FT-MPSI protocol is compared to that of the T-MPSI protocol: FT-MPSI is slower than T-MPSI by a constant factor of approximately 2 and scales linearly with the number of parties and the size of the input sets. FT-MPSI is a practical solution for financial institutions to use in financial crime detection. ...
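
The inclusion rule described above can be written down as a plaintext reference function. The sketch below only states what the output should contain; it performs no cryptography and is not the FT-MPSI protocol itself. The function name, the dictionary layout, and the client identifiers are hypothetical.

```python
def flagged_threshold_intersection(party_sets, threshold):
    """party_sets: one dict per party mapping item -> flagged (bool).
    An item is returned if at least `threshold` parties hold it and at
    least one of them has flagged it as malicious."""
    holders = {}
    flagged = set()
    for items in party_sets:
        for item, is_flagged in items.items():
            holders[item] = holders.get(item, 0) + 1
            if is_flagged:
                flagged.add(item)
    return {
        item
        for item, count in holders.items()
        if count >= threshold and item in flagged
    }


# Hypothetical client identifiers held by three institutions.
bank_1 = {"alice": False, "bob": True, "carol": False}
bank_2 = {"alice": False, "bob": False, "dave": True}
bank_3 = {"alice": False, "bob": False, "carol": False}

result = flagged_threshold_intersection([bank_1, bank_2, bank_3], threshold=2)
```

Here "alice" and "carol" meet the threshold but carry no flag, and "dave" is flagged but held by a single party, so only "bob" is returned.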

How money flow statistics can be used to detect money laundering activity in graph-based financial crime detection

Bachelor thesis (2025) - L.S. Ionescu, Z. Erkin, Kubilay Atasu, M. Khosla

Financial crime is a growing issue for contemporary society, especially in the form of money laundering, which aims to conceal the origin of illicit funds through a network of intermediate transactions. State-of-the-art solutions for detecting money laundering in graph representations of financial data include supervised techniques such as Graph Neural Networks. Although provably efficient, their main limitation is that they require a dataset of correctly and completely labeled transactions, which is often infeasible to obtain. This work explores money flow statistics as an unsupervised approach to money laundering detection, either by computing statistics of accounts based on the amount of money received and sent in a certain time frame or by network flow analysis using maximum flow algorithms. This paper therefore aims to answer two questions: "What are the existing solutions using money flow statistics?" and "How would these money flow statistics methods perform on a realistic dataset of transactions?" The analysis benchmarks the identified algorithms on a realistic dataset of financial transactions in order to observe their limitations and suggest further research into how these limitations can be overcome in order to make money flow statistics methods a feasible solution to money laundering detection. ...
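
A per-account statistic of the kind mentioned above can be sketched as follows. The `pass_through` ratio, the function name, and the toy transactions are hypothetical choices for illustration and are not claimed to be among the algorithms the thesis benchmarks.

```python
from collections import defaultdict


def flow_statistics(transactions, start, end):
    """Sum money received and sent per account for timestamps in [start, end).
    transactions: iterable of (source, destination, amount, timestamp)."""
    received = defaultdict(float)
    sent = defaultdict(float)
    for src, dst, amount, timestamp in transactions:
        if start <= timestamp < end:
            sent[src] += amount
            received[dst] += amount
    stats = {}
    for account in set(sent) | set(received):
        inflow, outflow = received[account], sent[account]
        larger = max(inflow, outflow)
        stats[account] = {
            "received": inflow,
            "sent": outflow,
            # Near 1.0 when almost everything received is forwarded again.
            "pass_through": min(inflow, outflow) / larger if larger > 0 else 0.0,
        }
    return stats


# Hypothetical transactions: account "m" forwards nearly all it receives.
transactions = [
    ("s", "m", 1000.0, 1),
    ("m", "t", 990.0, 2),
    ("s", "u", 50.0, 3),
    ("m", "t", 500.0, 99),  # outside the window, ignored
]
stats = flow_statistics(transactions, start=0, end=10)
```

An account with a high pass-through ratio inside a short window behaves like an intermediary, which is the kind of signal an unsupervised ranking could use without labels.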

Hybrid Graph Representation Learning for Money Laundering Detection

Bachelor thesis (2025) - M. Frija, Kubilay Atasu, H.Ç. Bilgi, T. Höllt

Money laundering detection stands as one of the most important challenges in the anti-financial crime sector, given its grave repercussions on the financial industry. The evolving nature of fraud schemes and the increasing volume of financial transactions impose limitations on the detection capabilities of traditional anti-money laundering (AML) systems. In light of the recent breakthroughs in the field of graph machine learning, graph neural networks (GNNs) and graph transformers (GTs) have emerged as prominent solutions to these limitations, achieving remarkable performance in detecting complex and broad fraudulent patterns. However, fusing the powerful characteristics of these classes of graph models into a unified framework for fraud detection has been little explored. In this paper, we address this gap by presenting GraphFuse, a hybrid graph representation learning model tailored for money laundering detection in financial transaction graphs. The novel edge centrality and transaction signature encodings offer GraphFuse a slight advantage over the best-performing GNN and GT models, improving upon the best GT baseline by 0.76 p.p. in F1 score. Additionally, we introduce three variants of the Transformer-based component of GraphFuse, each with a different level of computational complexity. The competitive performance of GraphFuse is supported by extensive experiments on open-source, large-scale synthetic financial transactions datasets. Our code is available at https://github.com/mfrija/aml-graphfuse. ...

A Comparative Study of Fine-Tuning Pipelines for Integrating Large Language Models in Multimodal Data Analysis

Bachelor thesis (2024) - C. Grîu, Kubilay Atasu, T.A. Akyıldız, Burcu Özkan

While LLMs are proficient in processing textual information, integrating them with other models presents significant challenges. This study evaluates the effectiveness of various configurations for integrating a large language model (LLM) with models capable of handling multimodal data.

We explore the advantages of using pre-trained LLMs for generating text embeddings and the benefits of fine-tuning LLMs for specific tasks. Our investigation includes various fine-tuning strategies, such as Low-Rank Adaptation (LoRA), prompt tuning, and full fine-tuning, applied to both smaller and larger language models. Additionally, we analyze different training setups, including sequential and cascaded training of LLMs and downstream architectures. Our comparative analysis evaluates the performance and cost-effectiveness of these methods. The findings indicate that while full fine-tuning achieves the best results, LoRA offers a practical balance between computational efficiency and model performance. We also highlight the correlation between increased LLM size and corresponding increases in cost and performance. ...

Self-Supervised Representation Learning for Relational Multimodal Data

Should we combine multiple pretext tasks?

Bachelor thesis (2024) - I. Mc Auliffe, Kubilay Atasu, T.A. Akyıldız, B. Özkan

Deep Learning models can use pretext tasks to learn representations on unlabelled datasets. Although there have been several works on representation learning and pre-training, to the best of our knowledge combining pretext tasks in a multi-task setting for relational multimodal data has not been done before. In this work, we implemented 4 pretext tasks on top of a framework for handling relational multimodal data and evaluated them on two datasets. We first identified the best-performing masking strategy for pretext tasks that require masking. Then, we compared different combinations of the pretext tasks based on self-supervised metrics as a proxy for the quality of the representation learned. The results reveal that masking values by replacing them with samples from the column's empirical distribution yields 4.6% and 4% higher accuracy on the two datasets, respectively, than replacing them with a fixed value. We also found that different combinations of pretext tasks, even with different numbers of tasks, converge to marginally different values, and MoCo further reduces this difference. Our findings imply that the number of pretext tasks can scale efficiently, allowing for a more diverse representation to be learned. ...
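
The two masking strategies compared above can be sketched for a single column. The function name, its parameters, the column values, and the masking rate are hypothetical; the sketch assumes that sampling uniformly from the observed values stands in for the column's empirical distribution.

```python
import random


def mask_column(values, mask_rate, strategy, rng, fixed_value=0):
    """Return (masked column, mask flags) for one tabular column.
    strategy "empirical": replace with a value drawn from the column itself,
    so frequent values are drawn more often. Any other strategy: use fixed_value."""
    masked = list(values)
    is_masked = [rng.random() < mask_rate for _ in values]
    for i, hit in enumerate(is_masked):
        if hit:
            if strategy == "empirical":
                masked[i] = rng.choice(values)
            else:
                masked[i] = fixed_value
    return masked, is_masked


column = [10, 10, 10, 25, 40, 40, 75, 90]
rng = random.Random(0)
empirical, empirical_flags = mask_column(column, 0.5, "empirical", rng)
fixed, fixed_flags = mask_column(column, 0.5, "fixed", rng, fixed_value=-1)
```

With the empirical strategy a masked cell still looks like a plausible entry of the column, so a model cannot detect masked positions from an out-of-range sentinel; the fixed-value strategy gives that shortcut away.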

Optimizing Dataset Quality for Enhanced Machine Learning Performance

A Study on the Impact of Dataset Metrics

Bachelor thesis (2024) - E. Ünlüyurt, Kubilay Atasu, T.A. Akyıldız, B. Özkan

With the increase of machine learning applications in our every-day life, high-quality datasets are becoming necessary to train accurate and reliable models. This research delves into the factors that contribute to a high quality dataset and examines how different dataset metrics affect the performance of machine learning models particularly focusing on Graph Neural Networks (GNNs) Tabular Transformers and Large Language Models (LLMs). The metrics, under scrutiny include graph sparsity, missing data cells, modularity and text length. Various datasets are adjusted to assess how these metrics impact model performance.

The results of the experiments reveal that sparse graphs can preserve relational information. However, increasing density does not necessarily lead to improved performance due to noise interference. The models maintained accuracy and low error rates in the presence of significant missing data, indicating their ability to handle incomplete information effectively and generalize well based on imputation strategies and structural design. Higher modularity was found to aid in capturing patterns but introduced complexity that could potentially hinder performance. Notably, text length emerged as a factor influencing model performance by offering contextual details.

These insights show the significance of considering dataset attributes when designing machine learning models for intricate predictive tasks. Through experimentation and optimization of these metrics, we can enhance model resilience and accuracy for real-world scenarios. ...

How to improve the performance of the fused architecture consisting of a tabular transformer and a graph neural network used for representation learning for multimodal data?

Bachelor thesis (2024) - D.D. Drashkov, Kubilay Atasu, T.A. Akyıldız, B. Özkan

The substantial amount of tabular data can be attributed to its storage convenience, and there is a high demand for learning useful information from it. To that end, machine learning models called transformers have been created. They find patterns in the data, learn from them, and improve their predictive abilities accordingly; tabular transformers are a variant designed for tabular data. To increase the predictive performance of these transformers, we have combined them with graph neural networks (GNNs), machine learning models that work on graph data by learning information from the nodes and the edges. A graph representation of the dataset is created and fed into the GNN. The fused architecture is a more complex machine learning model that combines the transformer and the GNN. The aim is to improve the model's ability to predict values from the table or to predict whether an edge exists in the graph, which represents whether a transaction between two users exists. We have built the architecture using a specific tabular transformer and GNN, FT-Transformer and GINe respectively. The next step is to modify this architecture by using different models and different ways of using these layers, for example how many copies of them we create. This has the potential to be a versatile model that can be used for different kinds of datasets. We have seen notable improvement in performance when using a different GNN, PNA. The transformer ResNet also performs at a similar or slightly better level than FT-Transformer when not combined with a GNN. GraphSage in the fused model underperforms significantly due to its weakness in capturing simple graph structures. ...

Applying Fine-Tuning methods to FTTransformer in Anti Money Laundering applications

Bachelor thesis (2024) - V.P. de Graaff, Kubilay Atasu, T.A. Akyıldız

This research investigates the effectiveness of combining Feature Tokenizer Transformer (FTTransformer)[6] with graph neural networks for anti-money laundering (AML) applications. We explore various fine-tuning techniques, including LoRA[9] and vanilla fine-tuning, on our baseline FTT architecture. Using the IBM AML dataset [1], we compare the performance of different models and fine-tuning approaches. Our results indicate that FTT alone does not outperform GNNs and that careful configuration is required when working with multimodal datasets. This work contributes to the development of more efficient and accurate methods for detecting financial fraud patterns. ...
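For readers unfamiliar with LoRA, the sketch below shows the general idea of a low-rank weight update as it is commonly described: a frozen weight matrix W0 is adapted by adding a scaled product of two small trainable matrices, W0 + (alpha / r) * B A. This is a generic illustration, not the configuration used in the thesis; the helper names, the dimensions, and the value of `alpha` are all hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w0, a, b, alpha):
    """Effective weight W0 + (alpha / r) * B @ A, where r is the adapter rank."""
    r = len(a)  # A has shape (r, d_in); B has shape (d_out, r)
    delta = matmul(b, a)
    scale = alpha / r
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w0, delta)
    ]


rng = random.Random(0)
d_out, d_in, r = 4, 6, 2  # hypothetical layer size and rank
w0 = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]    # trainable
b = [[0.0] * r for _ in range(d_out)]                                # trainable, zero-initialised

w_eff = lora_weight(w0, a, b, alpha=4.0)

full_params = d_out * d_in        # parameters updated by vanilla fine-tuning
lora_params = r * (d_in + d_out)  # parameters updated by the adapter
```

Because B starts at zero, the adapted layer initially behaves exactly like the pre-trained one, and only the `lora_params` adapter entries are trained rather than all `full_params` weights. The saving is small for this toy layer but grows with layer size when the rank is kept low.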