Kubilay Atasu
18 records found
This thesis extends adversarial robustness analysis to multigraph GNNs through three contributions. First, it reformulates GNN message passing and attack optimisation over the incidence matrix instead of the adjacency matrix, yielding the first gradient-based structural attack that retains multi-edge structure. Second, it introduces unnoticeability loss terms that constrain perturbations to maintain the graph's statistical fingerprint, including the frequency of characteristic patterns such as short transaction cycles, keeping the attack statistically plausible and unnoticeable at the macro level. Third, it scales the framework to large networks with projected randomised block coordinate descent. On the IBM synthetic anti-money laundering dataset, learned attacks substantially reduce detection accuracy compared to non-learnable perturbations, and adversarial training recovers robustness, showing that multigraph GNNs are both vulnerable to structural manipulation and defensible against it.
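As a rough illustration of why an incidence-matrix view retains multi-edge structure, the following minimal sketch compares a binary adjacency matrix with source and destination incidence matrices on a toy multigraph. The graph, the feature values, and all variable names are hypothetical, and the sum aggregation shown is a generic message-passing step, not the thesis's actual formulation.

```python
# Hypothetical toy multigraph: two parallel edges 0 -> 1 and one edge 1 -> 2.
edges = [(0, 1), (0, 1), (1, 2)]
num_nodes = 3

# Binary adjacency matrix: the two parallel edges collapse into one entry.
adjacency = [[0] * num_nodes for _ in range(num_nodes)]
for src, dst in edges:
    adjacency[src][dst] = 1

# Source and destination incidence matrices (nodes x edges): one column per edge.
src_incidence = [[0] * len(edges) for _ in range(num_nodes)]
dst_incidence = [[0] * len(edges) for _ in range(num_nodes)]
for e, (src, dst) in enumerate(edges):
    src_incidence[src][e] = 1
    dst_incidence[dst][e] = 1

# One sum-aggregation step with scalar node features (assumed values):
# gather the source feature onto each edge, then scatter-add to destinations.
features = [1.0, 10.0, 100.0]
edge_messages = [
    sum(src_incidence[n][e] * features[n] for n in range(num_nodes))
    for e in range(len(edges))
]
aggregated = [
    sum(dst_incidence[n][e] * edge_messages[e] for e in range(len(edges)))
    for n in range(num_nodes)
]

# The adjacency matrix sees 2 edges; the incidence matrix sees all 3.
adjacency_edge_count = sum(map(sum, adjacency))
incidence_edge_count = sum(map(sum, dst_incidence))
```

In this sketch node 1 receives two messages from node 0, one per parallel edge, which a binary adjacency matrix would count only once.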
Layer-Wise Exchange for Subgraph Federated Learning
An Application to Financial Crime Detection
Time Series Foundation Models for Operational Support in Geothermal Systems
Bridging the Gap between Advanced AI and Energy Infrastructure
In this work, we explore the application of state-of-the-art Time Series Foundation Models (TSFMs) as a unified alternative for both forecasting and anomaly detection in geothermal operations. We present a geothermal-specific benchmark for time series modeling and conduct a systematic comparative evaluation of conventional machine and deep learning baselines against pretrained TSFMs under zero-shot conditions. The results demonstrate that, in forecasting tasks, covariate-aware TSFMs, particularly Chronos, consistently outperform all trained baselines, achieving 22–35% lower RMSE across horizons. For anomaly detection, we evaluate multiple detection strategies and find that performance is influenced more strongly by the choice of detection strategy and the availability of labeled fault data than by forecasting accuracy alone, with TSFM embeddings consistently encoding system information and enabling effective anomaly detection under labeled conditions.
These findings establish TSFMs as a promising foundation for intelligent condition monitoring in geothermal and broader industrial time series applications, while highlighting the importance of explicit covariate modeling for this class of systems.
Understanding Memorization in Large Language Models
What controls memorization rate? From entropy to conditional entropy or conditioning structure
We demonstrate a counterintuitive regime in which random token sequences are memorized faster than structured natural language, contradicting standard explanations. We formalize a hierarchy of conditioning levels and introduce K-arity, a scalar complexity measure counting the number of prefix tokens jointly required to make a continuation deterministic. Through controlled experiments on synthetic datasets, we show that conditioning level and K-arity are predictive of memorization behavior. Attention analysis reveals that disambiguating cues are most clearly visible in early attention patterns. Natural language experiments show that, in text rich with redundant linguistic cues, isolated manipulations of conditioning complexity do not produce detectable differences, highlighting the gap between synthetic and naturalistic settings. This single principle connects input representation, entropy, identifying tokens, and context length within a common theoretical lens.
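One simplified way to picture a measure of this kind is sketched below. It assumes that the relevant prefix tokens are the k most recent ones and reports the smallest k for which every continuation in a toy corpus becomes deterministic. This is only one possible reading of the abstract, not the authors' definition of K-arity; the function name `min_context_length` and the toy sequences are hypothetical.

```python
from collections import defaultdict


def min_context_length(sequences, max_k):
    """Smallest k such that the last k tokens always determine the next token.

    Assumption: only the k most recent tokens are used as context.
    Returns None if no k <= max_k makes every continuation deterministic.
    """
    for k in range(max_k + 1):
        continuations = defaultdict(set)
        for seq in sequences:
            # Positions with fewer than k preceding tokens are skipped.
            for i in range(k, len(seq)):
                context = tuple(seq[i - k:i])
                continuations[context].add(seq[i])
        if all(len(nexts) == 1 for nexts in continuations.values()):
            return k
    return None


# Toy corpus: after "x" the next token is ambiguous unless the token
# before "x" is also known, so two context tokens are needed.
toy = [["a", "x", "c"], ["b", "x", "d"]]
k = min_context_length(toy, max_k=3)
```

For the toy corpus the sketch returns 2, because the token preceding "x" is needed to disambiguate the continuation.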
A new way of cooperative cycle detection against financial crime
Decentralised cycle detection using cross-institutional transactions
Graph Learning on Financial Tabular Data
Cascade and Interleaved architectures using GNNs and Transformers
Graph Learning on Tabular Data: Think Global And Local
Full Fusion and Interleaved architectures on IBM’s Anti-Money Laundering Data
The Impact of Realistic Laundering Subgraph Perturbations on Graph Neural Network Based Anti-Money Laundering Systems
Trustworthy Financial Crime Analytics
Collaborative Detection of Malicious Clients for Financial Institutions using Multi-Party Computation
Trustworthy Financial Crime Analytics
This study evaluates the effectiveness of various configurations for integrating a large language model (LLM) with models capable of handling multimodal data.
We explore the advantages of using pre-trained LLMs for generating text embeddings and the benefits of fine-tuning LLMs for specific tasks. Our investigation includes various fine-tuning strategies, such as Low-Rank Adaptation (LoRA), prompt tuning, and full fine-tuning, applied to both smaller and larger language models. Additionally, we analyze different training setups, including sequential and cascaded training of LLMs and downstream architectures. Our comparative analysis evaluates the performance and cost-effectiveness of these methods. The findings indicate that while full fine-tuning achieves the best results, LoRA offers a practical balance between computational efficiency and model performance. We also highlight the correlation between increased LLM size and corresponding increases in cost and performance.
Self-Supervised Representation Learning for Relational Multimodal Data
Should we combine multiple pretext tasks?
Optimizing Dataset Quality for Enhanced Machine Learning Performance
A Study on the Impact of Dataset Metrics
The experiments show that sparse graphs can preserve relational information, whereas increasing density does not necessarily improve performance because of noise interference. The models demonstrated accuracy and low error rates in the presence of significant missing data, indicating that they handle incomplete information effectively and generalize well based on imputation strategies and structural design. Higher modularity was found to aid in capturing patterns, but it also introduced complexity that could potentially hinder performance. Notably, text length emerged as a factor influencing model performance by offering contextual details.
These insights show the significance of considering dataset attributes when designing machine learning models for intricate predictive tasks. Through experimentation and optimization of these metrics, we can enhance model resilience and accuracy for applicability in real-world scenarios.