E. Congeduti | TU Delft Repository

From Multi-Class to Multi-Label: Revisiting Edge Dropping for Graph Neural Networks

Bachelor thesis (2026) - A. Andrei, M. Khosla, E. Congeduti, C. Lofi

Many real-world tasks involve data that is naturally structured as a graph, such as proteins linked by interactions, papers linked by citations, or people connected in a social network. A common goal is to predict properties of each entity based on how it is connected to the rest. The models used for this, called graph neural networks, work by allowing each node to aggregate information from its neighbours, then from its neighbours’ neighbours, and so on.

However, looking too far can blur distant nodes together. A common remedy is to train the model on a random subset of the links and discard the rest, in the hope of preventing it from over-relying on any single part of the graph. This idea was proposed and tested only on the simpler problem in which each entity carries exactly one label. Many real-world problems are not like this. A single protein may participate in multiple biological processes simultaneously, so an effective predictor must assign several labels at once. Whether dropping links still helps in this more realistic multi-label setting has not been studied.

This work addresses that question using synthetic graphs with precisely controlled structure, ranging from strongly clustered to nearly random, as well as three real biological and bibliographic datasets spanning the same range. In almost every case, dropping links harms performance rather than improving it, and the damage increases with the fraction of removed links, with a single weak exception on the most strongly multi-label, lowest-homophily real graph.

The cause is not a flaw in the technique but a property of the multi-label task itself. When multiple labels must be predicted from the same node representation, each label receives only a fraction of the learning signal it would obtain in a single-label setting. Discarding links reduces this already limited signal even further. ...
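The link-dropping step described in this abstract can be illustrated with a minimal sketch. It assumes an undirected graph stored as a list of node pairs; the helper name `drop_edges` and the parameter `p` (the fraction of links discarded in expectation) are hypothetical and not taken from the thesis, which may sample links differently.

```python
import random


def drop_edges(edges, p, rng):
    """Keep each edge independently with probability 1 - p.

    `edges` is a list of (u, v) pairs; `p` is the expected fraction of
    links to discard (hypothetical parameter name); `rng` is a
    random.Random instance so that runs are reproducible.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # rng.random() is uniform on [0, 1), so p = 0 keeps every edge
    # and p = 1 removes every edge.
    return [e for e in edges if rng.random() >= p]


edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
rng = random.Random(0)

# A fresh subset would typically be drawn at every training step,
# while evaluation would use the full edge list.
kept = drop_edges(edges, p=0.4, rng=rng)
```

In a training loop this kind of sampling is usually repeated every epoch, so the model sees a different sparsified graph each time.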

Evaluating Graph Neural Additive Networks for Multi-Label Node Classification

How does Graph Neural Additive Network (GNAN) perform on different multi-label node classification datasets, and what do the resulting explanations reveal about the data?

Bachelor thesis (2026) - A. Vlas, M. Khosla, E. Congeduti

Graph Neural Additive Networks (GNANs) extend generalised additive models to graph-structured data, providing interpretability by design rather than through post-hoc explanation. GNANs have been studied on multi-class node classification, but not in the multi-label setting, where a single node may belong to several categories at once. This paper presents the first adaptation and evaluation of GNAN for multi-label node classification. We replace the softmax output with a per-label sigmoid and give the distance function a per-label output, and we benchmark the adapted model against standard baselines on two real-world graphs that span a high- and a low-homophily regime, reporting Average Precision (AP) as the primary metric. We then analyse the learned shape functions and distance function to ask whether the built-in explanations are meaningful. GNAN is competitive with strong message-passing graph neural networks on the high-homophily graph, coming within about four AP points of the best baseline, but it drops to the lower end of the baselines on the low-homophily graph. Its learned distance function adapts to label homophily: a steep, reproducible local decay when neighbours are informative, and a flat, unstable profile when they are not. These results characterise when GNAN’s additive structure is an advantage and when it is a limitation, and they demonstrate the practical value of interpretability by design in the multi-label graph setting. ...
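The change of output layer mentioned in this abstract, from a softmax to a per-label sigmoid, can be sketched as follows. This is only an illustration of the general idea on raw scores (logits); the function names and the 0.5 threshold are assumptions, not the adapted GNAN itself.

```python
import math


def softmax(logits):
    """Multi-class output: probabilities compete and sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def per_label_sigmoid(logits):
    """Multi-label output: each label gets an independent probability."""
    return [sigmoid(z) for z in logits]


def predict_labels(logits, threshold=0.5):
    """Indices of all labels whose probability reaches the threshold."""
    return [i for i, p in enumerate(per_label_sigmoid(logits)) if p >= threshold]


logits = [2.0, 1.5, -3.0]
class_probs = softmax(logits)            # one winner, probabilities sum to 1
label_probs = per_label_sigmoid(logits)  # labels 0 and 1 can both be "on"
```

With a softmax, raising one label's probability necessarily lowers the others, which is why it does not suit nodes that belong to several categories at once.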

Heterophilic Methods on Multi-Label Graphs

How Do Methods Designed for Heterophilic Graphs Compare for Multi-Label Node Classification?

Bachelor thesis (2026) - C. Turcan, M. Khosla, E. Congeduti

Graph neural networks for node classification usually assume homophily, meaning that connected nodes tend to share labels. A promising family of methods has been developed for heterophilic graphs, where neighbouring nodes instead tend to have different labels. These methods are almost always evaluated on multi-class datasets, in which each node has exactly one label. However, many real-world problems are multi-label, with each node carrying a set of labels. An important question is whether methods designed for heterophilic graphs remain effective for multi-label node classification.

To investigate this, we compare two simple baselines with six heterophily-oriented graph neural networks across several real multi-label graphs and one multi-class control dataset. This is complemented by two synthetic experiments: one varying homophily directly and one varying the number of labels per node. We also include a supplementary experiment that collapses the multi-label structure of real-world datasets into a multi-class setting. We report this for completeness but interpret it cautiously, since this transformation alters the datasets too substantially for the two settings to be considered equivalent.

We find that performance appears to depend at least as much on graph homophily as on model sophistication. When homophily is low, heterophilic models rarely outperform either a plain feature-only baseline or a simple structure-only embedding. The benefits of message passing tend to re-emerge only as homophily increases. These results suggest that the gains reported by heterophily-oriented methods on multi-class benchmarks may not transfer automatically to the low-homophily multi-label setting. ...

Per-Node 𝑘-Hop Label Homophily Predicts GNN Accuracy in Multi-Label Node Classification

Bachelor thesis (2026) - V. Guzun, M. Khosla, E. Congeduti

Multi-label node classification asks a graph neural network to assign each node a set of labels. For single-label classification the success of these networks is usually attributed to label homophily, the tendency of connected nodes to share labels; for the multi-label case Zhao et al. introduced a homophily metric and reported that it tracks performance across datasets. That metric measures label similarity only between directly connected nodes, even though graph neural networks aggregate information from wider neighbourhoods. It is therefore unknown whether label similarity at larger distances also predicts how accurately a network classifies a node, and at which distance that signal is strongest. We generalise the metric to a per-node label-similarity score at any chosen distance. We then correlate this score node-by-node with trained-model accuracy on three benchmark datasets, repeat the analysis across model depths to separate a data-driven explanation from an architectural one, and test the relationship causally with two synthetic graph generators that plant label structure at controlled distances. The predictive signal is consistently strongest at distance two, not at directly adjacent nodes, across two standard architectures and up to several thousand test nodes per dataset; the most predictive distance does not shift with model depth, so it is a property of the graphs rather than of the model; and controlled synthetic experiments confirm a causal effect of two-hop homophily on accuracy in graphs built to isolate it, with weaker and partially confounded support at three hops. Direct measurements further show that measuring similarity at an exact distance, rather than over a cumulative neighbourhood, is the right unit for identifying that scale. All code, data splits, and figures are released for reproducibility. ...
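A per-node label-similarity score at an exact distance can be sketched roughly as below. The abstract does not give the formula, so this sketch makes two loud assumptions: label similarity is measured with the Jaccard index of the two label sets, and the score is the plain average over all nodes whose shortest-path distance from the target node is exactly k. All function names are hypothetical.

```python
from collections import deque


def nodes_at_distance(adj, source, k):
    """Nodes whose shortest-path distance from `source` is exactly k (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond distance k
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return [v for v, d in dist.items() if d == k]


def jaccard(a, b):
    """Jaccard index of two label sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def k_hop_label_similarity(adj, labels, node, k):
    """Average label similarity between `node` and the nodes exactly k hops away.

    Returns None when no node lies at that distance.
    """
    ring = nodes_at_distance(adj, node, k)
    if not ring:
        return None
    return sum(jaccard(labels[node], labels[v]) for v in ring) / len(ring)


# Toy graph: node 1 is connected to nodes 0, 2 and 3.
adj = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
labels = {0: {"a", "b"}, 1: {"c"}, 2: {"a", "b"}, 3: {"a"}}

one_hop = k_hop_label_similarity(adj, labels, 0, 1)  # only node 1, no shared labels
two_hop = k_hop_label_similarity(adj, labels, 0, 2)  # nodes 2 and 3
```

In this toy graph node 0 shares nothing with its direct neighbour but a great deal with the nodes two hops away, which is the kind of pattern an exact-distance score can expose and a one-hop metric cannot.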

Property-Driven Comparison of GNNs on Multi-Label Graphs

Bachelor thesis (2026) - V. Paiu, M. Khosla, E. Congeduti, C. Lofi

Multi-label node classification on graphs occurs in domains where entities can have several labels, such as biological, social, and recommendation networks. Most Graph Neural Network (GNN) research focuses on multi-class graphs, so it remains unclear how dataset properties affect model performance in multi-label settings. This thesis studies how structural, feature, and label properties influence the performance of the Graph Convolutional Network (GCN) and the Heterophilic Graph Convolutional Network (H2GCN). These models were chosen because they are widely used and represent homophilous and heterophilous graph learning, respectively. Synthetic graphs are used to vary their properties in a controlled way, with real-world datasets used as validation points, and a pooled Ridge regression then tests how well each property predicts model performance in a joint setting. The results show that no single property explains performance by itself. Label imbalance reduces the performance of both models similarly, structural noise harms GCN more, unlabeled nodes degrade the performance of H2GCN more quickly, and cross-class neighbourhood similarity adds information beyond homophily. All code, seeds, and trained-graph properties are released publicly. ...

Graph Neural Networks for Long-Term Traffic Forecasting

Can GNNs effectively handle long-term predictions and how does their accuracy degrade over time?

Bachelor thesis (2024) - V. Vranceanu, E. Congeduti, E.A. Markatou

Traffic forecasting is a branch of spatiotemporal forecasting that involves predicting future traffic speed or volume based on real-world data. It has a significant impact on urban mobility and quality of life, as it directly contributes to improving traffic management and trip planning. This study evaluates the performance of Graph Neural Networks (GNNs) in handling long-term forecasting, defined as predictions made up to 10 hours ahead. It addresses the evolution of performance and factors that may impact accuracy, such as fluctuations in traffic speed and road network configurations. The experiments are done using subsets of a benchmark dataset for traffic forecasting and a state-of-the-art GNN model. The findings showcase a logarithmic growth in prediction errors and the presence of two types of traffic jams—sudden and regular—along with their impact on prediction accuracy. Furthermore, the results highlight the complexity of quantifying the influence a factor has on forecasting performance, such as road network configuration or missing values. ...

Graph Neural Networks Training Set Analysis

Effect of Training Data Size

Bachelor thesis (2024) - A.V. Păcurar, E. Congeduti, E.A. Markatou

With the rapid increase in popularity of graph neural networks (GNNs) for the task of traffic forecasting, understanding the inner workings of these complex models becomes more important. This experiment aims to deepen our understanding of how much the training data matters for the ability of GNNs to accurately predict traffic. By repeatedly training the same GNN model with different training datasets spanning various time frames and comparing standard performance metrics computed on the model's predictions, this paper concludes that while using less training data leads to a slight decrease in performance, this is heavily dependent on the quality of the dataset. If the data gathering process is short and the sensors are not properly maintained, GNNs are not able to accurately predict traffic. On the other hand, if the data gathering process goes well and there are few missing values, GNNs perform well even when trained with smaller amounts of historical data. ...

Effectiveness of Graph Neural Networks and Simpler Network Models in Various Traffic Scenarios

Graph Neural Networks for Traffic Forecasting

Bachelor thesis (2024) - Wiktor Grzybko, E. Congeduti, E.A. Markatou

Traffic forecasting is key to improving urban transport and reducing congestion and pollution. While advanced models like Graph Neural Networks (GNNs) can capture complex patterns in traffic flow, they are resource-intensive and do not scale well. This problem can be mitigated by using simpler models that are less influenced by the size of the road network, making them more practical for real-world applications. This study investigates whether simpler network-based models, particularly Long Short-Term Memory (LSTM) networks, can match or surpass the performance of GNNs, such as the Diffusion Convolutional Recurrent Neural Network (DCRNN), in specific scenarios. Using popular benchmark datasets, we compared the performance of the LSTM and DCRNN models under different conditions, including different sensor distributions and prediction horizons. The results indicate that while DCRNN substantially outperforms LSTM with numerous sensors and longer prediction horizons, LSTM gives promising results with fewer sensors and shorter horizons. In this scenario, the difference in performance is minimal regardless of the location of sensors, and LSTM also offers significant computational efficiency. These findings suggest that LSTM models may be a practical alternative for traffic forecasting in resource-constrained scenarios, providing a path to more efficient urban traffic management. ...

Scalability of Graph Neural Networks in Traffic Forecasting

Assessing Accuracy and Computational Efficiency in Varying Road Network Sizes and Complexities

Bachelor thesis (2024) - D.N. Savvidi, E. Congeduti, E.A. Markatou

This paper explores the scalability of Graph Neural Networks (GNNs) in the context of traffic forecasting, a critical area for improving urban mobility and reducing congestion. Despite GNNs’ demonstrated effectiveness in handling complex spatiotemporal dependencies in traffic data, scaling them to large road networks remains challenging due to increased computational requirements. This study aims to evaluate how the accuracy and computational cost of a state-of-the-art traffic forecasting GNN, the Decoupled Dynamic Spatio-Temporal Graph Neural Network, change with varying road network sizes and complexities (i.e., sensor density). Using two real-world datasets, three experiments are conducted: scaling map area, scaling graph complexity, and testing the geographic location effect. Findings show that larger graphs generally improve accuracy and GPU efficiency. Moreover, geographic location affects accuracy, whereas sensor density has minimal impact. ...

Regional Transferability of Graph Neural Networks for Traffic Forecasting

Bachelor thesis (2024) - I. Kravcevs, E. Congeduti, E.A. Markatou

Efficient traffic forecasting is an important component of modern traffic management systems, enabling real-time route guidance and traffic control. Graph Neural Networks (GNNs) have demonstrated state-of-the-art performance in this domain due to their ability to capture spatial and temporal dependencies in complex traffic data. However, GNNs typically require extensive historical data and are highly dependent on the specific road structure of the training region, posing challenges for their application in areas lacking such data. This study explores the transferability of GNN models in traffic forecasting, specifically how a GNN trained in a region with long-horizon historical data performs when applied to structurally different regional scenarios without historical data. The research investigates the impact of spatial differences between regions on the model's performance. The paper examines multiple metrics for regional similarity between training and transfer regions and shows their correlation with the transferred model's performance. ...

Assessing Methods for Handling Missing Data Using an LSTM Deep Learning Model in Traffic Forecasting

Bachelor thesis (2023) - W.W. Büthker, E. Congeduti, G. Iosifidis

Due to the increasing popularity of various types of sensors in traffic management, it has become significantly easier to collect data on traffic flow. However, the integrity of these data sets is often compromised due to missing values resulting from sensor failures, communication errors, and other malfunctions. This study investigates the effect of missing data on the performance of Long Short-Term Memory (LSTM) models in traffic flow prediction and assesses strategies to handle these missing values. By actively removing values from a complete data set, three strategies to handle these missing values are evaluated: dropping null values, replacing them with zero, and linear interpolation. We show that LSTM models are surprisingly resilient to missing data, with little impact on prediction accuracy for up to 40% of missing data, irrespective of the strategy used. For higher proportions of missing data, dropping null values leads to significant performance degradation, while zero-filling and interpolation maintain predictive accuracy. This paper provides insights into the choice of missing data handling strategies in time-series prediction tasks, demonstrating the potential of LSTM models for traffic forecasting under less-than-ideal data conditions ...
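The three strategies named in this abstract (dropping null values, replacing them with zero, and linear interpolation) can be sketched on a plain list in which `None` marks a missing reading. The function names are hypothetical, and the choice to fill gaps at the very start or end of the series with the nearest observed value is an assumption of this sketch, not something stated in the thesis.

```python
def drop_missing(series):
    """Strategy 1: remove missing readings (the series gets shorter)."""
    return [x for x in series if x is not None]


def zero_fill(series):
    """Strategy 2: replace each missing reading with zero."""
    return [0.0 if x is None else x for x in series]


def linear_interpolate(series):
    """Strategy 3: fill each gap on a straight line between its observed neighbours.

    Gaps at the start or end copy the nearest observed value (an
    assumption made here, since they have only one neighbour).
    """
    known = [i for i, x in enumerate(series) if x is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i in range(known[0]):
        filled[i] = series[known[0]]
    for i in range(known[-1] + 1, len(series)):
        filled[i] = series[known[-1]]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


flow = [10.0, None, None, 16.0, None, 20.0]
dropped = drop_missing(flow)
zeroed = zero_fill(flow)
interpolated = linear_interpolate(flow)
```

Dropping changes the spacing between consecutive readings, whereas zero-filling and interpolation keep the time axis intact, which may be one reason the strategies behave differently at high missing rates.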

Comparative Analysis of LSTM, ARIMA, and Facebook’s Prophet for Traffic Forecasting

Advancements, Challenges, and Limitations

Bachelor thesis (2023) - T.Z. Üzel, Elena Congeduti, Georgios Iosifidis

Accurate short-term traffic forecasting plays a crucial role in Intelligent Transportation Systems for effective traffic management and planning. In this study, the performance of three popular forecasting models for short-term traffic prediction is explored: Long Short-Term Memory (LSTM), Autoregressive Integrated Moving Average (ARIMA), and Facebook's Prophet. The models were trained and evaluated using a dataset of traffic flow data collected from 161 detectors over a specific time period. The experimental results reveal that ARIMA outperformed LSTM and Prophet in terms of Root Mean Squared Error (RMSE) and Mean Absolute Percentage Error (MAPE). This suggests that while deep learning methods, such as LSTM, are generally acknowledged to outperform ARIMA in short-term traffic forecasting, there are specific scenarios where such a well-accepted fact needs to be tested. ...

Effects of weather data on traffic flow predictions using an LSTM deep learning model

Bachelor thesis (2023) - N. Nachev, Elena Congeduti, Georgios Iosifidis

Accurate traffic forecasts are a key element in improving traffic flow in cities. An efficient approach to this problem is to use a deep learning Long Short-Term Memory (LSTM) model. Including weather data in the model can improve prediction accuracy because traffic volumes are sensitive to weather changes. The aim of this study is to show how such a model can be constructed for traffic flow predictions, and how it can be improved with the use of weather data. Results show that an LSTM model gives accurate predictions as a baseline model, and the inclusion of weather data gives a slight improvement in accuracy when predicting for single sensors. The improvement was higher on long-term predictions of 2.5 hours, and the best prediction results were obtained when adding a lag of 30 minutes to the rain data. ...

Long term predictions for traffic forecasting

How does the accuracy degrade with time?

Bachelor thesis (2023) - S.J. Verlooy, E. Congeduti, G. Iosifidis

Traffic prediction plays a major role in efficient transport planning and can reduce traffic congestion. In this study, the application of Long Short-Term Memory (LSTM) models for predicting traffic volumes across varying prediction horizons is investigated. The data used was collected by the municipality of The Hague over a single month. The study focuses on comparing the performance of the LSTM across different time horizons up to 10 hours in the future. To evaluate the performance of the LSTM models, two common evaluation measures are employed: Root Mean Square Error (RMSE) and Symmetric Mean Absolute Percentage Error (SMAPE). The baseline for the predictions is set at a 15-minute future forecast. Relative to the baseline RMSE, the RMSE increased threefold between the 1-hour and the 10-hour predictions. The SMAPE, however, first increases but, surprisingly, starts to decrease again after 6 hours. ...
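The two evaluation measures named in this abstract can be written out in a short sketch. SMAPE has several variants in the literature; the one below (absolute error divided by the mean of the absolute actual and predicted values, reported in percent) is an assumption, and the thesis may use a different normalisation.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error: square root of the mean squared error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def smape(actual, predicted):
    """Symmetric Mean Absolute Percentage Error, in percent (range 0 to 200).

    Assumed variant: |a - p| / ((|a| + |p|) / 2), averaged over all points.
    A point where both values are zero contributes zero error.
    """
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2.0
        if denom > 0:
            total += abs(a - p) / denom
    return 100.0 * total / len(actual)


actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]

error_rmse = rmse(actual, predicted)
error_smape = smape(actual, predicted)
```

Because RMSE is measured in vehicles and SMAPE is relative to the traffic volume, the two can move in different directions as the horizon grows, which is consistent with the diverging trends reported above.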

Deep learning approaches to short term traffic forecasting

Capturing the spatial temporal relation in historic traffic data

Bachelor thesis (2023) - T.I. Kuiper, E. Congeduti, G. Iosifidis

The number of cars on the roads is increasing at a rapid pace, causing traffic jams to become commonplace. One way to decrease traffic congestion is by building an Intelligent Transportation System (ITS) which helps traffic flow optimally. An important tool for an ITS is short-term traffic forecasting. Better forecasts will enable the ITS to proactively prevent congestion. Recent years have seen a great increase in the availability of traffic data. As a result, deep learning approaches have begun to emerge as models of choice in the short-term traffic forecasting domain. Among deep learning approaches, Long Short-Term Memory (LSTM) and Temporal Convolutional Networks (TCN) have both shown state-of-the-art performance in general forecasting tasks as well as promising results in traffic forecasting. This work compares these two approaches in terms of scalability and their ability to capture the spatio-temporal correlation. The LSTM showed a greater ability to capture the spatio-temporal correlation, while both architectures seemed equally scalable. ...

Modelling Agents with Variational Autoencoders in Multi-Agent Sequential Decision Making

Master thesis (2023) - H.L. Lenferink, F.A. Oliehoek, E. Congeduti

The ability to model other agents can be of great value in multi-agent sequential decision making problems and has become more accessible due to the introduction of deep learning into reinforcement learning. In this study, the aim is to investigate the usefulness of modelling other agents using variational autoencoder based models in partially observable settings. Previous studies that model other agents using (variational) autoencoders have shown promising results. In these studies, a single protagonist agent learns representations of other agents to then use them as additional components of its observation space which is, as such, augmented with those representations. It is, however, not always entirely clear what is being modelled and what would be the best feature of the other agent to represent. Moreover, in these works, a comparison between the used variational autoencoder based models and a baseline classifier trained to solve the same classification task is missing. This study investigates which features can best be used for the augmentation of the observations of deep reinforcement learning agents and if these features can be represented by variational autoencoder based models. Subsequently, it compares these models with a baseline classifier that solves the same classification problem to find out which model yields the best results when used for augmenting observations. Overall, the results suggest that it is beneficial to augment the observations of deep reinforcement learning agents with features related to other agents learned in a pre-training phase. Another interesting result is that the baseline classifier achieves similar or better performance compared to the variational autoencoder based model. Further research needs to be conducted to confirm the soundness of these findings. ...