Property-Driven Comparison of GNNs on Multi-Label Graphs

Bachelor Thesis (2026)
Author(s)

V. Paiu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M. Khosla – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

E. Congeduti – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

C. Lofi – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
expand_more
Publication Year
2026
Language
English
Graduation Date
19-06-2026
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Faculty
Electrical Engineering, Mathematics and Computer Science
Downloads counter
3
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Multi-label node classification on graphs occurs in domains where entities can have several labels, such as biological, social, and recommendation networks. Most Graph Neural Networks (GNN) research focuses on multi-class graphs, so it remains unclear how dataset properties affect model performance in multi-label settings. This thesis studies how structural, feature, and label properties influence Graph Convolutional Network (GCN) and Heterophilic Graph Convolutional Network (H2GCN). These models were chosen because they are widely used and represent homophilous and heterophilous graph learning, respectively. Synthetic graphs are used to vary their properties in a controlled way, with real-world datasets used as validation points, and a pooled Ridge regression then tests how well each property predicts model performance in a joint setting. The results show that no single property explains performance solely by itself. Label imbalance reduces both models similarly, structural noise harms GCN more, unlabeled nodes degrade the performance of H2GCN more quickly, and cross-class neighbourhood similarity adds information beyond homophily. All code, seeds, and trained-graph properties are released publicly.

Files

License info not available