Characterizing learning difficulty in graph-structured data

an empirical study of models and data

Doctoral Thesis (2026)
Author(s)

T. Zhao (TU Delft - Pattern Recognition and Bioinformatics)

Contributor(s)

A. Hanjalic – Promotor (TU Delft - Intelligent Systems)

M. Khosla – Copromotor (TU Delft - Multimedia Computing)

DOI related publication
https://doi.org/10.4233/uuid:863f5413-fbab-4e8f-82f7-b9bf1aeae9d6 (Final published version)
Publication Year
2026
Language
English
Defense Date
12-05-2026
Awarding Institution
ISBN (electronic)
978-94-6518-037-3
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Graphs provide a natural way to represent both the attributes of individual entities and the structure of their interconnections, making them a powerful framework for modeling complex systems with intricate relationships. Such graphs appear across diverse domains, including social networks, biology, quantum physics, and knowledge graphs. For example, proteins and their interactions form protein–protein interaction networks, and users and their friendships form social networks.

In recent years, Graph Neural Networks (GNNs) have become the dominant paradigm for learning from graph-structured data, achieving strong results on academic benchmark datasets in tasks such as node classification, link prediction, and graph classification. However, a closer examination of datasets from one real-world scenario, multi-label node classification (MLNC), reveals more complicated attribute distributions and substantial data quality issues, under which GNNs fail to learn. Many real-world graphs are noisy, exhibit unbalanced label distributions, and display low label homophily, which complicates the extraction of meaningful information from local neighborhoods. This observation raises an important question: to what extent do the structural and distributional properties of graph data shape model performance? ....
