Characterizing learning difficulty in graph-structured data

doi:10.4233/uuid:863f5413-fbab-4e8f-82f7-b9bf1aeae9d6

Characterizing learning difficulty in graph-structured data

an empirical study of models and data

Doctoral Thesis (2026)

Author(s)

T. Zhao (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A. Hanjalic – Promotor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

M. Khosla – Copromotor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Research Group

Pattern Recognition and Bioinformatics

Graph Neural Networks, Multi-label Node Classification, Continual Graph Learning, Instance-level Node Profiling

DOI related publication

https://doi.org/10.4233/uuid:863f5413-fbab-4e8f-82f7-b9bf1aeae9d6 Final published version

To reference this document use

https://doi.org/10.4233/uuid:863f5413-fbab-4e8f-82f7-b9bf1aeae9d6

More Info

Publication Year

2026

Language

English

Defense Date

12-05-2026

Awarding Institution

Delft University of Technology

ISBN (electronic)

978-94-6518-037-3

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Graphs provide a natural way to represent both the attributes of individual entities and the structure of their interconnections, making them a powerful framework for modeling complex systems with intricate relationships. Graph-structured data appears across diverse domains, including social networks, biology, quantum physics, and knowledge graphs. Examples include proteins and their interactions in protein–protein interaction networks, and users and their friendships in social networks.

In recent years, Graph Neural Networks (GNNs) have become the dominant paradigm for learning from graph-structured data, achieving strong results on academic benchmark datasets in tasks such as node classification, link prediction, and graph classification. However, a closer examination of datasets from one real-world scenario, multi-label node classification (MLNC), reveals more complicated attribute distributions and substantial data quality issues, under which GNNs fail to learn. Many real-world graphs are noisy, exhibit imbalanced label distributions, and display low label homophily, which complicates the extraction of meaningful information from local neighborhoods. This observation raises an important question: to what extent do the structural and distributional properties of graph data shape model performance? ....

Files

Dissertation_Tianqi_Zhao_13_.p... (pdf)

(pdf | 20.8 Mb)

License info not available