Per-Node 𝑘-Hop Label Homophily Predicts GNN Accuracy in Multi-Label Node Classification

None, None

Per-Node 𝑘-Hop Label Homophily Predicts GNN Accuracy in Multi-Label Node Classification

Bachelor Thesis (2026)

Author(s)

V. Guzun (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M. Khosla – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

E. Congeduti – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Graphs GCN Graph neural networks GNN Homophily Multilabel GAT

To reference this document use

https://resolver.tudelft.nl/uuid:9c83ac2e-a64d-48d2-9f2a-bec2b96e9c47

More Info

expand_more

Publication Year

2026

Language

English

Graduation Date

19-06-2026

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Faculty

Electrical Engineering, Mathematics and Computer Science

Downloads counter

21

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Multi-label node classification asks a graph neural network to assign each node a set of labels. For single-label classification the success of these networks is usually attributed to label homophily, the tendency of connected nodes to share labels; for the multi-label case Zhao et al. introduced a homophily metric and reported that it tracks performance across datasets. That metric measures label similarity only between directly connected nodes, even though graph neural networks aggre-gate information from wider neighbourhoods. It is therefore unknown whether label similarity at larger distances also predicts how accurately a network classifies a node, and at which distance that signal is strongest. We generalise the metric to a per-node label-similarity score at any chosen distance. We then correlate this score node-by-node with trained-model accuracy on three benchmark datasets, repeat the analysis across model depths to separate a data-driven explanation from an architectural one, and test the relationship causally with two synthetic graph generators that plant label structure at controlled distances. The predictive signal is consistently strongest at distance two, not at directly adjacent nodes, across two standard architectures and up to several thousand test nodes per dataset; the most predictive distance does not shift with model depth, so it is a property of the graphs rather than of the model; and controlled synthetic experiments confirm a causal effect of two-hop homophily on accuracy in graphs built to isolate it, with weaker and partially confounded support at three hops. Direct measurements further show that measuring similarity at an exact distance, rather than over a cumulative neighbourhood, is the right unit for identifying that scale. All code, data splits, and figures are released for reproducibility.

Files

RESEARCH_PAPER-1.pdf

(pdf | 1.21 Mb)

License info not available