How Does Label Noise Affect the Learning Curves of Graph Neural Networks?
I. Markov (TU Delft - Electrical Engineering, Mathematics and Computer Science)
E. Isufi – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
C. Liu – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
M.S. Jebali – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
T.J. Viering – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Graph Neural Networks (GNNs) achieve strong performance on node classification tasks, but their effectiveness depends on the quality of the supervision, and real-world labels are frequently noisy. Learning curves, which describe how test performance scales with the number of labelled training nodes, have been studied extensively in classical machine learning, but their behaviour in GNNs under realistic annotation noise remains largely unexplored.
We present a systematic empirical study of how three label noise protocols (symmetric random flipping, feature-dependent asymmetric flipping, and structure-dependent flipping) affect the shape of ChebNet's learning curves on four benchmark graphs spanning homophilic and heterophilic structure, at noise rates η ∈ {0.1, 0.3, 0.5}.
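To make the simplest of the three protocols concrete, the sketch below injects symmetric label noise at rate η into a vector of node labels. It is a minimal illustration, not the thesis's implementation. It assumes the common convention in which each label is flipped independently with probability η to a uniformly chosen *different* class. Some works instead let the flip land on the true class, which lowers the effective noise rate to η(K−1)/K. The function name `symmetric_label_noise` and its parameters are hypothetical.

```python
import numpy as np


def symmetric_label_noise(labels, eta, num_classes, seed=0):
    """Hypothetical helper: flip each label with probability eta.

    Assumed convention: a flipped label moves to a uniformly chosen class
    different from the original one. Returns the noisy labels and a boolean
    mask marking which nodes were flipped.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    noisy = labels.copy()
    # Each node is selected for flipping independently with probability eta.
    flip = rng.random(labels.shape[0]) < eta
    # An offset in 1..K-1 (mod K) guarantees the new class differs from the old one,
    # and every other class is equally likely.
    offsets = rng.integers(1, num_classes, size=int(flip.sum()))
    noisy[flip] = (labels[flip] + offsets) % num_classes
    return noisy, flip


# Usage: 7 classes, 2000 nodes per class, 30% symmetric noise.
y = np.repeat(np.arange(7), 2000)
y_noisy, flipped = symmetric_label_noise(y, eta=0.3, num_classes=7)
print(f"observed noise rate: {flipped.mean():.3f}")
```

Because the flip decision is independent of both node features and graph structure, this protocol serves as a natural baseline. The feature-dependent and structure-dependent protocols would replace the uniform `flip` mask and uniform target class with choices conditioned on features or neighbourhoods.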
The central finding is that noise does not simply shift the learning curve downward: above a moderate noise rate, it also reduces the effective slope, so the gap between clean and noisy performance widens as the label budget grows. Feature-dependent asymmetric noise is consistently the most harmful protocol across all datasets and budgets for η ≥ 0.3, while structure-dependent noise is the least harmful on homophilic graphs. On graphs where the model already operates near its performance limit, the noise type has little practical effect.
These findings suggest that beyond a moderate noise rate, cleaning existing labels yields greater returns than acquiring more noisy ones, and that the nature of annotation error interacts with graph structure in ways that single-budget evaluations cannot detect.