From Multi-Class to Multi-Label: Revisiting Edge Dropping for Graph Neural Networks
A. Andrei (TU Delft - Electrical Engineering, Mathematics and Computer Science)
M. Khosla – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
E. Congeduti – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
C. Lofi – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)
More Info
expand_more
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Many real-world tasks involve data that is naturally structured as a graph, such as proteins linked by interactions, papers linked by citations, or people connected in a social network. A common goal is to predict properties of each entity based on how it is connected to the rest. The models used for this, called graph neural networks, work by allowing each node to aggregate information from its neighbours, then from its neighbours’ neighbours, and so on.
However, looking too far can blur distant nodes together. A common remedy is to train the model on a random subset of the links and discard the rest, in the hope of preventing it from over-relying on any single part of the graph. This idea was proposed and tested only on the simpler problem in which each entity carries exactly one label. Many real-world problems are not like this. A single protein may participate in multiple biological processes simultaneously, so an effective predictor must assign several labels at once. Whether dropping links still helps in this more realistic multi-label setting has not been studied.
This work addresses that question using synthetic graphs with precisely controlled structure, ranging from strongly clustered to nearly random, as well as three real biological and bibliographic datasets spanning the same range. In almost every case, dropping links harms performance rather than improving it, and the damage increases with the fraction of removed links, with a single weak exception on the most strongly multi-label, lowest-homophily real graph.
The cause is not a flaw in the technique but a property of the multi-label task itself. When multiple labels must be predicted from the same node representation, each label receives only a fraction of the learning signal it would obtain in a single-label setting. Discarding links reduces this already limited signal even further.