V. Paiu
Multi-label node classification on graphs arises in domains where entities can carry several labels at once, such as biological, social, and recommendation networks. Most Graph Neural Network (GNN) research focuses on multi-class graphs, so it remains unclear how dataset properties affect model performance in multi-label settings. This thesis studies how structural, feature, and label properties influence the performance of the Graph Convolutional Network (GCN) and the Heterophilic Graph Convolutional Network (H2GCN). These models were chosen because they are widely used and represent homophilous and heterophilous graph learning, respectively. Synthetic graphs are used to vary the properties in a controlled way, with real-world datasets serving as validation points, and a pooled Ridge regression then tests how well each property predicts model performance in a joint setting. The results show that no single property explains performance on its own. Label imbalance reduces the performance of both models similarly, structural noise harms GCN more, unlabeled nodes degrade the performance of H2GCN more quickly, and cross-class neighbourhood similarity adds information beyond homophily. All code, seeds, and trained-graph properties are released publicly.
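The pooled Ridge regression mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the property names, the model indicator `is_h2gcn`, the interaction term, the regularisation strength `alpha`, and all data below are hypothetical toy values chosen only to show how runs from two models might be stacked into one regression.

```python
import numpy as np


def standardize(X):
    """Centre each column and scale it to unit variance."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant columns
    return (X - mean) / std


def fit_ridge(Xs, y, alpha=1.0):
    """Closed-form ridge fit on centred features with an unpenalised intercept."""
    intercept = y.mean()
    gram = Xs.T @ Xs + alpha * np.eye(Xs.shape[1])
    coef = np.linalg.solve(gram, Xs.T @ (y - intercept))
    return coef, intercept


# Toy data only (assumption): three graph properties per synthetic graph.
rng = np.random.default_rng(0)
n_graphs = 100
props = rng.uniform(0.0, 1.0, size=(n_graphs, 3))  # homophily, imbalance, noise

# Pooling: every graph appears twice, once per model, with a 0/1 model indicator.
P = np.vstack([props, props])
is_h2gcn = np.repeat([0.0, 1.0], n_graphs)
noise_x_model = P[:, 2] * is_h2gcn  # lets the noise effect differ by model

X = np.column_stack([P, is_h2gcn, noise_x_model])
names = ["homophily", "imbalance", "noise", "is_h2gcn", "noise_x_model"]

# Made-up performance scores, generated only so the sketch has a target.
y = (0.5 + 0.3 * P[:, 0] - 0.2 * P[:, 1] - 0.3 * P[:, 2]
     + 0.15 * noise_x_model + rng.normal(0.0, 0.02, size=2 * n_graphs))

Xs = standardize(X)
coef, intercept = fit_ridge(Xs, y, alpha=1.0)
pred = Xs @ coef + intercept
r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

for name, c in zip(names, coef):
    print(f"{name:>14s}: {c:+.3f}")
print(f"R^2 on the pooled toy data: {r2:.3f}")
```

Because the features are standardised, the coefficient magnitudes are comparable across properties, which is one way a joint model of this kind can be read for relative importance.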