T. Zhao

2 records found

Doctoral thesis (2026) - T. Zhao, A. Hanjalic, M. Khosla
Graphs provide a natural way to represent both the attributes of individual entities and the structure of their interconnections, making them a powerful framework for modeling complex systems with intricate relationships. Such networks appear across diverse domains, including social networks, biology, quantum physics, and knowledge graphs. Examples include proteins and their interactions in protein–protein interaction networks, and users and their friendships in social networks.

In recent years, Graph Neural Networks (GNNs) have become the dominant paradigm for learning from graph-structured data, achieving strong results in tasks such as node classification, link prediction, and graph classification on academic benchmark datasets. However, a closer examination of one real-world scenario, multi-label node classification (MLNC), reveals datasets with more complicated attribute distributions and substantial data quality issues, on which GNNs fail to learn. Many real-world graphs are noisy, exhibit unbalanced label distributions, and display low label homophily, which complicates the extraction of meaningful information from local neighborhoods. This observation raises an important question: to what extent do the structural and distributional properties of graph data shape the performance of the model? ...
Journal article (2023) - T. Zhao, Ngan Thi Dong, A. Hanjalic, M. Khosla
Graph Neural Networks (GNNs) have shown state-of-the-art improvements in node classification tasks on graphs. While these improvements have been largely demonstrated in a multi-class classification scenario, a more general and realistic scenario in which each node could have multiple labels has so far received little attention. The first challenge in conducting focused studies on multi-label node classification is the limited number of publicly available multi-label graph datasets. Therefore, as our first contribution, we collect and release three real-world biological datasets and develop a multi-label graph generator to generate datasets with tunable properties. While the success of GNNs is usually attributed to high label similarity (high homophily), we argue that a multi-label scenario does not follow the usual semantics of homophily and heterophily so far defined for a multi-class scenario. As our second contribution, we define homophily and Cross-Class Neighborhood Similarity for the multi-label scenario and provide a thorough analysis of the collected multi-label datasets. Finally, we perform a large-scale comparative study with methods and datasets and analyse the performances of the methods to assess the progress made by the current state of the art in the multi-label node classification scenario. We release our benchmark at https://github.com/Tianqi-py/MLGNC. ...
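
To make the idea of homophily in a multi-label setting more concrete, the sketch below scores each edge by the Jaccard similarity of the label sets of its two endpoints and averages over all edges. This is an illustrative assumption and not necessarily the definition used in the article; the function names `jaccard` and `multilabel_edge_homophily` and the toy graph are hypothetical.

```python
from typing import Hashable, Iterable, Mapping, Set, Tuple


def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Jaccard similarity of two label sets; defined as 0.0 when both are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def multilabel_edge_homophily(
    edges: Iterable[Tuple[Hashable, Hashable]],
    labels: Mapping[Hashable, Set[Hashable]],
) -> float:
    """Average Jaccard similarity of endpoint label sets over all edges.

    Assumption: this is one plausible multi-label homophily score,
    chosen for illustration only.
    """
    scores = [jaccard(labels[u], labels[v]) for u, v in edges]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy graph: a path of four nodes with multi-label annotations.
edges = [(0, 1), (1, 2), (2, 3)]
labels = {0: {"a", "b"}, 1: {"a"}, 2: {"a", "c"}, 3: {"d"}}

h = multilabel_edge_homophily(edges, labels)
print(f"multi-label edge homophily: {h:.3f}")
```

Unlike the multi-class case, where two connected nodes either share a label or do not, a set-based score of this kind takes values between 0 and 1 per edge, so partially overlapping label sets contribute fractional agreement.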