Subgraph Matching via Fused Gromov-Wasserstein Distance

More Info
expand_more

Abstract

Subgraph matching is a fundamental problem in various fields such as machine learning, computer vision, image processing, and bioinformatics, where detecting specific substructures within an object is often crucial. In these domains, not only structure plays an essential role, but also the feature information on nodes should be incorporated, thus highlighting the necessity for comprehensive analytical approaches.

In this thesis, we propose two novel subgraph matching frameworks using the Fused Gromov-Wasserstein (FGW) distance, namely the Subgraph Optimal Transport (SOT) and the Sliding Subgraph Optimal Transport (SSOT). Both frameworks integrate a dummy node strategy to handle the discrepancy between two graphs of different sizes. The SSOT extends upon the SOT by incorporating a sliding window framework and Wasserstein pruning to enhance the performance, especially for sparse large graphs. Our frameworks can be easily implemented and are adaptable for problems of exact matching, top-k approximate matching, and inexact matching.

We further propose a normalized FGW distance to cater to the practical interests and enhance the performance evaluation. We adopt the Frank-Wolfe algorithm for optimization and develop computation-reducing techniques by isolating the dummy node.

By conducting experiments on both synthetic and real-world datasets, we demonstrate that the SOT method achieves excellent performance on small graphs, and the SSOT method improves the accuracy over the SOT on large graphs. Both these two methods show the ability to outperform the state-of-the-art methods in noisy environments in terms of accuracy and efficiency.