Active Semi-Supervised Learning For Diffusions on Graphs

More Info


In statistical learning over large data-sets, labeling all points is expensive and time-consuming. Semi-supervised classification allows learning with very few labels. Naturally, selecting a few points to label becomes crucial as the performance relies heavily on the labeled points. The motivation behind active learning is to build an optimal training set keeping the classifier in mind. Random or heuristic-driven selection does not care for the classification process or are trivially defined. We are interested in the graph structure formed by the data, as seen in citation, social and biological networks. Accordingly, active semi-supervised learning on graphs labels nodes to enhance the performance of classification. We propose a new methodology to perform active learning for diffusion-based semi-supervised classifiers. In particular, we focus on a classifier which diffuses probability distributions over the graph through random walks. We postulate the active learning problem as $i)$ a linear inverse problem with a sparse starting distribution over the nodes; $ii)$ a model output selection problem. For the former, we use sparsity-regularized inverse problems to select nodes. For the latter, we use tools from Compressed Sensing and Sparse Sensing to select the nodes with the relevant model output. We show that we can select all the relevant nodes in a single shot fashion, hence avoiding reliance on multiple training phases. Results on simulated as well as real data-sets show the proposed methods outperform random labeling, thereby proving to be relevant for active semi-supervised learning on graphs.