When does the leading eigenvector agree with cluster means? A fixed-point analysis of Oja’s and SoftHebb’s streaming rules
O. Argherie (TU Delft - Electrical Engineering, Mathematics and Computer Science)
S. Tan – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Y. Guo – Mentor (TU Delft - Mechanical Engineering)
R.L. Lagendijk – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Backpropagation-free learning rules are well suited to neuromorphic and energy-constrained hardware, yet the representations they ultimately learn remain poorly understood. We study two local Hebbian rules that appear to optimize distinct objectives: (i) Oja’s rule computes the first principal component; (ii) SoftHebb extends it to a soft winner-take-all network whose fixed points are normalized component means. In the batch setting, Ding and He (2004) showed that K-means and PCA are closely related: the subspace spanned by the cluster centroids coincides with the span of the first K − 1 principal directions of the data covariance. We ask whether the same correspondence survives in a streaming setting, where updates are made sample by sample, are noisy, and are followed by renormalization of the weight vectors. First, we provide a self-contained fixed-point analysis, which serves as the common lens for both rules. Second, on controlled two-dimensional Gaussian data, we identify geometric conditions under which the rules agree or disagree, yielding an actionable criterion for predicting, on a given dataset, whether the rules converge to the same representation. Third, we show that the disagreement is not what the naive picture suggests: the expected divergence does not hold, and is replaced by a quantitative account governed by the ratio of the cluster width to the inter-cluster offset.
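To make the streaming setting concrete, the following is a minimal sketch of Oja’s rule applied one sample at a time to two-dimensional Gaussian data. It is an illustration only and does not reproduce the experiments of this work: the function names `sample_two_clusters` and `oja_stream`, the learning rate, the step count, and the cluster parameters are all assumptions chosen for the example. It relies on a standard fact: for a balanced mixture of two Gaussians with means ±μ and shared covariance Σ, the data covariance is Σ + μμᵀ. With μ along the first axis, the leading eigenvector therefore lies along the inter-cluster offset (and so agrees with the normalized cluster means) when `sigma_x**2 + offset**2 > sigma_y**2`, and is orthogonal to it otherwise.

```python
import math
import random


def sample_two_clusters(rng, offset, sigma_x, sigma_y):
    """Draw one point from a balanced mixture of two axis-aligned Gaussians
    centred at (+offset, 0) and (-offset, 0). Hypothetical helper."""
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return (sign * offset + rng.gauss(0.0, sigma_x), rng.gauss(0.0, sigma_y))


def oja_stream(n_steps, offset, sigma_x, sigma_y, lr=0.005, seed=0):
    """Run Oja's rule sample by sample and return the final unit weight vector.

    The explicit renormalization after each step is an assumption that
    mirrors the renormalized streaming setting described in the abstract.
    """
    rng = random.Random(seed)
    w = [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)]  # start between both axes
    for _ in range(n_steps):
        x = sample_two_clusters(rng, offset, sigma_x, sigma_y)
        y = w[0] * x[0] + w[1] * x[1]  # neuron output y = w . x
        # Oja's update: w <- w + lr * y * (x - y * w)
        w = [w[i] + lr * y * (x[i] - y * w[i]) for i in range(2)]
        norm = math.hypot(w[0], w[1])
        w = [wi / norm for wi in w]
    return w


if __name__ == "__main__":
    # Narrow clusters, large offset: covariance is diag(4.25, 0.25).
    print("narrow:", oja_stream(20000, offset=2.0, sigma_x=0.5, sigma_y=0.5))
    # Clusters elongated across the offset: covariance is diag(1.25, 4.0).
    print("wide:  ", oja_stream(20000, offset=1.0, sigma_x=0.5, sigma_y=2.0))
```

In the first configuration the learned direction should align (up to sign) with the axis joining the cluster means; in the second it should align with the orthogonal axis, so the principal direction and the normalized cluster means disagree. The sign of the returned vector is arbitrary, since both `w` and `-w` are fixed points of the rule.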