Hybrid Equilibrium Propagation: Trading Estimator Bias for Compute
Ștefan Stoian (TU Delft - Electrical Engineering, Mathematics and Computer Science)
S. Tan – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Y. Guo – Mentor (TU Delft - Mechanical Engineering)
R.L. Lagendijk – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Equilibrium Propagation (EP) is a backpropagation-free learning algorithm for energy-based networks. Its standard estimator approximates the gradient by comparing the equilibrium states reached in a free phase and a single nudged phase, but it carries a bias that limits how closely EP can match backpropagation. The centered estimator reduces this bias and improves accuracy, but adds a second nudged phase per update, raising the training cost. To balance accuracy against compute, we introduce hybrid EP, a family of estimators that mix the standard and centered updates on a per-batch basis. We show analytically that the mixing probability controls the bias, so that annealing it interpolates between the two regimes. We evaluate three hybrid schedules (a cosine anneal, its inverse, and a fixed stochastic mix) against standard and centered EP on MNIST, Fashion-MNIST, and CIFAR-10, in order of increasing complexity. On the two easier tasks, the hybrids match centered EP at lower compute. On CIFAR-10, standard EP collapses to near-chance accuracy, and the cosine and inverse schedules collapse with it, because each concentrates its biased updates into one long stretch. Only the stochastic mix, which spreads the same biased updates evenly across batches, trains stably. The compute savings of hybrid EP are therefore real but task-dependent: they are realized most cleanly when standard EP is itself viable, and training stability is governed not by the number of biased updates but by their distribution over the course of training.
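The per-batch mixing described in the abstract can be illustrated with a small sketch. It is not the thesis implementation: the function names, the exact cosine formula, the direction of the anneal (centered first, standard last), and the fixed mixing probability of 0.5 are all assumptions made for illustration. The only details taken from the abstract are that a standard update uses a free phase plus one nudged phase and that a centered update adds a second nudged phase. The sketch counts phases as a proxy for compute and tracks the longest unbroken run of standard (biased) updates.

```python
import math
import random

# Phases per update, following the abstract: standard EP runs a free phase
# plus one nudged phase; centered EP adds a second nudged phase.
PHASES_STANDARD = 2
PHASES_CENTERED = 3


def p_centered(schedule, step, total_steps, p_fixed=0.5):
    """Hypothetical probability of using the centered update at batch `step`."""
    t = step / max(total_steps - 1, 1)  # training progress in [0, 1]
    if schedule == "standard":
        return 0.0
    if schedule == "centered":
        return 1.0
    if schedule == "cosine":      # assumed direction: centered early, standard late
        return 0.5 * (1.0 + math.cos(math.pi * t))
    if schedule == "inverse":     # mirror image of the assumed cosine anneal
        return 0.5 * (1.0 - math.cos(math.pi * t))
    if schedule == "stochastic":  # constant mixing probability (assumed 0.5)
        return p_fixed
    raise ValueError(f"unknown schedule: {schedule}")


def simulate(schedule, total_steps, seed=0):
    """Sample one update type per batch; report compute and clustering of biased updates."""
    rng = random.Random(seed)
    phases = n_standard = run = longest_run = 0
    for step in range(total_steps):
        if rng.random() < p_centered(schedule, step, total_steps):
            phases += PHASES_CENTERED
            run = 0
        else:
            phases += PHASES_STANDARD
            n_standard += 1
            run += 1
            longest_run = max(longest_run, run)
    return {"phases": phases, "n_standard": n_standard, "longest_run": longest_run}


if __name__ == "__main__":
    for name in ("standard", "centered", "cosine", "inverse", "stochastic"):
        print(name, simulate(name, total_steps=1000))
```

Under these assumptions, the cosine, inverse, and stochastic schedules all use the centered update with the same average probability, so their expected phase counts coincide. What differs is `longest_run`: the annealed schedules place most of their standard updates in one contiguous stretch, while the fixed mix scatters them across training, which is the distinction the abstract identifies as decisive on CIFAR-10.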