Stability Gap in Continual Learning: The Role of Learning Rate
P.K. Sobocińska (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Tom Julian Viering – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
G.M. van de Ven – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
A. Hanjalic – Graduation committee member (TU Delft - Intelligent Systems)
Abstract
Continual learning aims to enable neural networks to acquire new knowledge sequentially without forgetting what they have already learned. While many strategies have been developed to address catastrophic forgetting, a subtler challenge known as the stability gap—a temporary drop in performance immediately after switching tasks—remains insufficiently understood. Recent work suggests that the learning rate may influence this phenomenon by shaping the model’s optimization trajectory. This paper systematically investigates how different constant learning rates affect the stability gap and whether dynamic learning rate scheduling can mitigate it. Experiments on Rotated MNIST with perfect replay show that smaller constant learning rates reduce the immediate drop but slow down recovery and convergence, while larger rates yield higher final accuracy but at the cost of a more severe gap. Scheduling methods, including CyclicLR and our custom IncreaseLROnPlateau, demonstrate potential for balancing this trade-off, but also introduce new challenges such as intra-task fluctuations. Overall, a carefully tuned constant learning rate provides the most robust trade-off in this setting. By isolating and quantifying these effects, this work offers insights for selecting and tuning learning rates in continual learning and lays the groundwork for future studies on more effective scheduling strategies. All code and experiments are publicly available at: https://github.com/wjssk/learning-rate-in-stability-gap.
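The abstract refers to a custom IncreaseLROnPlateau schedule without describing its mechanics; the sketch below illustrates one plausible form of such a scheduler, mirroring the interface of PyTorch's ReduceLROnPlateau but raising the learning rate when a monitored metric stops improving. The factor, patience, and max_lr values are illustrative assumptions, not the authors' actual settings (see the linked repository for the real implementation).

```python
# Minimal sketch of an "IncreaseLROnPlateau"-style scheduler (assumed behavior,
# not the authors' implementation): raise the learning rate when the monitored
# metric has not improved for `patience` steps, capped at `max_lr`.
import torch


class IncreaseLROnPlateau:
    def __init__(self, optimizer, factor=2.0, patience=5, max_lr=0.1):
        self.optimizer = optimizer
        self.factor = factor        # multiplicative increase applied on a plateau
        self.patience = patience    # steps without improvement before increasing
        self.max_lr = max_lr        # upper bound on the learning rate
        self.best = float("inf")
        self.num_bad_steps = 0

    def step(self, metric):
        # The monitored metric is treated as a loss: lower is better.
        if metric < self.best:
            self.best = metric
            self.num_bad_steps = 0
        else:
            self.num_bad_steps += 1

        if self.num_bad_steps > self.patience:
            for group in self.optimizer.param_groups:
                group["lr"] = min(group["lr"] * self.factor, self.max_lr)
            self.num_bad_steps = 0


# Usage sketch: call step() with the validation loss after each evaluation.
model = torch.nn.Linear(784, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
scheduler = IncreaseLROnPlateau(optimizer, factor=2.0, patience=3)
# for epoch in range(num_epochs):
#     val_loss = evaluate(model)      # hypothetical evaluation routine
#     scheduler.step(val_loss)
```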