Stability Gap in Continual Learning: The Role of Learning Rate

Bachelor Thesis (2025)
Author(s)

P.K. Sobocińska (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Tom Julian Viering – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

G.M. van de Ven – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

A. Hanjalic – Graduation committee member (TU Delft - Intelligent Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
25-06-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Continual learning aims to enable neural networks to acquire new knowledge sequentially without forgetting what they have already learned. While many strategies have been developed to address catastrophic forgetting, a subtler challenge known as the stability gap—a temporary drop in performance immediately after switching tasks—remains insufficiently understood. Recent work suggests that the learning rate may influence this phenomenon by shaping the model’s optimization trajectory. This paper systematically investigates how different constant learning rates affect the stability gap and whether dynamic learning rate scheduling can mitigate it. Experiments on Rotated MNIST with perfect replay show that smaller constant learning rates reduce the immediate drop but slow down recovery and convergence, while larger rates yield higher final accuracy but at the cost of a more severe gap. Scheduling methods, including CyclicLR and our custom IncreaseLROnPlateau, demonstrate potential for balancing this trade-off, but also introduce new challenges such as intra-task fluctuations. Overall, a carefully tuned constant learning rate provides the most robust trade-off in this setting. By isolating and quantifying these effects, this work offers insights for selecting and tuning learning rates in continual learning and lays the groundwork for future studies on more effective scheduling strategies. All code and experiments are publicly available at: https://github.com/wjssk/learning-rate-in-stability-gap.
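The abstract's custom IncreaseLROnPlateau scheduler can be pictured as the mirror image of a standard reduce-on-plateau rule: instead of lowering the learning rate when the validation loss stalls, it raises it. The thesis's actual implementation is in the linked repository; the sketch below is a hypothetical reconstruction, and all parameter names (`factor`, `patience`, `max_lr`, `threshold`) are illustrative assumptions, not the author's API.

```python
class IncreaseLROnPlateau:
    """Sketch of a plateau-triggered learning-rate *increase*.

    Hypothetical reconstruction of the scheduler named in the abstract;
    the real implementation lives in the linked repository.
    """

    def __init__(self, lr, factor=2.0, patience=3, max_lr=1e-1, threshold=1e-4):
        self.lr = lr              # current learning rate
        self.factor = factor      # multiplicative increase applied on plateau
        self.patience = patience  # epochs without improvement tolerated
        self.max_lr = max_lr      # upper bound on the learning rate
        self.threshold = threshold  # minimum change that counts as improvement
        self.best = float("inf")
        self.num_bad = 0

    def step(self, metric):
        """Update the learning rate given the latest loss; return the new value."""
        if metric < self.best - self.threshold:  # meaningful improvement
            self.best = metric
            self.num_bad = 0
        else:                                    # no improvement this epoch
            self.num_bad += 1
        if self.num_bad > self.patience:         # plateau detected: increase LR
            self.lr = min(self.lr * self.factor, self.max_lr)
            self.num_bad = 0
        return self.lr
```

Called once per epoch with the validation loss, the scheduler keeps the learning rate small (limiting the initial performance drop after a task switch) and only enlarges it once progress stalls, trading off against the intra-task fluctuations the abstract notes.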

Files

LR_in_Stability_Gap_32_.pdf
(pdf | 3.03 MB)
License info not available