Stability Gap in Continual Learning: The Role of Learning Rate

Bachelor Thesis (2025)
Author(s)

P.K. Sobocińska (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Tom Julian Viering – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

G.M. van de Ven – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

A. Hanjalic – Graduation committee member (TU Delft - Intelligent Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
25-06-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Continual learning aims to enable neural networks to acquire new knowledge sequentially without forgetting what they have already learned. While many strategies have been developed to address catastrophic forgetting, a subtler challenge known as the stability gap—a temporary drop in performance immediately after switching tasks—remains insufficiently understood. Recent work suggests that the learning rate may influence this phenomenon by shaping the model’s optimization trajectory. This paper systematically investigates how different constant learning rates affect the stability gap and whether dynamic learning rate scheduling can mitigate it. Experiments on Rotated MNIST with perfect replay show that smaller constant learning rates reduce the immediate drop but slow down recovery and convergence, while larger rates yield higher final accuracy but at the cost of a more severe gap. Scheduling methods, including CyclicLR and our custom IncreaseLROnPlateau, demonstrate potential for balancing this trade-off, but also introduce new challenges such as intra-task fluctuations. Overall, a carefully tuned constant learning rate provides the most robust trade-off in this setting. By isolating and quantifying these effects, this work offers insights for selecting and tuning learning rates in continual learning and lays the groundwork for future studies on more effective scheduling strategies. All code and experiments are publicly available at: https://github.com/wjssk/learning-rate-in-stability-gap.
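The abstract's custom IncreaseLROnPlateau scheduler can be pictured as the mirror image of a standard reduce-on-plateau rule: instead of lowering the learning rate when the validation loss stalls, it raises it. The thesis's actual implementation is in the linked repository; the sketch below is a hypothetical reconstruction, and all parameter names (`factor`, `patience`, `max_lr`, `threshold`) are illustrative assumptions, not the author's API.

```python
class IncreaseLROnPlateau:
    """Sketch of a plateau-triggered learning-rate *increase*.

    Hypothetical reconstruction of the scheduler named in the abstract;
    the real implementation lives in the linked repository.
    """

    def __init__(self, lr, factor=2.0, patience=3, max_lr=1e-1, threshold=1e-4):
        self.lr = lr              # current learning rate
        self.factor = factor      # multiplicative increase applied on plateau
        self.patience = patience  # epochs without improvement tolerated
        self.max_lr = max_lr      # upper bound on the learning rate
        self.threshold = threshold  # minimum change that counts as improvement
        self.best = float("inf")
        self.num_bad = 0

    def step(self, metric):
        """Update the learning rate given the latest loss; return the new value."""
        if metric < self.best - self.threshold:  # meaningful improvement
            self.best = metric
            self.num_bad = 0
        else:                                    # no improvement this epoch
            self.num_bad += 1
        if self.num_bad > self.patience:         # plateau detected: increase LR
            self.lr = min(self.lr * self.factor, self.max_lr)
            self.num_bad = 0
        return self.lr
```

Called once per epoch with the validation loss, the scheduler keeps the learning rate small (limiting the initial performance drop after a task switch) and only enlarges it once progress stalls, trading off against the intra-task fluctuations the abstract notes.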

Files

LR_in_Stability_Gap_32_.pdf
(pdf | 3.03 MB)
License info not available