In the context of continual learning, recent work has identified a significant and recurring performance drop, followed by a gradual recovery, upon the introduction of a new task. This phenomenon is referred to as the stability gap. Investigating it and its potential solutions is essential, as such findings can reduce both the energy consumption and computational time required to prepare a high-performing agent. Given the strong influence of training procedures on model performance and stability, we analyze how various optimizers (SGD, NAG, AdaGrad, RMSprop, Adam) and momentum values affect the stability gap. We expose a deep neural network to a sequence of digit-identification tasks with varying rotations, and track several metrics to capture the components of the stability gap and the overall performance. Our results reveal that increasing momentum amplifies the steepness and depth of the gap while shortening its duration. Within this simplified setup, RMSprop proves most effective in reducing the magnitude and duration of the drop while maintaining high overall performance.
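
To make the described setup concrete, the sketch below is a minimal, illustrative version of it: a small network trained sequentially on rotated-digit tasks, with the compared optimizers swapped in and the training loss computed per iteration (which is where stability-gap metrics would be logged). It assumes PyTorch and torchvision MNIST; the architecture, rotation angles, and hyperparameters are assumptions for illustration, not the authors' exact configuration.

```python
# Minimal sketch (not the authors' code): sequential training on rotated-digit
# tasks while comparing optimizers. Assumes PyTorch and torchvision MNIST.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms

def make_task_loader(rotation_deg, batch_size=128):
    """One task = MNIST with every image rotated by a fixed angle (assumed setup)."""
    tfm = transforms.Compose([
        transforms.RandomRotation((rotation_deg, rotation_deg)),  # fixed rotation
        transforms.ToTensor(),
    ])
    ds = datasets.MNIST("data", train=True, download=True, transform=tfm)
    return torch.utils.data.DataLoader(ds, batch_size=batch_size, shuffle=True)

# Small fully connected classifier (illustrative; the paper's architecture may differ).
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Optimizers under comparison; momentum values would be swept in the actual study.
optimizers = {
    "SGD":     lambda p: torch.optim.SGD(p, lr=0.01),
    "NAG":     lambda p: torch.optim.SGD(p, lr=0.01, momentum=0.9, nesterov=True),
    "AdaGrad": lambda p: torch.optim.Adagrad(p, lr=0.01),
    "RMSprop": lambda p: torch.optim.RMSprop(p, lr=0.001),
    "Adam":    lambda p: torch.optim.Adam(p, lr=0.001),
}

opt = optimizers["RMSprop"](model.parameters())
for task_id, rotation in enumerate([0, 80, 160]):  # task sequence via rotations (assumed angles)
    loader = make_task_loader(rotation)
    for x, y in loader:
        opt.zero_grad()
        loss = F.cross_entropy(model(x), y)  # per-iteration loss; accuracy on earlier
        loss.backward()                      # tasks would be logged here to expose
        opt.step()                           # the drop-and-recovery stability gap
```

In practice, the quantities of interest are the per-iteration accuracies on previously seen tasks immediately after each task switch, which is what reveals the depth, steepness, and duration of the stability gap for each optimizer and momentum setting.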