Sharpness-Aware Optimization for Stability Gap Reduction
K. Sycheva (TU Delft - Electrical Engineering, Mathematics and Computer Science)
G.M. van de Ven – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Tom Julian Viering – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
A. Hanjalic – Graduation committee member (TU Delft - Intelligent Systems)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
One of the problems in continual learning, where models are trained sequentially on a series of tasks, is a sudden drop in performance after switching to a new task, known as the stability gap. The presence of a stability gap likely indicates that training is not done optimally. In this work we aim to address the stability gap problem by using sharpness-aware optimization, which biases convergence toward flat minima. While flat minima are known to mitigate forgetting, their role in ensuring stable learning during task transitions remains unexplored. Through a systematic analysis of two sharpness-aware optimizers, Entropy-SGD and C-Flat, we demonstrate that sharpness-aware optimization produces smoother learning trajectories with reduced instability after a task switch. Furthermore, we show that C-Flat’s second-order curvature approximation provides additional stabilization, suggesting that efficient Hessian-aware methods offer advantages for continual learning. The source code is available at Stability-Gap-SAM.
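To make the idea of sharpness-aware optimization concrete, the sketch below shows a generic SAM-style update step in PyTorch: the weights are first perturbed in the direction of steepest loss increase (a probe of local sharpness), and the gradient at the perturbed point is then applied to the original weights. This is an illustrative sketch of the general technique only, not the thesis's Entropy-SGD or C-Flat procedures; `model`, `loss_fn`, `optimizer`, `data`, `target`, and `rho` are hypothetical placeholders.

```python
import torch

def sam_style_step(model, loss_fn, data, target, optimizer, rho=0.05):
    """One SAM-style update (illustrative sketch, assumed setup).

    Entropy-SGD and C-Flat, studied in the thesis, use different inner
    procedures; this only shows the common sharpness-aware principle.
    """
    # First forward/backward pass: gradient at the current weights.
    optimizer.zero_grad()
    loss_fn(model(data), target).backward()

    # Build the perturbation e = rho * g / ||g|| and apply it in place,
    # moving the weights toward the locally "sharpest" direction.
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))
    perturbations = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            perturbations.append((p, e))

    # Second forward/backward pass: gradient at the perturbed weights.
    optimizer.zero_grad()
    loss_fn(model(data), target).backward()

    # Undo the perturbation, then step with the sharpness-aware gradient.
    with torch.no_grad():
        for p, e in perturbations:
            p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()
```

Because the applied gradient is taken at the perturbed point, the update favours regions where the loss stays low in a neighbourhood of the weights, i.e. flat minima, which is the property the thesis examines for reducing the stability gap.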