Reconciling grokking with statistical learning theory through the lens of norm- and stability-based generalization bounds

Journal Article (2026)
Author(s)

Luca Oneto (Università degli Studi di Genova)

Sandro Ridella (Università degli Studi di Genova)

Simone Minisi (Università degli Studi di Genova)

Andrea Coraddu (TU Delft - Sustainable Drive and Energy System)

Davide Anguita (Università degli Studi di Genova)

Research Group
Sustainable Drive and Energy System
DOI related publication
https://doi.org/10.1016/j.neucom.2026.132826
Publication Year
2026
Language
English
Volume number
674
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In recent years, Artificial Intelligence, particularly Machine Learning, has achieved remarkable success in solving complex problems. However, this progress has also revealed unexpected, poorly understood, and elusive phenomena that characterize the behavior of machine intelligence and learning processes. These phenomena are often difficult to interpret within existing Machine Learning theoretical frameworks, thereby motivating the development of new and more comprehensive theoretical foundations. One such phenomenon, known as grokking, refers to the sudden and substantial improvement in a model's performance following a prolonged period of stagnant or even regressive learning. In this paper, we argue that it is possible to provide insights into grokking by leveraging the existing theoretical foundations of Machine Learning, in particular concepts from Statistical Learning Theory such as norm-based and stability-based generalization bounds. We show how these theories can help reconcile the phenomenon of grokking with established principles of learning and generalization. Finally, we demonstrate the practical applicability of these insights through concrete examples.