Training Dynamics of Overparameterized Neural Networks

Master Thesis (2026)

Author(s)

S.D. Kalvankar (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

D.M.J. Tax – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

A. Heinlein – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

A. Papapantoleon – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Spectral Bias in Neural Networks; Neural Tangent Kernel (NTK); Fourier Analysis of Learning Dynamics

To reference this document use

https://resolver.tudelft.nl/uuid:c9e6188a-a47f-489a-ae88-2b0b23c3b20c

Publication Year

2026

Language

English

Graduation Date

12-06-2026

Awarding Institution

Delft University of Technology

Programme

Computer Science

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In this thesis, we study spectral bias, the tendency of gradient-based training to learn the low-frequency part of a target before its high-frequency part. We work in a setting simple enough to analyse explicitly: regression on the unit circle with a shallow ReLU network. In the infinite-width limit, the residual dynamics are governed by the Neural Tangent Kernel. Under the uniform measure on the circle, this kernel depends only on the angle between two points, so the associated operator is a convolution and the Fourier modes are its eigenfunctions. Each mode decays at a rate set by its eigenvalue, and the lower the frequency, the larger the eigenvalue, so low frequencies are learned first.
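The link between frequency and decay rate can be illustrated numerically. The sketch below is not taken from the thesis: it assumes a bias-free two-layer ReLU network with both layers trained, for which the infinite-width NTK has a standard closed form in the angle between two unit-norm inputs, and it assumes gradient flow with unit learning rate so that the residual in mode k decays as exp(-lambda_k t). The function names `relu_ntk`, `ntk_fourier_eigenvalues` and `mode_residual` are hypothetical, and the thesis's own parametrisation and normalisation may differ.

```python
import numpy as np


def relu_ntk(angle):
    """Infinite-width NTK of a bias-free two-layer ReLU network (assumed form).

    `angle` is the angle in [0, pi] between two unit-norm inputs.
    """
    # Contribution of the hidden-layer weights: (x . x') * E[relu'(w.x) relu'(w.x')].
    hidden = np.cos(angle) * (np.pi - angle) / (2 * np.pi)
    # Contribution of the output-layer weights: E[relu(w.x) relu(w.x')].
    output = (np.sin(angle) + (np.pi - angle) * np.cos(angle)) / (2 * np.pi)
    return hidden + output


def ntk_fourier_eigenvalues(max_freq, grid_size=4096):
    """Eigenvalues lambda_k = (1 / 2 pi) * integral of K(theta) cos(k theta).

    The integral over the circle is approximated by a mean over a uniform grid.
    """
    theta = 2 * np.pi * np.arange(grid_size) / grid_size
    # Angle between the two points, folded back into [0, pi].
    angle = np.minimum(theta, 2 * np.pi - theta)
    kernel = relu_ntk(angle)
    return np.array(
        [np.mean(kernel * np.cos(k * theta)) for k in range(max_freq + 1)]
    )


def mode_residual(eigenvalue, t, initial=1.0):
    """Residual of one Fourier mode under gradient flow (unit learning rate assumed)."""
    return initial * np.exp(-eigenvalue * t)


if __name__ == "__main__":
    eigenvalues = ntk_fourier_eigenvalues(6)
    for k, value in enumerate(eigenvalues):
        remaining = mode_residual(value, 10.0)
        print(f"k={k}: eigenvalue={value:.5f}, residual at t=10: {remaining:.3f}")
```

In this particular bias-free variant, the eigenvalues at odd frequencies k >= 3 vanish up to quadrature error, a known parity effect of the ReLU kernel without bias, so the decay with frequency is cleanest when read along k = 0, 1, 2, 4, 6. A larger eigenvalue gives a smaller remaining residual at any fixed time, which is the ordering described in the abstract.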

Away from this idealised limit the picture degrades only gradually. On a fixed low-frequency subspace, both finite sampling and frozen finite width keep the operator close to the continuum Fourier prediction, with errors of O(n^(-1/2)) in the sample size n and O(m^(-1/2)) in the width m. The description breaks down only once the kernel is allowed to evolve during training. At small width the evolving kernel reaches a lower loss by strengthening its lowest-frequency components, even as its alignment with the Fourier basis fails to improve. The evolving kernel thus reinforces the low-frequency bias rather than approximating the fixed-kernel dynamics. A formal theory of this evolving-kernel regime remains the main open problem.
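The width dependence can be probed with a small Monte Carlo experiment. The sketch below is an illustration under stated assumptions, not the thesis's experiment: it uses a bias-free two-layer ReLU network with standard normal hidden weights and fixed +/-1 output weights, freezes the kernel at initialisation, and measures the pointwise gap to the closed-form infinite-width kernel instead of the operator error on a low-frequency subspace that the abstract refers to. The names `analytic_ntk`, `empirical_ntk` and `mean_kernel_error` are hypothetical.

```python
import numpy as np


def analytic_ntk(angle):
    """Closed-form infinite-width NTK for the assumed bias-free ReLU network."""
    hidden = np.cos(angle) * (np.pi - angle) / (2 * np.pi)
    output = (np.sin(angle) + (np.pi - angle) * np.cos(angle)) / (2 * np.pi)
    return hidden + output


def empirical_ntk(angles, width, rng):
    """Frozen width-`width` NTK between the point at angle 0 and each of `angles`."""
    weights = rng.standard_normal((width, 2))          # hidden weights w_r
    x0 = np.array([1.0, 0.0])                          # reference point on the circle
    points = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    pre0 = weights @ x0                                # shape (width,)
    pre = weights @ points.T                           # shape (width, len(angles))
    # Hidden-layer gradients; the output weights are +/-1, so their squares are 1.
    both_active = (pre0 > 0)[:, None] & (pre > 0)
    hidden = (points @ x0) * both_active.mean(axis=0)
    # Output-layer gradients: average of relu(w.x0) * relu(w.x).
    output = (np.maximum(pre0, 0.0)[:, None] * np.maximum(pre, 0.0)).mean(axis=0)
    return hidden + output


def mean_kernel_error(width, trials, rng, num_angles=64):
    """Average over `trials` draws of the max pointwise gap to the analytic kernel."""
    angles = np.linspace(0.0, np.pi, num_angles)
    target = analytic_ntk(angles)
    gaps = [
        np.max(np.abs(empirical_ntk(angles, width, rng) - target))
        for _ in range(trials)
    ]
    return float(np.mean(gaps))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    widths = [64, 256, 1024, 4096]
    errors = [mean_kernel_error(m, trials=20, rng=rng) for m in widths]
    # Slope of log(error) against log(width); a value near -0.5 matches m^(-1/2).
    slope = np.polyfit(np.log(widths), np.log(errors), 1)[0]
    print("errors:", errors)
    print("fitted slope:", slope)
```

A fitted slope close to -0.5 is what the m^(-1/2) rate would predict for this frozen-kernel setting. The same construction with random sample points in place of random neurons would be one way to look at the n^(-1/2) sampling rate.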

Files

Training_Dynamics_of_Overparam... (pdf)

(pdf | 32 Mb)

License info not available