S.D. Kalvankar
In this thesis, we study spectral bias, the tendency of gradient-based training to learn the low-frequency part of a target before its high-frequency part. We work in a setting simple enough to analyse explicitly: regression on the unit circle with a shallow ReLU network. In the infinite-width limit, the residual dynamics are governed by the Neural Tangent Kernel. This kernel depends only on the angle between two points, so under the uniform measure on the circle the associated integral operator is a convolution and the Fourier modes are its eigenfunctions. Each mode of the residual decays at a rate set by its eigenvalue, and the lower the frequency, the larger the eigenvalue, so low frequencies are learned first.
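The convolution structure described above can be checked numerically. The following is a minimal sketch, not the thesis's own code. It assumes a particular parametrisation that the abstract does not specify: a two-layer ReLU network with both layers trained, standard Gaussian weights, and inputs (cos t, sin t, 1)/sqrt(2) so that a bias feature is included. The names `ntk_profile`, `circle_spectrum` and `mode_residual` are hypothetical. On a uniform grid the kernel matrix is circulant, so its eigenvalues are the discrete Fourier transform of one row.

```python
import numpy as np


def ntk_profile(delta):
    """Assumed infinite-width NTK as a function of the angle gap delta.

    Assumption (not from the source): inputs are (cos t, sin t, 1)/sqrt(2),
    both layers are trained, and all weights are standard Gaussian.
    """
    u = 0.5 * (1.0 + np.cos(delta))          # inner product of two augmented inputs
    phi = np.arccos(np.clip(u, -1.0, 1.0))   # angle between them in R^3
    gate = (np.pi - phi) / (2 * np.pi)       # E[1{w.x > 0} 1{w.x' > 0}]
    act = (np.sin(phi) + (np.pi - phi) * np.cos(phi)) / (2 * np.pi)  # E[relu relu]
    return u * gate + act


def circle_spectrum(n_grid=512):
    """Eigenvalues of the kernel operator under the uniform measure, on a grid.

    The Gram matrix on a uniform grid is circulant, so its eigenvalues are the
    DFT of its first row; dividing by n_grid accounts for the uniform measure.
    """
    theta = 2 * np.pi * np.arange(n_grid) / n_grid
    row = ntk_profile(theta)                 # kernel between angle 0 and each grid point
    eigvals = np.fft.fft(row).real / n_grid  # entry k belongs to frequency k
    return theta, eigvals


def mode_residual(eigvals, k, t):
    """Fraction of the frequency-k residual left at time t under kernel
    gradient flow (unit learning rate is an assumption)."""
    return np.exp(-eigvals[k] * t)


if __name__ == "__main__":
    theta, lam = circle_spectrum()
    for k in range(6):
        print(f"k={k}  eigenvalue={lam[k]:.5f}  residual at t=10: {mode_residual(lam, k, 10.0):.4f}")
```

Running the script prints the leading eigenvalues and the corresponding residual fractions, which can be compared with the ordering stated in the abstract. The spectrum depends on the assumed parametrisation, so the printed numbers illustrate the mechanism rather than reproduce the thesis's results.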
Away from this idealised limit the picture degrades only gradually. On a fixed low-frequency subspace, both finite sampling and frozen finite width keep the operator close to the continuum Fourier prediction, with error of order O(n^(-1/2)) in the sample size n and O(m^(-1/2)) in the width m. The description breaks down only once the kernel is allowed to evolve during training. At small width the evolving kernel reaches a lower loss by strengthening its lowest-frequency components, even as its alignment with the Fourier basis fails to improve. Kernel evolution thus reinforces the low-frequency bias rather than approximating the fixed-kernel dynamics. A formal theory of this evolving-kernel regime remains the main open problem.
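The two perturbations with a frozen kernel can be probed with a small experiment. This is a hedged sketch under the same assumed parametrisation as before (bias feature appended, both layers trained, standard Gaussian initialisation), which the abstract does not confirm. The error measure used here, the operator norm of the gap between the projected empirical operator and the diagonal of reference eigenvalues, is an illustrative choice rather than the thesis's definition, and the "continuum" eigenvalues are approximated on a fine grid. All function names are hypothetical.

```python
import numpy as np


def ntk_profile(delta):
    """Assumed infinite-width NTK for inputs (cos t, sin t, 1)/sqrt(2)."""
    u = 0.5 * (1.0 + np.cos(delta))
    phi = np.arccos(np.clip(u, -1.0, 1.0))
    return (u * (np.pi - phi) + np.sin(phi) + (np.pi - phi) * np.cos(phi)) / (2 * np.pi)


def fourier_basis(theta, k_max):
    """Columns 1, sqrt(2)cos(k t), sqrt(2)sin(k t): orthonormal under the uniform measure."""
    cols = [np.ones_like(theta)]
    for k in range(1, k_max + 1):
        cols += [np.sqrt(2) * np.cos(k * theta), np.sqrt(2) * np.sin(k * theta)]
    return np.stack(cols, axis=1)


def reference_eigs(k_max, n_grid=4096):
    """Fine-grid approximation of the continuum eigenvalues, in basis order."""
    grid = 2 * np.pi * np.arange(n_grid) / n_grid
    lam = np.fft.fft(ntk_profile(grid)).real / n_grid
    return np.array([lam[0]] + [lam[k] for k in range(1, k_max + 1) for _ in range(2)])


def low_freq_error(gram, theta, k_max):
    """Operator-norm gap on the subspace spanned by frequencies 0..k_max."""
    basis = fourier_basis(theta, k_max)
    projected = basis.T @ gram @ basis / len(theta) ** 2
    return np.linalg.norm(projected - np.diag(reference_eigs(k_max)), ord=2)


def sampling_error(n, k_max, rng):
    """Exact kernel evaluated on n points drawn uniformly from the circle."""
    theta = rng.uniform(0.0, 2 * np.pi, size=n)
    return low_freq_error(ntk_profile(theta[:, None] - theta[None, :]), theta, k_max)


def width_error(m, k_max, rng, n_grid=256):
    """Empirical NTK of a width-m network at initialisation, on a uniform grid."""
    theta = 2 * np.pi * np.arange(n_grid) / n_grid
    x = np.stack([np.cos(theta), np.sin(theta), np.ones_like(theta)], axis=1) / np.sqrt(2)
    w = rng.standard_normal((m, 3))   # first-layer weights
    a = rng.standard_normal(m)        # second-layer weights
    pre = x @ w.T
    act = np.maximum(pre, 0.0)        # gradient w.r.t. a, up to 1/sqrt(m)
    gate = (pre > 0) * a              # gradient w.r.t. pre-activations, up to 1/sqrt(m)
    gram = (act @ act.T + (gate @ gate.T) * (x @ x.T)) / m
    return low_freq_error(gram, theta, k_max)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for size in (100, 400, 1600):
        s = np.mean([sampling_error(size, 3, rng) for _ in range(5)])
        w = np.mean([width_error(size, 3, rng) for _ in range(5)])
        print(f"n = m = {size:5d}   sampling error {s:.4f}   width error {w:.4f}")
```

If the stated rates apply to this setup, multiplying n or m by 16 should shrink the averaged error by a factor of roughly 4. The rates describe typical behaviour, so individual runs fluctuate around that trend.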