Deep limits of residual neural networks

Journal Article (2023)
Author(s)

Matthew Thorpe (The University of Manchester, The Alan Turing Institute)

Yves Van Gennip (TU Delft - Mathematical Physics)

Research Group
Mathematical Physics
Copyright
© 2023 Matthew Thorpe, Y. van Gennip
DOI related publication
https://doi.org/10.1007/s40687-022-00370-y
Publication Year
2023
Language
English
Issue number
1
Volume number
10
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Neural networks have been very successful in many applications; we often, however, lack a theoretical understanding of what neural networks are actually learning. This problem emerges when trying to generalise to new data sets. The contribution of this paper is to show that, for the residual neural network model, the deep layer limit coincides with a parameter estimation problem for a nonlinear ordinary differential equation. In particular, whilst it is known that the residual neural network model is a discretisation of an ordinary differential equation, we show convergence in a variational sense. This implies that optimal parameters converge in the deep layer limit; it is a stronger statement than saying that, for a fixed parameter, the residual neural network model converges (the latter does not in general imply the former). Our variational analysis provides a discrete-to-continuum Γ-convergence result for the objective function of the residual neural network training step to a variational problem constrained by a system of ordinary differential equations; this rigorously connects the discrete setting to a continuum problem.
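To make the discretisation claim concrete, the following minimal sketch (an illustration, not the paper's exact model or notation) shows the standard ResNet/ODE correspondence: with L layers and step size h = T/L, the residual update x_{k+1} = x_k + h f(x_k, θ_k) is the explicit Euler scheme for the ODE dx/dt = f(x(t), θ(t)), so increasing the depth refines the discretisation of the same continuum dynamics. The vector field f, the tanh activation, and the smooth parameter paths below are illustrative assumptions.

import numpy as np

def f(x, W, b):
    # Vector field of one residual block; tanh is an illustrative
    # choice of activation, not the paper's specific model.
    return np.tanh(W @ x + b)

def resnet_forward(x0, Ws, bs, T=1.0):
    # L residual layers with step size h = T / L: each update
    # x <- x + h * f(x, W_k, b_k) is one explicit Euler step for
    # dx/dt = f(x(t), theta(t)); the deep layer limit sends L -> infinity.
    L = len(Ws)
    h = T / L
    x = x0
    for W, b in zip(Ws, bs):
        x = x + h * f(x, W, b)
    return x

# Illustrative smooth parameter paths W(t), b(t) sampled at t_k = k/L;
# doubling L refines the Euler discretisation of the same ODE.
d, L = 4, 64
ts = np.linspace(0.0, 1.0, L)
Ws = [np.cos(t) * np.eye(d) for t in ts]
bs = [0.1 * np.sin(t) * np.ones(d) for t in ts]
x0 = np.ones(d)
print(resnet_forward(x0, Ws, bs))

Convergence of this forward map for fixed parameters is the weaker, pointwise statement; the paper's result concerns the training objective itself, so that optimal parameters converge as well.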
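For the variational claim, the standard definition of Γ-convergence clarifies why convergence of the objectives yields convergence of optimal parameters. Stated generically (E_L and E are placeholder names for the depth-L and continuum objectives, not necessarily the paper's notation), E_L Γ-converges to E if

\text{(liminf)} \quad \theta_L \to \theta \ \Longrightarrow \ E(\theta) \le \liminf_{L \to \infty} E_L(\theta_L),

\text{(recovery)} \quad \forall\, \theta \ \exists\, \theta_L \to \theta \ \text{such that} \ \limsup_{L \to \infty} E_L(\theta_L) \le E(\theta).

Together with an equicoercivity (compactness) assumption, these two conditions imply that minimisers of E_L converge, along subsequences, to minimisers of E; this is the sense in which optimal parameters converge in the deep layer limit.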