On Sample-Wise Strict Monotonicity with a Gradient Update
O. Taylan Turan (TU Delft - Pattern Recognition and Bioinformatics)
Marco Loog (Radboud University Nijmegen)
David M.J. Tax (TU Delft - Pattern Recognition and Bioinformatics)
Abstract
Learning curves describe how the performance of a model evolves as the amount of training data grows. Although more data is generally expected to improve performance, in practice models can exhibit non-monotonic behavior, where additional data degrades performance; sample-wise double descent is one well-known example. We address the question of how a learner can have a provably monotone learning curve. For a linear predictor with isotropic Gaussian covariates and a Gaussian noise model, we prove that a single step of steepest descent yields a sample-wise monotone learning curve, provided the step size does not exceed an upper bound. Furthermore, we present a practical procedure that ensures monotonicity without explicit regularization or cross-validation, by initializing each fit from the parameters obtained at the previous training set size. Experiments on real-world datasets show that this method achieves monotone behavior and better sample efficiency than ordinary least squares and optimally regularized ridge regression. We also explore extensions to binary classification, where monotonicity depends on the chosen performance metric. While our guarantees are derived under simplifying assumptions, they offer both theoretical and practical insight into constructing monotone learners and into understanding and mitigating sample-wise double descent.
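To make the procedure described above concrete, the following is a minimal NumPy sketch, assuming a linear model trained with squared loss: at each training set size a single steepest-descent step is taken, warm-started from the weights obtained at the previous size, with a conservative step size derived from a curvature bound on the loss. The function names (`one_step_fit`, `warm_started_curve`), the zero-weight initialization at the smallest size, and this particular step-size choice are illustrative assumptions, not the paper's exact construction or bound.

```python
import numpy as np

def one_step_fit(X, y, w_init, step_size=None):
    """One steepest-descent step on the mean squared error, starting from w_init."""
    n = X.shape[0]
    # Gradient of (1/n) * ||X w - y||^2 at w_init.
    grad = 2.0 / n * X.T @ (X @ w_init - y)
    if step_size is None:
        # Conservative choice: 1 / L, where L = 2 * lambda_max(X^T X) / n
        # is a Lipschitz constant of the gradient of the squared loss.
        step_size = n / (2.0 * np.linalg.eigvalsh(X.T @ X).max())
    return w_init - step_size * grad

def warm_started_curve(X_train, y_train, X_test, y_test, sizes):
    """Test MSE at each training set size, warm-starting every one-step fit
    from the weights obtained at the previous (smaller) size."""
    w = np.zeros(X_train.shape[1])  # zero predictor as the very first initialization
    curve = []
    for n in sizes:
        w = one_step_fit(X_train[:n], y_train[:n], w_init=w)
        curve.append(np.mean((X_test @ w - y_test) ** 2))
    return curve
```

Under this sketch, each fit reuses the previous weights and takes one bounded step, so no explicit regularization or cross-validation is needed; only the step-size bound controls the behavior.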