Learning Curves with Little Data

Doctoral Thesis (2026)
Author(s)

O.T. Turan (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M.J.T. Reinders – Promotor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

M. Loog – Promotor (Radboud Universiteit Nijmegen, TU Delft - Electrical Engineering, Mathematics and Computer Science)

D.M.J. Tax – Copromotor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Research Group
Pattern Recognition and Bioinformatics
DOI related publication
https://doi.org/10.4233/uuid:0301eb6e-9aef-4ce6-8319-14c311c008d5 Final published version
Publication Year
2026
Language
English
Defense Date
07-07-2026
Awarding Institution
Delft University of Technology
ISBN (print)
978-94-6384-981-4
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Obtaining data is often costly, so it is important to assess whether collecting additional data is justified by the expected improvement in performance. Learning curves, which describe the expected performance of a learner as a function of dataset size, provide a useful tool for this purpose. However, additional data does not always lead to improved performance, which makes estimating the potential benefit challenging. Under such conditions, model selection also becomes more difficult, as it is unclear how to compare models when no data is available to evaluate their performance at a hypothetical training set size.

In this context, the thesis takes a step back and asks a more fundamental question: how can we reliably reason about generalization when data is scarce and the behavior of learning curves is itself uncertain? Rather than treating learning curves as simple, monotonic functions, we study their full statistical structure. We show that variability across training subsets can influence model comparison, decision making, and performance extrapolation. In addition, we investigate conditions under which monotonic improvement can be guaranteed or encouraged. Beyond single-task learning, we also examine meta-learning, where information from multiple related tasks is leveraged to improve generalization performance while reducing the amount of data required from any individual task.

We begin by showing that the mean, as a statistical summary of learning curves, may not provide a reliable estimate of performance. We demonstrate that generalization performance distributions are often skewed and heavy-tailed, regardless of how they are obtained. As a result, relying solely on the mean for model selection can be suboptimal for some problems.
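
As a rough illustration of why the mean can mislead, the sketch below estimates the distribution of test error at a single training set size and compares its mean and median. Everything in it is an assumption made for illustration and is not taken from the thesis: the learner (ordinary least squares), the synthetic Gaussian data, the training size chosen just above the input dimension, and the names `error_distribution` and `summarize`.

```python
import numpy as np

def error_distribution(n_train, n_repeats=500, dim=10, noise=0.5, seed=0):
    """Test MSE of least squares over many independently drawn training sets.

    Hypothetical setup: Gaussian inputs, linear target, Gaussian label noise.
    """
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=dim)
    # One fixed, large test set shared by all repetitions.
    X_test = rng.normal(size=(2000, dim))
    y_test = X_test @ w_true + noise * rng.normal(size=2000)
    errors = np.empty(n_repeats)
    for r in range(n_repeats):
        X = rng.normal(size=(n_train, dim))
        y = X @ w_true + noise * rng.normal(size=n_train)
        w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        errors[r] = np.mean((X_test @ w_hat - y_test) ** 2)
    return errors

def summarize(errors):
    """Mean, median, and sample skewness of an array of errors."""
    centered = errors - errors.mean()
    skew = np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5
    return {
        "mean": float(errors.mean()),
        "median": float(np.median(errors)),
        "skewness": float(skew),
    }

# Training size just above the dimension, where least squares is unstable.
errors = error_distribution(n_train=12)
summary = summarize(errors)
print(summary)
```

In this toy setting a few badly conditioned training sets produce very large errors, so the mean sits above the median. A model comparison based only on the mean would be dominated by those rare draws.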

Next, we propose a semi-parametric extrapolation method that adapts its inductive bias to capture complex and potentially non-monotonic patterns. This approach improves predictive reliability in settings where additional data collection is costly or infeasible and where learning curves may not exhibit monotonic behavior.

We then study the monotonicity of learning curves under specific conditions. For linear regression, we show that a single gradient update is sufficient to ensure monotonic improvement, provided that the learning rate does not exceed a certain threshold. To construct similarly monotonic learners in practice, we propose a data-driven approach for selecting both the learning rate and the initial parameter estimates.
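
The kind of learner described here can be sketched as follows: start from an initial estimate, take one gradient step on the empirical squared loss, and estimate the expected risk at several training set sizes. This is a minimal sketch under stated assumptions, not the construction used in the thesis. The isotropic Gaussian data model, the zero initialization, the learning rate `eta=0.1`, and the function name `one_step_risk` are all hypothetical, and the abstract does not state the learning rate threshold, so none is encoded here.

```python
import numpy as np

def one_step_risk(n_train, eta, w0, w_true, noise=0.5, n_repeats=2000, seed=0):
    """Monte Carlo estimate of the expected risk after one gradient step.

    Assumes x ~ N(0, I) and y = x @ w_true + Gaussian noise, so the
    population risk of w is ||w - w_true||^2 + noise^2.
    """
    rng = np.random.default_rng(seed)
    dim = w_true.shape[0]
    risks = np.empty(n_repeats)
    for r in range(n_repeats):
        X = rng.normal(size=(n_train, dim))
        y = X @ w_true + noise * rng.normal(size=n_train)
        # Gradient of the mean squared residual (1/n) * ||X w - y||^2 at w0.
        grad = (2.0 / n_train) * (X.T @ (X @ w0 - y))
        w1 = w0 - eta * grad
        risks[r] = np.sum((w1 - w_true) ** 2) + noise ** 2
    return float(risks.mean())

rng = np.random.default_rng(1)
w_true = rng.normal(size=5)
w0 = np.zeros(5)  # illustrative initial estimate

sizes = [2, 8, 64]
curve = [one_step_risk(n, eta=0.1, w0=w0, w_true=w_true) for n in sizes]
initial_risk = float(np.sum((w0 - w_true) ** 2) + 0.5 ** 2)

for n, risk in zip(sizes, curve):
    print(f"n={n:3d}  estimated risk={risk:.3f}")
print(f"risk of initial estimate={initial_risk:.3f}")
```

Because the learner takes only one step, its estimate stays close to the initialization, which limits how strongly a small or unlucky sample can hurt it. The same limit also caps how much the learner can gain from more data, which is why the choice of learning rate and initial estimate matters.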

Finally, we investigate the learning curves of a meta-learning algorithm. Through controlled synthetic experiments, we analyze the generalization performance of both meta-learners and task-specific learners, providing insights into how properties of the task distribution influence generalization under a limited adaptation stage consisting of a single gradient update.

Files

License info not available
Taylanturan_propositions.pdf
(pdf | 0.0142 Mb)