Learning Curves with Little Data

doi:10.4233/uuid:0301eb6e-9aef-4ce6-8319-14c311c008d5

Doctoral Thesis (2026)

Author(s)

O.T. Turan (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M.J.T. Reinders – Promotor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

M. Loog – Promotor (Radboud Universiteit Nijmegen, TU Delft - Electrical Engineering, Mathematics and Computer Science)

D.M.J. Tax – Copromotor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Research Group

Pattern Recognition and Bioinformatics

Learning curve, Meta-learning, Generalization performance, Data-scarcity

DOI related publication

https://doi.org/10.4233/uuid:0301eb6e-9aef-4ce6-8319-14c311c008d5 Final published version

To reference this document use

https://doi.org/10.4233/uuid:0301eb6e-9aef-4ce6-8319-14c311c008d5

More Info

Publication Year

2026

Language

English

Defense Date

07-07-2026

Awarding Institution

Delft University of Technology

Abstract

Obtaining data is often costly, so it is important to assess whether collecting additional data is justified by the expected improvement in performance. Learning curves, which describe the expected performance of a learner as a function of dataset size, provide a useful tool for this purpose. However, additional data does not always lead to improved performance, which makes estimating the potential benefit challenging. Under such conditions, model selection also becomes more difficult, as it is unclear how to compare models when no data is available to evaluate their performance at a hypothetical training set size.

In this context, the thesis takes a step back and asks a more fundamental question: how can we reliably reason about generalization when data is scarce and the behavior of learning curves is itself uncertain? Rather than treating learning curves as simple, monotonic functions, we study their full statistical structure. We show that variability across training subsets can influence model comparison, decision making, and performance extrapolation. In addition, we investigate conditions under which monotonic improvement can be guaranteed or encouraged. Beyond single-task learning, we also examine meta-learning, where information from multiple related tasks is leveraged to improve generalization performance while reducing the amount of data required from any individual task.

We begin by showing that the mean, as a statistical summary of learning curves, may not provide a reliable estimate of performance. We demonstrate that generalization performance distributions are often skewed and heavy-tailed, regardless of how they are obtained. As a result, relying solely on the mean for model selection can be suboptimal for some problems.

Next, we propose a semi-parametric extrapolation method that adapts its inductive bias to capture complex and potentially non-monotonic patterns. This approach improves predictive reliability in settings where additional data collection is costly or infeasible and where learning curves may not exhibit monotonic behavior.

We then study the monotonicity of learning curves under specific conditions. For linear regression, we show that a single gradient update is sufficient to ensure monotonic improvement, provided that the learning rate does not exceed a certain threshold. To construct similarly monotonic learners in practice, we propose a data-driven approach for selecting both the learning rate and the initial parameter estimates.

Finally, we investigate the learning curves of a meta-learning algorithm. Through controlled synthetic experiments, we analyze the generalization performance of both meta-learners and task-specific learners, providing insights into how properties of the task distribution influence generalization under a limited adaptation stage consisting of a single gradient update.

Files

Taylanturan_dissertation.pdf

(pdf | 5.45 MB)

License info not available

Taylanturan_propositions.pdf

(pdf | 0.0142 MB)

License info not available