C. Yan | TU Delft Repository

Extrapolating Learning Curves: When Do Neural Networks Outperform Parametric Models?

Bachelor thesis (2025) - A. Cazacu (author), Tom Julian Viering (mentor), C. Yan (mentor), S. Mukherjee (mentor), MTJ Spaan (graduation committee member)

Learning curve extrapolation helps practitioners predict model performance at larger data scales, enabling better planning for data collection and computational resource allocation. This paper investigates when neural networks outperform parametric models for this task. We conduc ...

How Noisy Is Too Noisy?

Robust Extrapolation of Learning Curves with LC-PFN

Bachelor thesis (2025) - R.M. Gherasa (author), C. Yan (mentor), S. Mukherjee (mentor), Tom Julian Viering (mentor), MTJ Spaan (graduation committee member)

Accurately predicting a machine learning model’s final performance based on only partial training data can save substantial computational resources and guide early stopping, model selection, and automated machine learning (AutoML) workflows. Learning Curve Prior-Fitted Networks ( ...

The Impact of Imbalanced Training Data on Learning Curve Prior-Fitted Networks

Bachelor thesis (2025) - B. Kostov (author), C. Yan (mentor), S. Mukherjee (mentor), Tom Julian Viering (mentor), MTJ Spaan (graduation committee member)

Learning curves represent the relationship between the amount of training data and the error rate in machine learning. An important use case for learning curves is extrapolating them in order to predict how much data is needed to achieve a certain performance. One way to do such ...

The Effect of Domain Shift on Learning Curve Extrapolation

Bachelor thesis (2025) - M. Soeters (author), Tom Julian Viering (mentor), C. Yan (mentor), S. Mukherjee (mentor), MTJ Spaan (graduation committee member)

Domain shift is when the distribution of data differs between the training of a model and its testing. This can happen when the conditions of training are slightly different from the conditions that will happen when a model is tested or used. This is a problem for generalizabilit ...

Effectiveness of Machine Learning Models in Classifying Learners Based on Learning Curves

Improving Our Understanding of Learning Curves Through the Process of Classification

Bachelor thesis (2025) - S. Basaran (author), C. Yan (mentor), S. Mukherjee (mentor), Tom Julian Viering (mentor), MTJ Spaan (graduation committee member)

In machine learning, learning curves are a metric that plots performance versus training set size. They inform decisions about data acquisition, model selection, and hyperparameter tuning. Despite their importance, recent research suggests that our understanding of learning curve ...

How does scaling a learning curve influence the curve fitting process?

Bachelor thesis (2025) - C. van den Oudenhoven (author), O.T. Turan (mentor), C. Yan (mentor), Tom Viering (mentor), Arie van Deursen (graduation committee member)

Learning curves show the learning rate of a classifier by plotting the dataset size used to train the classifier versus the error rate. By extrapolating these curves it is possible to predict how well the classifier will perform when trained on dataset sizes that are currently ...

What is the effect of Gaussian filtering applied before curve fitting?

Bachelor thesis (2025) - I. Moanta (author), Tom Viering (mentor), O.T. Turan (mentor), C. Yan (mentor), Arie van Deursen (graduation committee member)

Learning curves are graphical representations of the relationship between dataset size and error rate in machine learning. Curve fitting is the process of estimating a learning curve using a mathematical formula. This paper analyzes two ways of performing curve fitting: interpola ...

Starting Right: Exploring the impact of random distribution sampling on initial Parameter selection for curve fitting

Bachelor thesis (2025) - D. Darie (author), O.T. Turan (mentor), Tom Viering (mentor), C. Yan (mentor), Arie van Deursen (graduation committee member)

Learning curves are used to evaluate the performance of a machine learning (ML) model with respect to the amount of data used when training. Curve fitting finds the unknown optimal coefficients by minimizing the error prediction for a learning curve. This research analyzed ...