Accurately predicting a machine learning model's final performance from only partial training data can save substantial computational resources and guide early stopping, model selection, and automated machine learning (AutoML) workflows. Learning Curve Prior-Fitted Networks (LC-PFNs) are a recent data-driven approach to this problem, leveraging transformers trained on a prior over learning curves to extrapolate partially observed curves. However, real-world training logs are often noisy and irregular, conditions under which the reliability of LC-PFNs remains largely untested. This thesis presents a systematic investigation into the robustness of LC-PFNs under noisy input data. Using LCDB 1.1, a large-scale database of learning curves, we simulate varying noise levels by corrupting the input curves with Gaussian perturbations and quantify how prediction accuracy degrades. To improve resilience, we propose and study two complementary mitigation strategies. The first injects artificial noise into the training data itself, either at a constant level or at a level that increases gradually over training, encouraging the model to generalize across a spectrum of noise conditions. The second applies post-processing techniques at inference time, such as smoothing the input sequence with an exponential moving average or averaging multiple stochastic predictions obtained via dropout. Our results show that standard LC-PFNs, trained only on clean data, are highly sensitive to even minor corruptions. In contrast, models trained with gradual noise exposure and evaluated with input smoothing achieve much greater robustness, reducing error by up to 75% under severe noise while maintaining high accuracy in noise-free settings. This work demonstrates that substantial improvements in learning-curve extrapolation can be achieved without modifying the model architecture, using general-purpose techniques suitable for real-world deployment.
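To make the inference-time smoothing strategy concrete, the following is a minimal sketch of exponential-moving-average smoothing applied to a Gaussian-corrupted learning curve. The curve shape, noise level, and smoothing factor here are illustrative assumptions, not values taken from the thesis:

```python
import numpy as np

def ema_smooth(curve, alpha=0.5):
    """Exponential moving average over a partial learning curve.

    alpha close to 1 tracks the raw observations; smaller alpha
    smooths more aggressively at the cost of lagging the signal.
    """
    out = np.empty_like(curve, dtype=float)
    out[0] = curve[0]
    for t in range(1, len(curve)):
        out[t] = alpha * curve[t] + (1 - alpha) * out[t - 1]
    return out

rng = np.random.default_rng(0)
epochs = np.arange(1, 51)
clean = 0.9 - 0.6 * np.exp(-epochs / 10)                  # idealized accuracy curve
noisy = clean + rng.normal(0.0, 0.05, size=clean.shape)   # Gaussian corruption
smoothed = ema_smooth(noisy)

# Smoothing suppresses epoch-to-epoch fluctuations before the
# partial curve is handed to the extrapolation model.
print(np.std(np.diff(noisy)), np.std(np.diff(smoothed)))
```

The smoothed sequence, rather than the raw noisy one, would then be passed to the extrapolator; the point of the strategy is that this requires no change to the trained model itself.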