C. Yan
Please Note
This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.
8 records found
Learning curve extrapolation helps practitioners predict model performance at larger data scales, enabling better planning for data collection and computational resource allocation. This paper investigates when neural networks outperform parametric models for this task. We conduct a comprehensive comparison of LC-PFNs (Learning Curve Prior-Fitted Networks) and three established parametric models (POW4, MMF4, WBL4) using LCDB v1.1, a large-scale dataset with learning curves generated across 265 classification tasks and 24 learners. Surprisingly, we find that parametric models — especially POW4 and MMF4 — consistently outperform LC-PFN across all generalization scenarios and most cutoff regions. However, LC-PFN demonstrates competitive performance when extrapolating from early-stage data, ranking second-best at 10%, 30%, and 50% cutoffs. This suggests LC-PFNs can be valuable when only a small fraction of the learning curve is available. LC-PFN is particularly challenged by smooth and flat curves, but shows slightly improved performance on irregular patterns such as peaking and dipping curves, though it remains outperformed by all parametric models. These trends highlight a misalignment between LC-PFN’s training distribution and the real-world diversity of learning curves. Our findings emphasize the strength of parametric models under realistic conditions and suggest avenues for improving LC-PFNs through architectural flexibility and curve length variability during training.
How Noisy Is Too Noisy?
Robust Extrapolation of Learning Curves with LC-PFN
Accurately predicting a machine learning model’s final performance based on only partial training data can save substantial computational resources and guide early stopping, model selection, and automated machine learning (AutoML) workflows. Learning Curve Prior-Fitted Networks (LC-PFNs) are a recent data-driven approach to this problem, leveraging transformers trained on prior learning curves to make extrapolations. However, real-world training logs are often noisy and irregular—conditions under which the reliability of LC-PFNs remains largely untested. This thesis presents a systematic investigation into the robustness of LC-PFNs when exposed to noisy input data. Using LCDB 1.1—a large-scale dataset—we simulate various levels of noise by corrupting the input data with Gaussian perturbations and quantify how prediction accuracy degrades. To improve resilience, we study and propose two complementary mitigation strategies. The first injects artificial noise into the training data itself, either at a constant level or increasing gradually throughout training, encouraging the model to generalize across a spectrum of noise conditions. The second applies post-processing techniques at inference time, such as smoothing the input sequence with an exponential moving average or averaging multiple stochastic predictions using dropout. Our results show that standard LC-PFNs, trained only on clean data, are highly sensitive to even minor corruptions. In contrast, models trained with gradual noise exposure and evaluated with input smoothing achieve much greater robustness—reducing error by up to 75% under severe noise—while maintaining high accuracy in noise-free settings. This work demonstrates that substantial improvements in learning-curve extrapolation can be achieved without modifying model architecture, using general purpose techniques suitable for real-world deployment.
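The noise simulation and input smoothing described in this abstract can be sketched as follows. This is a minimal illustration, not the thesis code: the function names, the linear shape of the noise schedule, and the synthetic curve are all assumptions made for the example, and the LC-PFN model itself is not included.

```python
import random


def add_gaussian_noise(curve, sigma, rng):
    """Corrupt each point with zero-mean Gaussian noise of standard deviation sigma."""
    return [y + rng.gauss(0.0, sigma) for y in curve]


def linear_noise_schedule(step, total_steps, max_sigma):
    """Hypothetical schedule: noise level grows linearly from 0 to max_sigma over training."""
    if total_steps <= 1:
        return max_sigma
    return max_sigma * step / (total_steps - 1)


def ema_smooth(curve, alpha):
    """Exponential moving average: s[0] = y[0], s[t] = alpha * y[t] + (1 - alpha) * s[t-1]."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for y in curve:
        if not smoothed:
            smoothed.append(y)  # the first point is passed through unchanged
        else:
            smoothed.append(alpha * y + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Synthetic saturating accuracy curve (assumed shape, for illustration only).
rng = random.Random(0)
clean = [0.5 + 0.4 * (1.0 - 0.9 ** t) for t in range(50)]
noisy = add_gaussian_noise(clean, sigma=0.05, rng=rng)
smoothed = ema_smooth(noisy, alpha=0.3)
```

A smaller `alpha` smooths more aggressively but makes the smoothed curve lag behind the true one, so the value would need tuning against the noise level at hand.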
Learning curves represent the relationship between the amount of training data and the error rate in machine learning. An important use case for learning curves is extrapolating them in order to predict how much data is needed to achieve a certain performance. One way to do such extrapolations is to use deep learning with a Prior-Fitted Network (PFN). This paper explores how training the PFN on an imbalanced dataset, i.e. one containing learning curves from two or more machine learning models in skewed proportions, affects the performance of the network. Research into imbalanced learning has shown that machine learning models can favor the more prevalent classes or data. Therefore, it is worthwhile to explore whether such trends can occur for the neural networks that we train for learning curve extrapolation. Our experiments focused on analyzing different imbalance scenarios and comparing them. Our results show that mixing learning curves from different learners can improve extrapolation performance in some cases, but the effect strongly depends on the learner characteristics and training proportions.
Effectiveness of Machine Learning Models in Classifying Learners Based on Learning Curves
Improving Our Understanding of Learning Curves Through the Process of Classification
In machine learning, learning curves are a metric that plots performance versus training set size. They inform decisions about data acquisition, model selection, and hyperparameter tuning. Despite their importance, recent research suggests that our understanding of learning curve behavior remains limited. In this work, we investigate learning curves from a classification perspective to better understand their structural properties. By framing learning curves as time series and applying time series classification (TSC) techniques, we uncover several key findings: (1) training accuracy curves are significantly more distinguishable across models than validation or test curves; (2) learning curves become more informative and discriminative after a sufficient number of anchor points; and (3) TSC models that emphasize global structural features outperform those focused on local or pointwise characteristics. These results not only offer new insights into the nature of learning curves but also suggest promising directions for future work, including the development of specialized models that move beyond conventional time series assumptions.
Domain shift occurs when the distribution of data differs between the training of a model and its testing. This can happen when the conditions of training are slightly different from the conditions under which the model is tested or used. This is a problem for the generalizability of a model. Learning curves are widely used in machine learning to predict how much data is needed when training a model. This paper explores how domain shift impacts learning curve extrapolation using Learning Curve Prior-Fitted Networks. We explore the effect of domain shift on the performance of models while comparing different learners and groups of learners, thereby showing that domain shift is relevant to learning curve extrapolation and has a statistically significant impact on the accuracy of such extrapolations. We also discuss how patterns like well-behavedness have an impact on this effect of domain shift, while also showing that it is not the full solution to predicting the effect.
Learning curves are used to evaluate the performance of a machine learning (ML) model with respect to the amount of data used when training. Curve fitting finds the unknown optimal coefficients by minimizing the prediction error for a learning curve. This research analyzed the effect of parameter initialization on the performance of curve fitting. Our focus was on comparing the performance of sampling the initial parameters from two random distributions, uniform and normal, in the curve-fitting process for different parametric models. Moreover, we looked into the effect of changing the parameters of these two distributions and drew conclusions about potential best initial guesses. Finally, we arrived at the conclusion that, after choosing parameters that maintain a similar data distribution, uniform and normal sampling of the initial parameters perform similarly during the curve-fitting process on learning curves. Moreover, our studies highlight the sensitivity of the Levenberg-Marquardt curve-fitting method to bad initial guesses.
Learning curves are graphical representations of the relationship between dataset size and error rate in machine learning. Curve fitting is the process of estimating a learning curve using a mathematical formula. This paper analyzes two ways of performing curve fitting: interpolation and extrapolation. The accuracy of the curve-fitting procedure might be negatively influenced by the irregular shape of the curve and the presence of noise. Our study investigates the effects of the Gaussian filter on curve fitting and the potential to improve its performance. This is done by analyzing multiple values of the Gaussian filter's standard deviation parameter (sigma) and a wide variety of learning curves (both smooth and noisy ones). The main finding of this research is that the Gaussian filter can generate significant improvements in the extrapolation process, especially when it is applied to noisy curves. On the other hand, for the interpolation procedure, its impact is reduced, and even negligible for smooth curves. An important takeaway from this paper is that selecting the most suitable pre-processing method for the type of curve analyzed might generate valuable findings in the field of learning curves used in machine learning.
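To make the pre-processing step concrete, here is a small sketch of one-dimensional Gaussian filtering applied to a learning curve before fitting. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the kernel is truncated at four standard deviations, and the edges are handled by renormalizing the weights over the points that exist. Library routines such as SciPy's `gaussian_filter1d` default to reflecting the signal at the boundary instead, so their edge values will differ.

```python
import math


def gaussian_kernel(sigma, truncate=4.0):
    """Return unnormalized Gaussian weights and the kernel radius (in samples)."""
    radius = max(1, int(truncate * sigma + 0.5))
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    return weights, radius


def gaussian_filter_1d(values, sigma):
    """Smooth a sequence with a Gaussian kernel, renormalizing near the edges."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    weights, radius = gaussian_kernel(sigma)
    n = len(values)
    smoothed = []
    for i in range(n):
        weighted_sum = 0.0
        weight_total = 0.0
        for k in range(-radius, radius + 1):
            j = i + k
            if 0 <= j < n:  # skip kernel taps that fall outside the curve
                w = weights[k + radius]
                weighted_sum += w * values[j]
                weight_total += w
        smoothed.append(weighted_sum / weight_total)
    return smoothed


# Example: a jagged error curve becomes smoother as sigma grows.
jagged = [0.40, 0.31, 0.35, 0.27, 0.30, 0.24, 0.26, 0.22]
lightly_smoothed = gaussian_filter_1d(jagged, sigma=0.5)
heavily_smoothed = gaussian_filter_1d(jagged, sigma=2.0)
```

Larger values of `sigma` remove more noise but also flatten genuine structure in the curve, which is the trade-off the abstract's sweep over sigma addresses.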
Learning curves show the learning rate of a classifier by plotting the dataset size used to train the classifier versus the error rate. By extrapolating these curves, it is possible to predict how well the classifier will perform when trained on dataset sizes that are currently not available. This can be useful when trying to determine which classifier to select when dealing with a classification problem. Obtaining these learning curves is usually done by fitting a parametric model to the learning data. This paper analyzes the potential of fitting the curve in a different space by scaling the fitting data. This is done by analyzing the accuracy of the fit and the frequency of the fit succeeding. Our main findings are that log scaling produces better MSEs than not scaling, while exponential scaling is inconclusive.
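As one possible reading of fitting in a log-scaled space, the sketch below fits a two-parameter power law, error ≈ a · n^(−b), by ordinary least squares on (log n, log error) and then extrapolates to a larger dataset size. The choice of model, the decision to log-transform both axes, and all function names are assumptions made for illustration; the abstract does not specify which parametric models or scalings were used.

```python
import math


def fit_power_law_log_space(sizes, errors):
    """Fit error = a * n**(-b) by linear regression on log-transformed data.

    In log space the model is log(error) = log(a) - b * log(n), a straight line.
    Requires strictly positive sizes and errors.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


def predict_error(a, b, n):
    """Evaluate the fitted power law at dataset size n."""
    return a * n ** (-b)


# Fit on small dataset sizes, then extrapolate to a size not yet available.
sizes = [16, 32, 64, 128, 256]
errors = [2.0 * n ** -0.5 for n in sizes]  # noise-free synthetic curve
a, b = fit_power_law_log_space(sizes, errors)
extrapolated = predict_error(a, b, 1024)
```

Fitting in log space minimizes relative rather than absolute error, so the fitted parameters can differ from a direct least-squares fit on the unscaled data when the curve is noisy.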