Deciphering Learning Curve Characteristics via K-Means Clustering of Curve Model Parameters

Abstract

Learning curves describe how the performance of a learning algorithm changes as the amount of training data increases [1, 2, 3]. While learning curves are a well-established concept, clustering them by the parameters of fitted curve models remains underexplored. Our study addresses this gap, using the Learning Curve Database (LCDB) to search for patterns. We investigate whether different curve models uncover distinct patterns, examine how different datasets affect the learners, and explore whether learners display unique characteristics and behaviors or adhere to a common pattern. Across curve models, most data points fall into a single (dominant) cluster, suggesting a shared underlying pattern. Certain learners, such as QuadraticDiscriminantAnalysis and PassiveAggressiveClassifier, deviate from this common pattern regardless of dataset attributes. Moreover, while learners show similar characteristics within a single curve model, distinct patterns emerge when comparing across curve models: behavior is consistent within a curve model but differs between curve models.
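
The general idea of clustering learning curves by their fitted parameters can be illustrated with a minimal sketch. It is not the pipeline used in the study: the abstract does not state which curve models, preprocessing, or number of clusters were used, so the three-parameter power law `pow3`, the standardization step, `k=2`, and the synthetic curves standing in for LCDB data are all assumptions made for illustration. The helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def pow3(n, a, b, c):
    """Assumed curve model: error(n) = a * n^(-b) + c."""
    return a * np.power(n, -b) + c


def fit_curve(sizes, errors):
    """Fit pow3 to one learning curve and return its parameters (a, b, c)."""
    params, _ = curve_fit(pow3, sizes, errors, p0=(1.0, 0.5, 0.1), maxfev=10000)
    return params


def cluster_curves(curves, sizes, k=2, seed=0):
    """Fit every curve, standardize the fitted parameters, run k-means on them."""
    params = np.array([fit_curve(sizes, errs) for errs in curves])
    scaled = StandardScaler().fit_transform(params)  # put a, b, c on one scale
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(scaled)
    return params, labels


def make_synthetic_curves(sizes, seed=0):
    """Synthetic stand-in for real curves: 20 'typical' and 5 'atypical' ones."""
    rng = np.random.default_rng(seed)

    def sample(a_mu, b_mu, c_mu):
        a = rng.normal(a_mu, 0.05)
        b = rng.normal(b_mu, 0.02)
        c = rng.normal(c_mu, 0.01)
        return pow3(sizes, a, b, c) + rng.normal(0.0, 0.001, sizes.size)

    typical = [sample(1.0, 0.5, 0.1) for _ in range(20)]
    atypical = [sample(2.0, 1.0, 0.4) for _ in range(5)]
    return np.array(typical + atypical)


if __name__ == "__main__":
    sizes = np.array([2.0 ** k for k in range(4, 13)])  # training set sizes
    curves = make_synthetic_curves(sizes)
    params, labels = cluster_curves(curves, sizes, k=2)
    # Report how many curves landed in each cluster.
    for cluster_id, count in zip(*np.unique(labels, return_counts=True)):
        print(f"cluster {cluster_id}: {count} curves")
```

In this toy setup the larger of the two clusters plays the role of a dominant cluster, and the curves in the smaller one correspond to learners that deviate from the common pattern. With real curves, the choice of curve model and of `k` would need to be justified separately.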