Clustering Learning Curves in Machine Learning using K-Means Algorithm

Can patterns be identified amongst learning curves after the application of the K-Means algorithm using point and statistical vectors?

Bachelor Thesis (2024)
Author(s)

P.S.P. Ramsundersingh (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

T.J. Viering – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

O.T. Turan – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2024 Pravesha Ramsundersingh
Publication Year
2024
Language
English
Graduation Date
01-02-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

A learning curve shows the “performance of trained models versus the training set size” [1]. Recent research on learning curve analysis has been carried out using the Learning Curve Database (LCDB) [2]. This paper investigates whether there are similarities amongst these curves by clustering the curves provided by the LCDB with the K-Means algorithm. The experiment employs two distinct input representations: point vectors and statistical vectors. Through individual learner analysis, individual dataset analysis, principal component analysis, and other experiments, patterns are isolated for both input sets. After further exploration of shapes and distributions, the conclusion is that point vector clustering produced one concrete key pattern amongst certain learning techniques. In contrast, the statistical vector findings are inconclusive and show no clear distinction that could be linked to any dominant pattern.
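To make the two input representations concrete, the following is a minimal sketch and not the thesis implementation. It assumes that a point vector is simply the curve's value at each training set size, and that a statistical vector is a handful of summary statistics of the curve. The curves are synthetic rather than taken from the LCDB, and the anchor sizes, the feature choice in `stat_vector`, and the farthest-first initialisation in `kmeans` are all illustrative assumptions.

```python
import random
import statistics

ANCHORS = [16, 32, 64, 128, 256, 512]  # hypothetical training set sizes


def make_curve(a, b, c, rng):
    """Synthetic error curve a + b * n^(-c) with small Gaussian noise."""
    return [a + b * n ** (-c) + rng.gauss(0, 0.01) for n in ANCHORS]


def stat_vector(curve):
    """Assumed summary features: mean, std, min, max, total drop."""
    return [statistics.mean(curve), statistics.stdev(curve),
            min(curve), max(curve), curve[0] - curve[-1]]


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iters=100):
    """Plain Lloyd's algorithm; returns one cluster label per vector."""
    # Farthest-first initialisation (deterministic, an assumption here):
    # start from the first vector, then add the vector that is farthest
    # from the centroids chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        centroids.append(list(max(
            vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))))
    labels = []
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                  for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            # Keep the old centroid if a cluster ends up empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels


rng = random.Random(42)
# Two synthetic families: fast-converging and slow-converging curves.
curves = ([make_curve(0.1, 1.0, 0.8, rng) for _ in range(5)]
          + [make_curve(0.3, 0.5, 0.2, rng) for _ in range(5)])

point_labels = kmeans(curves, k=2)                           # point vectors
stat_labels = kmeans([stat_vector(c) for c in curves], k=2)  # statistical vectors

print("point vector clusters:      ", point_labels)
print("statistical vector clusters:", stat_labels)
```

On this toy data the two families are far apart, so both representations separate them. That says nothing about how either representation behaves on the real LCDB curves, where the abstract reports that the statistical vectors were inconclusive.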
