Title: Also for k-means: more data does not imply better performance
Author: Loog, M. (TU Delft Pattern Recognition and Bioinformatics; Radboud Universiteit Nijmegen); Krijthe, J.H. (TU Delft Pattern Recognition and Bioinformatics; Radboud Universiteit Nijmegen); Bicego, Manuele (University of Verona)
Date: 2023
Abstract: Arguably, a desirable feature of a learner is that its performance improves with an increasing amount of training data, at least in expectation. This issue has received renewed attention in recent years, and some curious and surprising findings have been reported. In essence, these results show that more data does not necessarily lead to improved performance; worse, performance can even deteriorate. Clustering, however, has not been subjected to this kind of study up to now. This paper shows that k-means clustering, a ubiquitous technique in machine learning and data mining, suffers from the same lack of so-called monotonicity and can display deterioration in expected performance with increasing training set sizes. Our main theoretical contributions prove that 1-means clustering is monotonic, while 2-means is not even weakly monotonic, i.e., the occurrence of nonmonotonic behavior persists indefinitely, beyond any training sample size. For larger k, the question remains open.
Subject: k-Means algorithm; k-Means clustering; Learning curve; Monotonicity; Performance improvement; Smartness
To reference this document use: http://resolver.tudelft.nl/uuid:caf7a443-1007-4fb4-8d47-aa1460e7f7b3
DOI: https://doi.org/10.1007/s10994-023-06361-6
ISSN: 0885-6125
Source: Machine Learning, 112 (8), 3033-3050
Part of collection: Institutional Repository
Document type: journal article
Rights: © 2023 M. Loog, J.H. Krijthe, Manuele Bicego
Files: PDF, s10994_023_06361_6.pdf (1.28 MB)
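The abstract concerns the expected performance of k-means as a function of training set size (its learning curve). A minimal sketch of such a learning-curve experiment is given below, using plain Lloyd's algorithm for k = 2 on a synthetic 1-D two-component mixture; the data distribution, function names, and parameters are illustrative assumptions for exposition, not the construction used in the paper.

```python
import random


def lloyd_2means(xs, iters=50):
    """Plain Lloyd's algorithm for k = 2 on 1-D data (illustrative sketch)."""
    c = sorted(random.sample(xs, 2))  # initialize centers at two training points
    for _ in range(iters):
        left = [x for x in xs if abs(x - c[0]) <= abs(x - c[1])]
        right = [x for x in xs if abs(x - c[0]) > abs(x - c[1])]
        if left:
            c[0] = sum(left) / len(left)
        if right:
            c[1] = sum(right) / len(right)
    return c


def quantization_error(xs, c):
    """Mean squared distance of each point to its nearest center."""
    return sum(min((x - c[0]) ** 2, (x - c[1]) ** 2) for x in xs) / len(xs)


def learning_curve(sizes, trials=200):
    """Expected test quantization error vs. training set size n."""
    random.seed(0)

    def sample(n):
        # Hypothetical example distribution: equal mixture of N(-2, 1) and N(2, 1).
        return [random.gauss(-2, 1) if random.random() < 0.5 else random.gauss(2, 1)
                for _ in range(n)]

    test = sample(5000)  # large held-out set approximates the expectation over data
    curve = []
    for n in sizes:
        errs = [quantization_error(test, lloyd_2means(sample(n)))
                for _ in range(trials)]
        curve.append(sum(errs) / trials)  # Monte Carlo average over training samples
    return curve


curve = learning_curve([3, 5, 10, 20, 50])
```

Plotting `curve` against the training sizes gives an empirical learning curve; the paper's point is that for 2-means such curves need not decrease monotonically in expectation, however large the sample.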