How Well do Clustering Similarities-Based Concept Drift Detectors Identify Drift in case of Synthetic/Real-World Data?


Abstract

Concept drift is an unforeseeable change in the underlying distribution of streaming data; when it occurs, classifiers deployed on that data lose accuracy. Concept drift detectors are algorithms that detect such changes, and unsupervised detectors do so without the data's true labels, which can be expensive to obtain. This work implements two existing clustering-based unsupervised concept drift detectors, UCDD and MSSW, and evaluates them on both synthetic and real-world data. Our main contribution is making these implementations publicly available. From the evaluation, we also find that UCDD detects drift earlier on simple numerical synthetic datasets, that MSSW detects drift earlier on more complex synthetic datasets with categorical features, and that neither seems suitable for real-world datasets.
