How Well do Clustering Similarities-Based Concept Drift Detectors Identify Drift in case of Synthetic/Real-World Data?
J. Pohl (TU Delft - Electrical Engineering, Mathematics and Computer Science)
L. Poenaru-Olaru – Mentor (TU Delft - Software Engineering)
Jan S. Rellermeyer – Mentor (TU Delft - Data-Intensive Systems)
Jesse Krijthe – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Concept drift is an unforeseeable change in the underlying distribution of streaming data; when it occurs, classifiers deployed on that data lose accuracy. Concept drift detectors are algorithms that detect such drift, and unsupervised detectors do so without needing the true labels of the data, which can be expensive to obtain. This work implements two existing clustering-based unsupervised concept drift detectors, UCDD and MSSW, and evaluates them on both synthetic and real-world data. Our main contribution is making these implementations publicly available. The evaluation also shows that UCDD detects drift earlier on simple numerical synthetic datasets, that MSSW detects drift earlier on more complex synthetic datasets with categorical features, and that neither seems suitable for real-world datasets.