How Well do Clustering Similarities-Based Concept Drift Detectors Identify Drift in case of Synthetic/Real-World Data?

Bachelor Thesis (2023)
Author(s)

J. Pohl (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

L. Poenaru-Olaru – Mentor (TU Delft - Software Engineering)

Jan S. Rellermeyer – Mentor (TU Delft - Data-Intensive Systems)

Jesse Krijthe – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Jindřich Pohl
More Info
expand_more
Publication Year
2023
Language
English
Copyright
© 2023 Jindřich Pohl
Graduation Date
03-02-2023
Awarding Institution
Delft University of Technology
Project
['CSE3000 Research Project']
Programme
['Computer Science and Engineering']
Faculty
Electrical Engineering, Mathematics and Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Concept drift is an unforeseeable change in the underlying data distribution of streaming data, and because of such a change, deployed classifiers over that data show a drop in accuracy. Concept drift detectors are algorithms capable of detecting such a drift, and unsupervised ones detect drift without needing the data’s actual labels, which can be expensive to obtain. This work is concerned with the implementation and evaluation of two existing unsupervised concept drift detectors based on clustering, UCDD and MSSW, by evaluation on both synthetic and real-world data. Our biggest contribution is in making implementations publicly available. By evaluation, we also realise that UCDD detects drift earlier for simple numerical synthetic datasets, MSSW detects drift earlier for more complex synthetic datasets with categorical features, and none seems suitable for real-world datasets.

Files

CSE3000_Paper.pdf
(pdf | 0.43 Mb)
License info not available