Evaluating Data Distribution Based Concept Drift Detectors

Bachelor Thesis (2023)
Author(s)

K.O. Kanniainen (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

L. Poenaru-Olaru – Mentor (TU Delft - Software Engineering)

Jan Rellermeyer – Mentor (TU Delft - Data-Intensive Systems)

JH Krijthe – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Konsta Kanniainen
Publication Year
2023
Language
English
Graduation Date
03-02-2023
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Various techniques have been studied for handling unexpected changes in data streams, a phenomenon called concept drift. When the incoming data is unlabeled and labels cannot be obtained with reasonable effort, detecting these drifts becomes harder. This study evaluates how well two label-independent drift detection methods based on data distribution, SyncStream and Statistical Change Detection for Multi-Dimensional Data, detect concept drift. Both algorithms are implemented and evaluated side by side on synthetic and real-world datasets. On synthetic datasets the metrics are False Positive Rate and Latency; on real-world datasets, Accuracy replaces Latency. The experiments show that both drift detectors perform significantly worse on real-world data than on synthetic data.
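
The abstract names False Positive Rate and Latency as the synthetic-data metrics but does not define them. As a minimal illustration only, the sketch below assumes one common batch-level convention: the first detection at or after a true drift (and before the next one) counts as a hit, its distance in batches is the latency, and every other detection is a false positive. The function name `evaluate_detections` and its parameters are hypothetical and are not taken from the thesis.

```python
from typing import List, Optional, Tuple


def evaluate_detections(
    true_drifts: List[int],
    detections: List[int],
    n_batches: int,
) -> Tuple[float, Optional[float]]:
    """Return (false_positive_rate, mean_latency) under an ASSUMED convention.

    true_drifts: batch indices where a drift really starts.
    detections:  batch indices where the detector raised an alarm.
    n_batches:   total number of batches in the stream.
    """
    true_drifts = sorted(true_drifts)
    detections = sorted(detections)

    latencies = []
    matched = set()
    for i, start in enumerate(true_drifts):
        # A drift "owns" the batches up to the next drift (or end of stream).
        end = true_drifts[i + 1] if i + 1 < len(true_drifts) else n_batches
        hits = [d for d in detections if start <= d < end]
        if hits:
            # The first alarm in the window is the hit; latency is its delay.
            latencies.append(hits[0] - start)
            matched.add(hits[0])

    # Every alarm that is not the first hit for some drift is a false positive.
    false_positives = [d for d in detections if d not in matched]

    # Assumption: negatives are all batches in which no drift starts.
    negatives = n_batches - len(true_drifts)
    fpr = len(false_positives) / negatives if negatives > 0 else 0.0
    mean_latency = sum(latencies) / len(latencies) if latencies else None
    return fpr, mean_latency


if __name__ == "__main__":
    fpr, latency = evaluate_detections(
        true_drifts=[10, 30], detections=[5, 12, 15, 30], n_batches=50
    )
    print(f"FPR={fpr:.4f}, mean latency={latency} batches")
```

Other conventions exist, for example normalising latency by the length of the drift window or counting repeated alarms after a hit differently, so results computed this way are not necessarily comparable to those reported in the thesis.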
