Evaluating Data Distribution Based Concept Drift Detectors

Abstract

Various techniques have been studied to handle unexpected changes in data streams, a phenomenon called concept drift. When the incoming data is unlabeled and labels cannot be obtained with reasonable effort, detecting these drifts becomes considerably harder. This study evaluates how well two data-distribution-based, label-independent drift detection methods, SyncStream and Statistical Change Detection for Multi-Dimensional Data, detect concept drift. This is done by implementing both algorithms and evaluating them side by side on synthetic and real-world datasets. The metrics used for synthetic datasets are False Positive Rate and Latency; for real-world datasets, Accuracy is used instead of Latency. The experiments show that both drift detectors perform significantly worse on real-world data than on synthetic data.