Evaluating Data Distribution Based Concept Drift Detectors

Bachelor Thesis (2023)

Author(s)

K.O. Kanniainen (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

L. Poenaru-Olaru – Mentor (TU Delft - Software Engineering)

Jan Rellermeyer – Mentor (TU Delft - Data-Intensive Systems)

JH Krijthe – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Machine learning Concept drift Data Stream Synchronization

To reference this document use:

https://resolver.tudelft.nl/uuid:86e9c0ff-13eb-4e4d-8eb1-5045aacf666a

More Info

Publication Year

2023

Language

English

Graduation Date

03-02-2023

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Various techniques have been studied to handle unexpected changes in data streams, a phenomenon called concept drift. When the incoming data is unlabeled and labels cannot be obtained with reasonable effort, detecting these drifts becomes harder. This study evaluates how well two data-distribution-based, label-independent drift detection methods, SyncStream and Statistical Change Detection for Multi-Dimensional Data, detect concept drift. Both algorithms are implemented and evaluated side by side on synthetic and real-world datasets. The metrics used for synthetic datasets are False Positive Rate and Latency; for real-world datasets, Accuracy is used instead of Latency. The experiments show that both drift detectors perform significantly worse on real-world data than on synthetic data.
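
The abstract names False Positive Rate and Latency as the synthetic-data metrics but does not define them. The sketch below shows one common way such metrics could be computed, assuming a batch-wise evaluation in which the batch where each drift starts is known. The function name `evaluate_detections`, its parameters, and the matching rule (the first detection at or after a drift and before the next drift counts as the true detection, and every other detection is a false alarm) are illustrative assumptions, not the definitions used in the thesis.

```python
from typing import List, Optional, Sequence, Tuple


def evaluate_detections(
    drift_batches: Sequence[int],
    detected_batches: Sequence[int],
    n_batches: int,
) -> Tuple[float, List[Optional[int]]]:
    """Return (false positive rate, per-drift latency in batches).

    Assumed definitions (hypothetical, not taken from the thesis):
    - a drift is "detected" by the first alarm raised at or after its
      start batch and before the next drift starts;
    - every other alarm is a false alarm;
    - the rate divides false alarms by the number of batches in which
      no drift starts.
    """
    drifts = sorted(drift_batches)
    detections = sorted(set(detected_batches))

    latencies: List[Optional[int]] = []
    matched = set()
    for i, start in enumerate(drifts):
        # The window for this drift ends where the next drift begins.
        end = drifts[i + 1] if i + 1 < len(drifts) else n_batches
        first = next((d for d in detections if start <= d < end), None)
        if first is None:
            latencies.append(None)  # drift was never detected
        else:
            latencies.append(first - start)
            matched.add(first)

    false_alarms = [d for d in detections if d not in matched]
    n_negative = n_batches - len(drifts)
    fpr = len(false_alarms) / n_negative if n_negative > 0 else 0.0
    return fpr, latencies


# Ten batches, drifts starting in batches 3 and 7, alarms in 1, 4, 5 and 9.
fpr, latencies = evaluate_detections([3, 7], [1, 4, 5, 9], n_batches=10)
print(fpr, latencies)
```

In this example the alarms in batches 4 and 9 are matched to the two drifts, while the alarms in batches 1 and 5 count as false alarms. Other conventions, such as measuring latency in samples or normalising false alarms differently, are equally plausible.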

Files

Evaluating_Data_Distribution_B... (pdf)

(pdf | 0.342 Mb)

License info not available