Detecting Concept Drift in Deployed Machine Learning Models

How well do Margin Density-based concept drift detectors identify concept drift on synthetic and real-world data?

Bachelor Thesis (2023)
Author(s)

B.G.L. André (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Jan S. Rellermeyer – Mentor (TU Delft - Data-Intensive Systems)

L. Poenaru-Olaru – Mentor (TU Delft - Software Engineering)

J.H. Krijthe – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Baptiste André
Publication Year
2023
Language
English
Graduation Date
03-02-2023
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

When deployed in production, machine learning models can lose accuracy over time because the distribution of the incoming data changes, so the model no longer reflects reality. This change, together with the resulting loss of accuracy, is known as concept drift. Drift detectors are algorithms designed to detect such drifts; they are important because they tell us when a classification model has become inaccurate, and some can even be used to detect adversarial attacks on machine learning algorithms. The detectors discussed in this paper are Margin Density drift detectors. They are evaluated in an unsupervised context, where no testing labels are assumed to be available; this is often the case in real-world applications, as obtaining labels is costly. The experiments in this paper found that Margin Density detectors can be a useful tool for detecting the first drift in synthetic data, although parameter tuning is required to achieve high accuracy on some datasets. In an unsupervised environment with more than one drift, however, the detectors proved unreliable, as seen in the experiments on real-world data. An implementation of Margin Density detectors accompanies this paper.
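To illustrate the idea behind the abstract, the following is a minimal, hypothetical sketch of margin-density-based drift detection: track the fraction of incoming samples that fall inside a classifier's margin band and flag drift when that fraction shifts noticeably from a labeled reference window. The fixed linear classifier, the threshold value, and all names here are illustrative assumptions, not the thesis's actual implementation (which typically uses a trained SVM and statistically derived thresholds).

```python
import random

# Hypothetical fixed linear classifier: decision value is w.x + b.
W, B = [1.0, -1.0], 0.0

def decision(x):
    return sum(wi * xi for wi, xi in zip(W, x)) + B

def margin_density(batch, margin=1.0):
    """Fraction of samples whose decision value lies inside the margin band."""
    inside = sum(1 for x in batch if abs(decision(x)) <= margin)
    return inside / len(batch)

def detect_drift(reference_md, batch, threshold=0.2):
    """Signal drift when the batch's margin density deviates from the
    reference margin density by more than a fixed threshold (a
    simplification; MD3-style detectors derive the threshold from the
    reference window's standard deviation)."""
    return abs(margin_density(batch) - reference_md) > threshold

random.seed(0)
# Reference data clustered far from the boundary: low margin density.
ref = [(random.gauss(3, 0.5), random.gauss(0, 0.5)) for _ in range(200)]
ref_md = margin_density(ref)

# Drifted data concentrated near the boundary: margin density rises.
drifted = [(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(200)]
print(detect_drift(ref_md, drifted))  # drift is flagged
```

The appeal of this signal in the unsupervised setting is that margin density needs no labels at monitoring time: only the classifier's decision values on incoming data are required.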
