This thesis investigates the effectiveness and efficiency of embedding-based drift detection in machine learning systems, focusing on synthetic simulations and real-world production data. Through controlled experiments, we compare vector-based and distribution-based metrics regarding sensitivity to drift, memory and runtime cost, and practical utility for early warning of performance degradation. Results from synthetic drift experiments indicate that vector-based metrics respond quickly to small shifts but tend to saturate early, limiting their ability to differentiate between moderate and severe drift. Distribution-based metrics, by contrast, scale more proportionally across the entire drift spectrum, providing more stable and interpretable signals. Memory and runtime profiling shows that vector-based methods are consistently more efficient, while distribution-based approaches incur higher costs. A real-world evaluation using eight years of data from a deployed recommendation system confirms the practical value of these findings. Vector metrics consistently provided earlier signals, on average 87 days before performance drops, compared to distribution metrics, which often lagged. However, distribution metrics offered smoother trends and fewer false positives, making them better suited for long-term monitoring. This thesis also explores trade-offs introduced by embedding compression. Principal Component Analysis (PCA) and KLL sketches were evaluated for reducing computational overhead. PCA preserves the drift signal better in vector metrics but is more resource-intensive. In contrast, KLL is highly efficient but sacrifices sensitivity, particularly in vector space.
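The distinction between the two metric families can be illustrated with a minimal sketch. The abstract does not name the specific metrics used, so the choices below are assumptions for illustration only: cosine distance between embedding centroids stands in for a vector-based metric, and the mean per-dimension Kolmogorov–Smirnov statistic stands in for a distribution-based metric, applied to a synthetic mean-shift drift.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic embeddings: a reference batch and a drifted batch (mean shift).
ref = rng.normal(0.0, 1.0, size=(1000, 16))
cur = rng.normal(0.5, 1.0, size=(1000, 16))

def centroid_cosine_distance(a, b):
    """Vector-based metric (illustrative): cosine distance between batch centroids."""
    ca, cb = a.mean(axis=0), b.mean(axis=0)
    return 1.0 - float(np.dot(ca, cb) / (np.linalg.norm(ca) * np.linalg.norm(cb)))

def mean_ks_statistic(a, b):
    """Distribution-based metric (illustrative): mean per-dimension KS statistic."""
    stats = []
    for d in range(a.shape[1]):
        x, y = np.sort(a[:, d]), np.sort(b[:, d])
        grid = np.concatenate([x, y])
        # Empirical CDFs of both samples evaluated on the pooled grid.
        cdf_x = np.searchsorted(x, grid, side="right") / len(x)
        cdf_y = np.searchsorted(y, grid, side="right") / len(y)
        stats.append(np.max(np.abs(cdf_x - cdf_y)))
    return float(np.mean(stats))

# Both metrics flag the synthetic drift; only the distribution-based one
# requires touching every dimension's full sample, hinting at its higher cost.
print("vector-based score:      ", centroid_cosine_distance(ref, cur))
print("distribution-based score:", mean_ks_statistic(ref, cur))
```

The vector-based score needs only two centroids, while the distribution-based score compares full empirical distributions per dimension, which mirrors the efficiency gap reported above.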