Automated Sample Ratio Mismatch (SRM) Detection and Analysis
Lukas Vermeer (Vista)
Kevin Anderson (Vista, TU Delft - Software Engineering)
Mauricio Acebal (Vista)
Abstract
Background: Sample Ratio Mismatch (SRM) checks can help detect data quality issues in online experimentation [3]. Not all experimentation platforms provide these checks as part of their solution. Users of these platforms must therefore check for SRM manually, or rely on additional processes, such as checklists [2], or on automation.
Objective: To ensure reliable and early detection of SRM, we wanted to automate the detection and analysis of SRM in experiments running on third-party experimentation platforms.
Method: A set of Looker dashboards was built to facilitate self-serve SRM detection and root cause analysis. In addition, we added email- and chat-based alerting to proactively inform experimenters of SRM and guide them towards these dashboards when needed.
Results: Several cases of SRM have been detected and the experimenters have been warned, so that bad decisions based on flawed data were avoided. We provide one such example as an illustration.
Conclusions: SRM checks are relatively straightforward to automate and can be useful for data quality monitoring, even for companies that rely on third-party experimentation platforms. Proactive alerting, rather than passive reporting, can reduce time to detection and help non-experts avoid making decisions based on biased data.
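To illustrate why such a check is straightforward to automate, the following is a minimal sketch of an SRM test for a two-variant experiment, using a chi-squared goodness-of-fit test on the observed assignment counts. It is not the implementation described in this work: the function names, the 50/50 default split, and the 0.001 alert threshold are all assumptions made for illustration.

```python
import math


def srm_p_value(control_count, treatment_count, expected_control_share=0.5):
    """P-value of a chi-squared goodness-of-fit test for two variants.

    expected_control_share is the intended share of traffic in control
    (hypothetical parameter; 0.5 assumes an even split).
    """
    if not 0.0 < expected_control_share < 1.0:
        raise ValueError("expected_control_share must be between 0 and 1")
    total = control_count + treatment_count
    if total <= 0:
        raise ValueError("at least one unit must have been assigned")

    expected_control = total * expected_control_share
    expected_treatment = total - expected_control

    # Pearson chi-squared statistic over the two variants.
    chi2 = (
        (control_count - expected_control) ** 2 / expected_control
        + (treatment_count - expected_treatment) ** 2 / expected_treatment
    )

    # With two variants there is one degree of freedom, so the
    # chi-squared survival function reduces to erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def has_srm(control_count, treatment_count,
            expected_control_share=0.5, alpha=0.001):
    """Flag an experiment when the p-value falls below alpha.

    alpha=0.001 is an assumed threshold, chosen here only as an example.
    """
    p_value = srm_p_value(control_count, treatment_count,
                          expected_control_share)
    return p_value < alpha
```

Under these assumptions, `has_srm(50000, 48000)` would flag the experiment, whereas `has_srm(5050, 4950)` would not. An alerting job could run such a check on a schedule and notify the experimenter whenever it returns `True`.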