This thesis explores machine learning methods with the aim of achieving better predictive performance than the legacy rule-based systems currently used to detect fraudulent car insurance claims. Two key principles guide the choice of techniques and algorithms: applicability to imbalanced data and interpretability of predictions. The dataset used for model training and evaluation is highly imbalanced, containing only 0.3\% fraudulent claims against 99.7\% non-fraudulent claims. Prediction interpretability is of great importance because fraud experts interface directly with the output of the machine learning models. With these principles in mind, the thesis considers four algorithms: Logistic Regression, Random Forest, LightGBM, and a Stacking classifier. The algorithms are trained on the imbalanced learning problem using a combination of undersampling (random and Edited Nearest Neighbors), oversampling (SMOTE), and class weighting. Each trained model meets the objective of outperforming the rule-based baseline, with the Stacking classifier combining the best performance with the lowest variance. The baseline is benchmarked on two different parameters, so the models can be evaluated under two boundary conditions, and their performance can be tuned between those two conditions. Ultimately, by moving its classification threshold, the Stacking classifier can be tuned to deliver either roughly 70-80\% more fraud caught or a 75\% reduction in effort. Catching extra fraud increases the number of genuinely fraudulent claims that fraud experts get to see, while effort reduction frees up capacity, enabling fraud experts to spend more time on other, more relevant tasks.
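
The threshold tuning between the two boundary conditions can be illustrated with a minimal sketch. It assumes that one boundary condition means "flag no more claims than the baseline" (same effort, more fraud caught) and the other means "catch at least as much fraud as the baseline" (same fraud, less effort); this reading, the helper names, the toy scores and labels, and the baseline figures are all hypothetical and are not taken from the thesis.

```python
# Illustrative sketch only: toy data and hypothetical helper names.

def flagged_and_caught(scores, labels, threshold):
    """Return (claims flagged for review, fraudulent claims among them)."""
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    return len(flagged), sum(flagged)


def threshold_for_same_effort(scores, labels, baseline_flagged):
    """Lowest threshold that flags no more claims than the baseline does."""
    for t in sorted(set(scores)):
        n_flagged, _ = flagged_and_caught(scores, labels, t)
        if n_flagged <= baseline_flagged:
            return t
    return None


def threshold_for_same_fraud(scores, labels, baseline_caught):
    """Highest threshold that still catches at least the baseline fraud."""
    for t in sorted(set(scores), reverse=True):
        _, n_caught = flagged_and_caught(scores, labels, t)
        if n_caught >= baseline_caught:
            return t
    return None


# Toy model scores (higher means more suspicious) and true labels (1 = fraud).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

# Hypothetical rule-based baseline: flags 5 claims, 2 of which are fraud.
baseline_flagged, baseline_caught = 5, 2

t_effort = threshold_for_same_effort(scores, labels, baseline_flagged)
t_fraud = threshold_for_same_fraud(scores, labels, baseline_caught)

print("same effort:", t_effort, flagged_and_caught(scores, labels, t_effort))
print("same fraud: ", t_fraud, flagged_and_caught(scores, labels, t_fraud))
```

On this toy data, the lower threshold reviews as many claims as the baseline but surfaces more fraud, while the higher threshold catches the same fraud with fewer reviews. Any threshold between the two trades one benefit against the other.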