Locally Explainable Isolation Forest with Mixed-Attribute Data and Ternary Isolation Trees

Combatting Money Laundering with Anomaly Detection

Master Thesis (2021)
Author(s)

M.E. Huistra (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Kees Oosterlee – Mentor (TU Delft - Numerical Analysis)

N. Parolya – Graduation committee member (TU Delft - Statistics)

N.V. Budko – Graduation committee member (TU Delft - Numerical Analysis)

Evert Haasdijk – Graduation committee member (Deloitte)

L.A. Souto Arias – Graduation committee member (TU Delft - Numerical Analysis)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2021 Mark Huistra
Publication Year
2021
Language
English
Graduation Date
01-10-2021
Awarding Institution
Delft University of Technology
Programme
Applied Mathematics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In the fight against money laundering, demand for data-driven Anti-Money Laundering (AML) solutions is growing. Anomaly detection algorithms in particular have proven effective at detecting suspicious customer behaviour and at revealing patterns otherwise hidden in customer transaction data. This thesis studies the Isolation Forest anomaly detection algorithm in combination with the model-specific local explanation method Multiple Indicator Local Depth-based Isolation Forest Feature Importance (MI-Local-DIFFI). To extend Isolation Forest to mixed-attribute data sets, the incorporation of nominal features is explored in more detail. This analysis led to the introduction of Isolation Forest with Categorical Sampling (iForestCS), a methodology that incorporates nominal attributes directly into an isolation tree without the need to encode them onto a numerical scale. The method is tested on different synthetic data sets against different encoding strategies and against Isolation Forest Conditional Anomaly Detection (iForestCAD). It outperforms the encoding strategies for different parameters of the underlying synthetic data. Furthermore, this thesis explores the potential of a ternary Isolation Forest, in which the branching strategy of an isolation tree is expanded to produce three child nodes. Experiments on synthetic data show that the performance of MI-Local-DIFFI in particular decreases when it is applied to a ternary Isolation Forest. Finally, the research considers a practical use case: the locally explainable Isolation Forest is applied to mixed-attribute customer transaction data from Triodos Bank. This provided useful insight and resulted in the detection of suspicious customer behaviour and the introduction of new rules into business practices. Although the most interesting customer behaviour did not directly emanate from the nominal attributes, the choice of method for incorporating nominal features led to differences among the anomalies with the highest anomaly scores.
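To make the idea of splitting directly on a nominal attribute concrete, the following is a minimal Python sketch of an isolation forest over mixed-attribute rows. It is an illustration only, not the thesis's iForestCS: the abstract does not specify how categories are sampled, so the rule used here (send a random non-empty proper subset of the observed categories to the left child) is an assumption. The function names, the toy data, and the simplification of growing every tree on the full data set without subsampling are likewise hypothetical. The normaliser `c(n)` and the score `2 ** (-mean_path / c(n))` follow the standard Isolation Forest definition.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Standard Isolation Forest normaliser: average path length for n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def goes_left(row, test):
    """Apply a split test: category membership ('in') or numeric threshold ('lt')."""
    kind, j, arg = test
    return row[j] in arg if kind == "in" else row[j] < arg


def build_tree(rows, nominal, rng, depth=0, max_depth=8):
    """Grow one isolation tree; `nominal` holds the indices of categorical columns."""
    if len(rows) <= 1 or depth >= max_depth:
        return {"size": len(rows)}
    # Only columns that still take more than one value can be split on.
    candidates = [j for j in range(len(rows[0])) if len({r[j] for r in rows}) > 1]
    if not candidates:
        return {"size": len(rows)}
    j = rng.choice(candidates)
    values = sorted({r[j] for r in rows})
    if j in nominal:
        # ASSUMED rule: a random non-empty proper subset of categories goes left.
        k = rng.randint(1, len(values) - 1)
        test = ("in", j, frozenset(rng.sample(values, k)))
    else:
        # Usual numeric rule: uniform split point between the observed min and max.
        test = ("lt", j, rng.uniform(values[0], values[-1]))
    left = [r for r in rows if goes_left(r, test)]
    right = [r for r in rows if not goes_left(r, test)]
    return {
        "test": test,
        "left": build_tree(left, nominal, rng, depth + 1, max_depth),
        "right": build_tree(right, nominal, rng, depth + 1, max_depth),
    }


def path_length(row, node, depth=0):
    """Depth at which `row` lands, plus c(size) for leaves that were not fully split."""
    if "test" not in node:
        return depth + c(node["size"])
    child = node["left"] if goes_left(row, node["test"]) else node["right"]
    return path_length(row, child, depth + 1)


def anomaly_score(row, forest, n):
    """Score in (0, 1]; shorter average paths give scores closer to 1."""
    mean_path = sum(path_length(row, tree) for tree in forest) / len(forest)
    return 2.0 ** (-mean_path / c(n))


# Toy data (hypothetical): column 0 is numeric, column 1 is nominal.
rng = random.Random(0)
rows = [(rng.gauss(0.0, 1.0), rng.choice("abc")) for _ in range(200)]
outlier = (0.1, "z")  # unremarkable numeric value, but a category seen only once
rows.append(outlier)
typical = (0.1, "a")  # same numeric value, common category

forest = [build_tree(rows, nominal={1}, rng=rng) for _ in range(100)]
outlier_score = anomaly_score(outlier, forest, len(rows))
typical_score = anomaly_score(typical, forest, len(rows))
```

Because `outlier` and `typical` share the same numeric value, any difference between their scores in this sketch comes from the nominal column alone, which is the effect that encoding a category onto a numerical scale can blur.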
