Local Explanation Methods for Isolation Forest

Explainable Outlier Detection in Anti-Money Laundering

Master Thesis (2020)
Author(s)

K.B. Bergþórsdóttir (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Cornelis W. Oosterlee – Mentor (TU Delft - Numerical Analysis)

Evert Haasdijk – Mentor

Andrea Fontanari – Graduation committee member (Centrum Wiskunde & Informatica (CWI))

N. Parolya – Graduation committee member (TU Delft - Statistics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2020 Kristin Bergþórsdóttir
Publication Year
2020
Language
English
Graduation Date
28-08-2020
Awarding Institution
Delft University of Technology
Programme
Applied Mathematics | Financial Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Machine learning methods such as outlier detection are becoming increasingly popular as tools in the fight against money laundering. In this thesis, we analyse the Isolation Forest outlier detection algorithm in detail and introduce a new local explanation method for Isolation Forest, the MI-Local-DIFFI (Multiple Indicator Local-DIFFI) method. The method uses the structure of the isolation trees and the path each individual outlier takes down the trees to determine an importance weight for each feature. These weights are then combined into feature importance scores that explain why a specific outlier was identified as such. In anti-money laundering (AML), such explanations are very valuable when determining whether an outlying customer is suspicious. MI-Local-DIFFI is based on a global explanation method called DIFFI; while we were conducting our research, another local version, Local-DIFFI, was also introduced. We use a synthetic data set to compare the performance of four explanation methods, including our MI-Local-DIFFI, Local-DIFFI and the state-of-the-art TreeSHAP method. MI-Local-DIFFI shows excellent results in terms of both performance and runtime. Furthermore, we apply the explainable outlier detection methodology, the combination of Isolation Forest and MI-Local-DIFFI, to a data set from Triodos Bank. This led to interesting findings, such as the discovery of data quality issues in the current system and references to EDRs and SARs. After further inspection, no SARs were filed, but some customers were moved to higher risk classes. This procedure will be performed on a monthly basis with the goal of continuing to improve the AML processes of the bank.
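The idea of deriving per-feature weights from an outlier's path down the isolation trees can be illustrated with a minimal sketch. This is not the MI-Local-DIFFI method from the thesis, nor DIFFI or Local-DIFFI; it is a simplified path-based attribution written for illustration only. It assumes scikit-learn's IsolationForest, and the helper name path_feature_weights, the 1/path-length weighting, and the synthetic data are all hypothetical choices made here.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def path_feature_weights(forest, x):
    """Hypothetical helper (not from the thesis): score each feature by how
    often it is split on along x's path, giving short paths more weight."""
    x = np.asarray(x, dtype=np.float32)  # scikit-learn trees compare in float32
    scores = np.zeros(x.shape[0])
    for tree, feats in zip(forest.estimators_, forest.estimators_features_):
        t = tree.tree_
        xs = x[feats]          # features seen by this tree
        node = 0
        path = []              # global feature index of every split on the path
        while t.children_left[node] != -1:   # -1 marks a leaf
            f = t.feature[node]
            path.append(feats[f])
            if xs[f] <= t.threshold[node]:
                node = t.children_left[node]
            else:
                node = t.children_right[node]
        if path:
            # Illustrative assumption: each tree distributes a total weight
            # of 1 evenly over the splits on the path.
            for f in path:
                scores[f] += 1.0 / len(path)
    total = scores.sum()
    return scores / total if total > 0 else scores


# Synthetic data (an assumption for this sketch): 500 standard-normal inliers
# in 4 dimensions plus one point that is extreme in feature 2 only.
rng = np.random.default_rng(0)
inliers = rng.normal(size=(500, 4))
outlier = np.array([0.0, 0.0, 10.0, 0.0])
X = np.vstack([inliers, outlier])

forest = IsolationForest(n_estimators=100, random_state=0).fit(X)
scores = path_feature_weights(forest, outlier)
print(scores)
```

The normalised scores sum to one, so they can be read as the share of the isolation attributed to each feature for this single point. A local explanation of this kind is what an analyst would inspect when deciding whether a flagged customer warrants follow-up.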

Files

Thesis_final_Kristin.pdf
(pdf | 6.42 Mb)
License info not available