Local Explanation Methods for Isolation Forest

Bergþórsdóttir, K.B.

Explainable Outlier Detection in Anti-Money Laundering

Master thesis (2020)

Authors

K.B. Bergþórsdóttir, Electrical Engineering, Mathematics and Computer Science

Contributors

C. W. Oosterlee, Numerical Analysis (mentor)

Evert Haasdijk (mentor)

Andrea Fontanari, Centrum Wiskunde & Informatica (CWI) (graduation committee member)

N. Parolya, Statistics (graduation committee member)

Faculty

Electrical Engineering, Mathematics and Computer Science

Machine Learning, Explainable Machine Learning, Outlier Detection, Anti-Money Laundering

To reference this document use:

http://resolver.tudelft.nl/uuid:da4ed7f1-62d5-4871-8475-5b5f68183ab0

Published Date

28-08-2020

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Machine learning methods such as outlier detection are becoming increasingly popular as tools in the fight against money laundering. In this thesis, we analyse the Isolation Forest outlier detection algorithm in detail and introduce a new local explanation method for it, MI-Local-DIFFI (Multiple Indicator Local-DIFFI). The method uses the structure of the isolation trees and the path each individual outlier takes down the trees to determine an importance weight for each feature. These weights are then combined into feature importance scores that explain why a specific outlier was identified as such. In anti-money laundering (AML), such explanations are very valuable when determining whether an outlying customer is suspicious. MI-Local-DIFFI is based on a global explanation method called DIFFI; while we were conducting our research, another local version, Local-DIFFI, was also introduced. We use a synthetic data set to compare the performance of four explanation methods, including our MI-Local-DIFFI, Local-DIFFI and the state-of-the-art TreeSHAP method. MI-Local-DIFFI shows excellent results in terms of both performance and runtime. Furthermore, we apply the explainable outlier detection methodology, the combination of Isolation Forest and MI-Local-DIFFI, to a data set from Triodos Bank. This produced interesting findings, such as revealing data quality issues in the current system and references to EDRs and SARs. After further inspection, no SARs were filed, but some customers were moved to higher risk classes. This procedure will be performed on a monthly basis with the goal of continuing to improve the AML processes of the bank.
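To make the idea of path-based local feature importance concrete, the following is a minimal sketch in plain Python. It is not MI-Local-DIFFI, Local-DIFFI or DIFFI: the weighting rule (crediting each split with the fraction of rows it separates from the point) is a simplified stand-in chosen for illustration, the trees are grown on the full toy data set without subsampling, and all names (`build_tree`, `path_credits`, `explain`) are hypothetical.

```python
import random

def build_tree(rows, rng):
    """Grow one isolation tree: random feature, random split value,
    recursing until a node holds a single row or cannot be split."""
    if len(rows) <= 1:
        return {"size": len(rows)}
    n_features = len(rows[0])
    # Only features that still vary inside this node can be split on.
    candidates = [f for f in range(n_features)
                  if min(r[f] for r in rows) < max(r[f] for r in rows)]
    if not candidates:
        return {"size": len(rows)}
    f = rng.choice(candidates)
    lo = min(r[f] for r in rows)
    hi = max(r[f] for r in rows)
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[f] < split]
    right = [r for r in rows if r[f] >= split]
    if not left or not right:
        # Degenerate draw (split landed on the boundary): try again.
        return build_tree(rows, rng)
    return {"size": len(rows), "feature": f, "split": split,
            "left": build_tree(left, rng), "right": build_tree(right, rng)}

def path_credits(tree, point, n_features):
    """Follow the point down one tree and credit each split feature.
    Illustrative rule (an assumption, not the thesis's weights): a split
    earns the fraction of the node's rows it separated from the point."""
    credits = [0.0] * n_features
    node = tree
    while "feature" in node:
        f = node["feature"]
        child = node["left"] if point[f] < node["split"] else node["right"]
        credits[f] += 1.0 - child["size"] / node["size"]
        node = child
    return credits

def explain(point, rows, n_trees=50, seed=0):
    """Sum the per-tree credits and normalise them to sum to one."""
    rng = random.Random(seed)
    n_features = len(point)
    totals = [0.0] * n_features
    for _ in range(n_trees):
        tree = build_tree(rows, rng)
        for f, c in enumerate(path_credits(tree, point, n_features)):
            totals[f] += c
    total = sum(totals)
    return [t / total for t in totals]

# Toy data: 64 inliers around the origin, one point that is extreme
# only in feature 1.
data_rng = random.Random(42)
inliers = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(64)]
outlier = (0.0, 20.0)
importances = explain(outlier, inliers + [outlier])
print(importances)
```

On this toy data the second feature should receive most of the importance, since splits on it tend to isolate the outlier in one step while splits on the first feature remove only part of the remaining rows. In an AML setting the same kind of per-customer score would indicate which features drove a customer's outlier status.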

Files

Thesis_final_Kristin.pdf

(pdf | 6.42 Mb)

License info not available