Detecting outliers from pairwise proximities

Proximity isolation forests

Journal Article (2023)
Author(s)

Antonella Mensi (University of Verona)

DMJ Tax (TU Delft - Pattern Recognition and Bioinformatics)

Manuele Bicego (University of Verona)

Research Group
Pattern Recognition and Bioinformatics
Copyright
© 2023 Antonella Mensi, D.M.J. Tax, Manuele Bicego
DOI related publication
https://doi.org/10.1016/j.patcog.2023.109334
Publication Year
2023
Language
English
Volume number
138
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Because outliers are very different from the rest of the data, it is natural to represent them by their distances to other objects. Furthermore, there are many scenarios in which only pairwise distances are known, and feature-based outlier detection methods cannot be applied directly. Considering these observations, and given the success of Isolation Forests for (feature-based) outlier detection, we propose Proximity Isolation Forest, a proximity-based extension. The methodology requires only a set of pairwise distances, making it suitable for different types of data. Analogously to Isolation Forest, outliers are detected via their early isolation in the trees; to encode the isolation we design nine training strategies, both random and optimized. We thoroughly evaluate the proposed approach on fifteen datasets, assessing its robustness and suitability for the task; it also compares favourably with alternative proximity-based methods.
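
To make the idea of isolation from pairwise distances concrete, the following is a minimal illustrative sketch in Python, not the authors' implementation. It assumes one simple random splitting rule (a random prototype object plus a random distance threshold), which is not claimed to match any of the nine training strategies mentioned in the abstract. The function names `tree_depths` and `mean_isolation_depth` are hypothetical, and subsampling and score normalization as used in the original Isolation Forest are omitted for brevity.

```python
import numpy as np


def tree_depths(D, idx, rng, max_depth, depth=0, out=None):
    """Grow one random tree on the objects `idx`, using only distances in D.

    Returns an array holding the depth at which each object stops.
    The split rule (random prototype + random threshold) is an assumption
    made for illustration, not a strategy taken from the paper.
    """
    if out is None:
        out = np.zeros(D.shape[0])
    if len(idx) <= 1 or depth >= max_depth:
        out[idx] = depth
        return out
    p = rng.choice(idx)                  # random prototype among the node's objects
    d = D[idx, p]                        # distances of the node's objects to it
    lo, hi = d.min(), d.max()
    t = rng.uniform(lo, hi) if hi > lo else lo   # random distance threshold
    near = d <= t                        # objects within the threshold of the prototype
    if near.all():                       # no split possible (e.g. duplicate objects)
        out[idx] = depth
        return out
    tree_depths(D, idx[near], rng, max_depth, depth + 1, out)
    tree_depths(D, idx[~near], rng, max_depth, depth + 1, out)
    return out


def mean_isolation_depth(D, n_trees=100, seed=0):
    """Average stopping depth per object over `n_trees` random trees.

    A lower value means the object was isolated earlier, i.e. it looks
    more like an outlier.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    max_depth = int(np.ceil(np.log2(max(n, 2))))  # depth cap, as in Isolation Forest
    total = np.zeros(n)
    for _ in range(n_trees):
        total += tree_depths(D, np.arange(n), rng, max_depth)
    return total / n_trees
```

Given any symmetric distance matrix `D` (for example, edit distances between strings), `mean_isolation_depth(D)` returns one value per object; the objects with the smallest average depth are the strongest outlier candidates under this simplified rule.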

Files

1_s2.0_S0031320323000353_main.... (pdf)
(pdf | 1.12 Mb)
Embargo expired on 25-07-2023
License info not available