Relevance Detection of Unknown Classes through Cluster Distances

Based on Statistical Distance Measures in Feature Space

Abstract

In the open world, machine learning (ML) models can encounter a multitude of unknown or novel classes. In a surveillance, safety, or security use case, unknown samples can pose potential threats that are hard to detect because the model has never been trained on them. At the same time, most of the unknowns encountered by a surveillance ML model will be harmless. As a result, flagged harmless unknowns generate too many unwanted alerts and manual analyses.

In this thesis, we develop what is, to the best of our knowledge, the first method that can automatically assess the relevance of unknown classes, by modelling their image features as clusters (or distributions) and comparing them using statistical distance measures. Our use case lies in computer vision for military applications, where relevance is defined based on user input. We define road vehicles as the relevant classes and use them as our training set. Our aim is to build a model that classifies new, unseen road vehicles as ‘relevant unknowns’, while classifying harmless unknown birds, which are not part of the training set, as ‘irrelevant unknowns’. On the DomainNet dataset, we demonstrate that our novel method can very accurately determine the relevance of unknown classes at test time for both low- and high-dimensional data, with AUC scores ranging from 0.99 to a perfect 1.00.
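
To make the idea of comparing clusters with a statistical distance concrete, the following is a minimal sketch and not the thesis implementation. The abstract does not name a specific distance measure or feature extractor, so several choices here are assumptions made purely for illustration: each cluster is summarised as a Gaussian, the Bhattacharyya distance is used as the statistical distance, and synthetic 2-D points stand in for image features. The function names (fit_gaussian, bhattacharyya_distance, relevance_score) are hypothetical.

```python
import numpy as np


def fit_gaussian(features, eps=1e-6):
    """Summarise a cluster of feature vectors (n_samples x n_dims) as a Gaussian."""
    mean = features.mean(axis=0)
    # A small ridge keeps the covariance invertible (illustrative assumption).
    cov = np.cov(features, rowvar=False) + eps * np.eye(features.shape[1])
    return mean, cov


def bhattacharyya_distance(g1, g2):
    """Bhattacharyya distance between two Gaussians given as (mean, cov) pairs."""
    mu1, cov1 = g1
    mu2, cov2 = g2
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    mean_term = 0.125 * diff @ np.linalg.solve(cov, diff)
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    cov_term = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return mean_term + cov_term


def relevance_score(unknown_features, relevant_clusters):
    """Higher score means the unknown cluster lies closer to a relevant known cluster."""
    unknown = fit_gaussian(unknown_features)
    return -min(bhattacharyya_distance(unknown, known) for known in relevant_clusters)


# Synthetic stand-ins for image features (assumption: real features would
# come from a trained vision model, which is not shown here).
rng = np.random.default_rng(0)
car = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
truck = rng.normal(loc=1.0, scale=1.0, size=(200, 2))
relevant_clusters = [fit_gaussian(car), fit_gaussian(truck)]

bus = rng.normal(loc=0.5, scale=1.0, size=(200, 2))   # unseen vehicle-like class
bird = rng.normal(loc=8.0, scale=1.0, size=(200, 2))  # unseen, far from vehicles

bus_score = relevance_score(bus, relevant_clusters)
bird_score = relevance_score(bird, relevant_clusters)
print(bus_score > bird_score)
```

In this toy setting the vehicle-like cluster scores higher than the distant cluster, so a threshold on the score would separate ‘relevant unknowns’ from ‘irrelevant unknowns’. How the actual method forms clusters, which distance measures it uses, and how it sets thresholds is described in the thesis itself.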