Global Interpretation of Image Classification Models via SEmantic Feature Analysis (SEFA)

Abstract

Deep learning models have achieved state-of-the-art performance on numerous image classification tasks in recent years, with several studies claiming to approach or even surpass human-level performance. However, these architectures are notoriously complex, which makes their interpretation a challenge. This limited interpretability, in turn, leads to several issues, among them restricted applicability to critical domains such as healthcare and finance.

Several methods in the literature attempt to address this issue by providing either local explanations, which describe individual predictions, or global ones, which explain the model's behaviour for a specific class. Focusing on global methods, we notice that they are limited with respect to the interpretability queries they can answer. For instance, suppose we want to query whether the simultaneous presence of two objects is associated with the prediction of a specific class. To the best of our knowledge, no existing method can handle such a query type due to limited expressivity. In this thesis, we address this limitation by answering the following research question: to what extent can image classification models be interpreted by analysing semantic features extracted from groups of salient image pixels?
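
As a minimal illustration, such a query could be phrased over a hypothetical per-image table in which each row records the model's predicted class and boolean indicators for the semantic features found salient in that image; the column names and data below are placeholders rather than actual thesis output.

    import pandas as pd

    # Hypothetical per-image table: one row per image, with the model's
    # predicted class and boolean columns marking which annotated objects
    # were salient for that prediction. Names and data are placeholders.
    df = pd.DataFrame({
        "predicted_class": ["boat", "boat", "car", "boat", "car"],
        "water_salient":   [True,  True,  False, True,  False],
        "sky_salient":     [True,  False, True,  True,  True],
    })

    # Query: is the simultaneous presence of salient "water" and "sky"
    # regions associated with predicting the class "boat"?
    both = df["water_salient"] & df["sky_salient"]
    rate_both = (df.loc[both, "predicted_class"] == "boat").mean()
    rate_rest = (df.loc[~both, "predicted_class"] == "boat").mean()
    print(f"P(boat | water & sky) = {rate_both:.2f}, otherwise = {rate_rest:.2f}")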

We begin our study by surveying existing research to derive the ideal characteristics that an interpretability method should satisfy. Our analysis highlights the aforementioned gap in the query complexity that existing methods cover. To address this limitation, we propose a new global interpretability method called SEmantic Feature Analysis (SEFA). SEFA combines explanations of individual image predictions with semantic descriptions of those images provided by human annotators, thereby extracting the aforementioned semantic features. We argue that analysing a structured data representation built from these semantic features allows us to answer a wider range of interpretability queries than existing methods. The proposed method poses several challenges, such as identifying the number of image annotations required to obtain reliable results at a reasonable annotation cost.
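
As a minimal sketch of how one such semantic feature could be derived, the snippet below assumes that a pixel-level saliency map from a local explanation method and a human-annotated object mask are available; the thresholds and names are illustrative choices, not settings prescribed by the thesis.

    import numpy as np

    def is_salient_semantic_feature(saliency: np.ndarray,
                                    annotation_mask: np.ndarray,
                                    overlap_threshold: float = 0.5) -> bool:
        """Mark an annotated object as a salient semantic feature when a
        sufficient fraction of the most salient pixels falls inside its mask.

        saliency:        2-D array of per-pixel attribution scores for one image.
        annotation_mask: 2-D boolean array, True where the annotator marked
                         the object (e.g. "water").
        """
        # Keep the top 10% most strongly attributed pixels (arbitrary cut-off).
        salient_pixels = saliency >= np.quantile(saliency, 0.9)
        if not salient_pixels.any():
            return False
        overlap = (salient_pixels & annotation_mask).sum() / salient_pixels.sum()
        return bool(overlap >= overlap_threshold)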

Our results show that SEFA gives its users the flexibility to answer several types of interpretability queries, including those we found existing methods unable to handle. Further experimentation on its hyperparameters across three separate image classification tasks yields a set of suggested settings for similar datasets. Finally, we demonstrate SEFA's ability to surface semantic features relevant to the model's classification behaviour by fine-tuning existing model architectures on deliberately biased datasets and evaluating whether the salient semantic features it outputs describe the introduced bias.
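
One possible form of this check, sketched below over the hypothetical per-image table introduced earlier, ranks semantic features by how strongly their presence is associated with the biased class and verifies that the feature used to inject the bias ranks near the top; the metric and names are illustrative assumptions.

    import pandas as pd

    def rank_semantic_features(df: pd.DataFrame, target_class: str) -> pd.Series:
        """Rank boolean semantic-feature columns by how often the model
        predicts `target_class` when that feature is salient (illustrative
        metric, not the one used in the thesis)."""
        feature_cols = [c for c in df.columns if c != "predicted_class"]
        assoc = {
            col: (df.loc[df[col], "predicted_class"] == target_class).mean()
            for col in feature_cols
            if df[col].any()
        }
        return pd.Series(assoc).sort_values(ascending=False)

    # After fine-tuning on a dataset biased towards, e.g., watermarked "boat"
    # images, a hypothetical "watermark_salient" column would be expected to
    # rank near the top for the "boat" class.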