A Human-In-the-Loop Framework to Assess Multimodal Machine Learning Models


Abstract

Recent work explains DNN models for image classification tasks using the "attribution, human-in-the-loop, extraction" workflow. However, little work has applied such an approach to explaining DNN models for language or multimodal tasks. To address this gap, we propose a framework that explains and assesses models that use both categorical/numerical features and text, while optimizing the "attribution, human-in-the-loop, extraction" workflow. In particular, our framework accounts for limited human resources, especially when domain experts are required for the human-in-the-loop tasks, and provides insight into which subsets of the data those tasks should be applied to. We share the results of applying this framework to a multimodal transformer that performs text classification for compliance detection in the financial context.
