Do Joint Energy-Based Models Produce More Plausible Counterfactual Explanations?
G. Pezzali (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Abstract
Counterfactual explanations (CEs) can be used to gain useful insights into the behaviour of opaque classification models, allowing users to make an informed decision about whether to trust such systems. Assuming the CEs of a model are faithful (i.e. they accurately represent the inner workings of the model), an explainable model generates plausible CEs (i.e. CEs that fit the real-world distribution of the data). This raises the question of whether classifiers explicitly designed to model the distribution of the data, such as energy-based models, are inherently more explainable. This work evaluates joint energy-based models (JEMs) in combination with the Energy-Constrained Conformal Counterfactuals (ECCCo) generator, with the goal of determining whether the generative capability of a model influences its explainability. Since ECCCo was designed specifically to generate more faithful CEs, the plausibility of its CEs can be used as a proxy for the explainability of the model. Two experiments were performed: one evaluating the effect of varying generative capability within the same JEM architecture, and one comparing JEMs with classically trained classifiers. Although the experiments did not establish a clear correlation between the generative capability and the explainability of a model, several research avenues remain open for future work.
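
The abstract does not spell out how a JEM models the distribution of the data. As a rough illustration only, the sketch below assumes the commonly used JEM formulation, in which the logits of an ordinary classifier are reinterpreted as negative energies, so that the energy of an input is the negative log-sum-exp of its logits. The function names and the example logits are hypothetical and are not taken from the thesis or from the ECCCo implementation.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def jem_energy(logits):
    """Energy of an input under the assumed JEM formulation.

    E(x) = -logsumexp_y f(x)[y], so a lower energy corresponds to a
    higher unnormalised density p(x).
    """
    return -logsumexp(logits)


def class_probabilities(logits):
    """Usual softmax p(y | x); it is unchanged by the energy reinterpretation."""
    lse = logsumexp(logits)
    return [math.exp(v - lse) for v in logits]


# Hypothetical logits for two inputs (illustrative values only).
in_distribution_logits = [4.0, 0.5, -1.0]
off_distribution_logits = [-2.0, -2.5, -3.0]

energy_in = jem_energy(in_distribution_logits)
energy_off = jem_energy(off_distribution_logits)
probs_in = class_probabilities(in_distribution_logits)
```

Under this assumption, the same network yields both class probabilities and an unnormalised density, which is what makes it possible to ask whether a counterfactual lies in a region the model itself considers likely.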