P. Altmeyer | TU Delft Repository

Natural Language Counterfactual Explanations in Financial Text Classification

Master thesis (2024) - K.T. Dobiczek, C.C.S. Liem, P. Altmeyer, J. Yang

Central banks communicate their monetary policy plans to the public through meeting minutes or transcripts. These communications can have immense effects on markets and are often the subjects of studies in the financial literature. The recent advancements in Natural Language Processing have prompted researchers to analyze these communications using Transformer-based Large Language Model (LLM) classifiers. The use of LLMs in finance and other high-stakes domains calls for a high level of trustworthiness and explainability of those models. We focus on Counterfactual Explanations, a form of Explainable AI that explains a model's classification by proposing an alternative to the original input. We use three types of CE generators for LLM classifiers on a recent dataset consisting of sentences taken from FOMC communications to assess the usability of their explanations. We perform three experiments comparing different types of generators, one using a selection of quantitative metrics and two involving human evaluators, including central bank employees. Our findings suggest that non-expert and expert evaluators prefer counterfactual methods that apply minimal changes to the texts; however, the methods we analyze might not handle the domain-specific vocabulary well enough to generate plausible explanations for our task. We discuss shortcomings in the choice of evaluation metrics in the literature on text CE generators and propose refined definitions of the fluency and plausibility qualitative metrics. ...

Advancing Explainability in Black-Box Models

Bachelor thesis (2024) - İpek İşcan, P. Altmeyer, C.C.S. Liem, B.J.W. Dudzik

In recent years, the need for explainable artificial intelligence (XAI) has become increasingly important as complex black-box models are used in critical applications. While many methods have been developed to interpret these models, there is also potential in enhancing the models themselves to improve their inherent explainability. This paper investigates various techniques aimed at improving the explainability of black-box models. Through a systematic literature review, these techniques are categorized, and their impact on predictive uncertainty, adversarial robustness, and generative capacity is analyzed to understand how these factors contribute to the overall explainability. The snowballing methodology is used for the systematic literature review, starting with an initial set of papers retrieved from four databases: IEEE Xplore, Scopus, arXiv, and the ACM Digital Library. This process continued with backward and forward snowballing through four iterations, resulting in a total of 50 papers reviewed. Only papers focused on improving model explainability are included in the review. Due to time limitations, additional search constraints are applied for feasibility. The initial set of papers is filtered to those published since 2013. These constraints and their possible impacts are considered when interpreting the results. Findings reveal that techniques such as Bayesian approaches and variational inference, adversarial robustness, model compression and distillation, uncertainty and ensembles, regularization, self-explaining models, and hybrid techniques are used for advancing model explainability. The paper concludes with a discussion on the implications of these techniques for future research. ...

Metrics to Ascertain the Plausibility and Faithfulness of Counterfactual Explanations

Bachelor thesis (2024) - A.F. Yücel, P. Altmeyer, C.C.S. Liem, B.J.W. Dudzik

Counterfactual Explanations (CE) are essential for understanding the predictions of black-box models by suggesting minimal changes to input features that would alter the output. Despite their importance in Explainable AI (XAI), there is a lack of standardized metrics to assess the plausibility and faithfulness of these explanations. This paper reviews evaluation procedures in the literature and proposes novel formal metrics for evaluating the plausibility and faithfulness of counterfactual explanations, addressing the existing limitations. Plausibility is defined as the coherence of explanations with the true data-generating process, while faithfulness refers to the accuracy of explanations in representing the model's reasoning. We discuss the shortcomings of existing evaluation procedures and metrics for measuring plausibility and faithfulness and consequently compare our proposed metrics with existing ones, highlighting their advantages and disadvantages. The proposed metrics are then empirically validated through experiments across multiple models and datasets, demonstrating their model-agnostic nature and reliability. Our findings indicate that the proposed metrics provide a correct and reliable means to quantify the plausibility and faithfulness of counterfactual explanations, thereby allowing one to gauge their feasibility and trustworthiness consistently. ...

How Does Predictive Uncertainty Quantification Correlate with the Plausibility of Counterfactual Explanations

Bachelor thesis (2024) - D. Nikolov, P. Altmeyer, C.C.S. Liem, B.J.W. Dudzik

Counterfactual explanations can be applied to algorithmic recourse, which is concerned with helping individuals in the real world overturn undesirable algorithmic decisions. They aim to provide explanations for opaque machine learning models. Not all generated points are equally faithful to the model, nor equally plausible. On the other hand, predictive uncertainty quantification is used to measure the degree of certainty a model has in its predictions. Previously, it has been shown that it is possible to generate more plausible counterfactual explanations utilising predictive uncertainty. This work investigates this further by using multiple models innately supporting uncertainty quantification and comparing the produced counterfactual explanations to those produced by the models' ordinary counterparts. Predictive uncertainty tends to enhance the plausibility of the counterfactuals on visual datasets. Furthermore, we are positive that predictive uncertainty correlates proportionally with plausibility. This correlation has important implications for both research and real-world applications, as it suggests that integrating uncertainty quantification in model development can improve the quality and trustworthiness of algorithmic explanations. ...

Are Neural Networks Robust to Gradient-Based Adversaries Also More Explainable? Evidence from Counterfactuals

Bachelor thesis (2024) - R. Appachi Senthilkumar, P. Altmeyer, C.C.S. Liem, B.J.W. Dudzik

Adversarial Training has emerged as the most reliable technique to make neural networks robust to gradient-based adversarial perturbations on input data. Besides improving model robustness, preliminary evidence presents an interesting consequence of adversarial training -- increased explainability of model behaviour. Prior work has explored the effects of adversarial training on gradient stability and interpretability, as well as visual explainability of counterfactuals. Our work presents the first quantitative, empirical analysis of the impact of model robustness on model explainability by comparing the plausibility of faithful counterfactuals for both robust and standard networks. We seek to determine whether robust networks learn more plausible decision boundaries and representations of the data than regular models, and whether the strength of the adversary used to train robust models affects their explainability. Our findings indicate that robust networks for image data learn more explainable decision boundaries and representations of data than regular models, with more robust models producing more plausible counterfactuals. Robust models for tabular data, however, only conclusively exhibit this phenomenon along decision boundaries and not for its overall data representations, possibly due to its high robustness-accuracy trade-off and the difficulties associated with traditional adversarial training due to its innate properties. We believe our work can help guide future research towards improving the robustness of machine learning models while keeping their explainability in mind. ...

Do Joint Energy-Based Models Produce More Plausible Counterfactual Explanations?

Bachelor thesis (2024) - G. Pezzali, P. Altmeyer, C.C.S. Liem, B.J.W. Dudzik

Counterfactual explanations (CEs) can be used to gain useful insights into the behaviour of opaque classification models, allowing users to make an informed decision when trusting such systems. Assuming the CEs of a model are faithful (i.e. they represent the inner workings of the model well), an explainable model generates plausible CEs (i.e. CEs fitting the real-world distribution of the data). This raises the question of whether classifiers explicitly designed to model the distribution of the data, such as energy-based models, are inherently more explainable. This work focuses on the evaluation of joint energy-based models (JEMs) in combination with the Energy-Constrained Conformal Counterfactuals (ECCCo) generator, with the goal of identifying whether the generative capability of a model influences its explainability. Since ECCCo has been designed specifically to generate more faithful CEs, it makes it possible to use the CEs' plausibility as a proxy for the model's explainability. Two experiments have been performed to evaluate the effect of variations of generative capability within the same JEM architecture and the difference between JEMs and classically trained classifiers. Despite the experiments not having established a clear correlation between generative capability and explainability of a model, various research avenues are still open to explore in future works ...

A Study on Counterfactual Explanations

Investigating the impact of inter-class distance and data imbalance

Master thesis (2024) - I. Zagorac, C.C.S. Liem, P. Altmeyer, D.M.J. Tax

Counterfactual explanations (CEs) are emerging as a crucial tool in Explainable AI (XAI) for understanding model decisions. This research investigates the impact of various factors on the quality of CEs generated for classification tasks. We explore how inter-class distance, data imbalance, balancing techniques, the presence of biased classifiers, and decision thresholds influence CE quality. To answer these research questions, we conduct experiments on various datasets, classification models and counterfactual generators. The datasets include the MNIST and GMSC datasets. The models include well-established models like MLP and Random Forest, along with the novel NeuroTree model. The generators include the method proposed by Wachter et al. and the REVISE method. We evaluate how different factors affect CE quality by performing an extensive experimental analysis. Our findings demonstrate that increasing inter-class distance degrades CE quality, particularly explanation plausibility. Data imbalance showed minimal impact, while balancing techniques yielded a slight improvement in CE plausibility, especially for the minority class. Classifiers biased towards specific subgroups resulted in lower CE quality for those subgroups. We observed limited evidence for a consistent amplification effect of decision thresholds. Our findings contribute to XAI by providing insights into factors affecting CE quality and highlighting areas for future development, particularly regarding fairness and handling imbalanced data. ...
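
For readers unfamiliar with the generator attributed to Wachter et al. in the abstract above, the following is a minimal sketch of the general idea: search for a nearby input whose prediction moves to the desired class by descending a loss that trades off the prediction error against the distance to the original input. It is not the thesis's implementation. The logistic model, the function names, the squared Euclidean distance, the fixed penalty weight `lam`, and the rule of stopping as soon as the decision threshold is crossed are all assumptions made for illustration.

```python
import math


def predict(x, w, b):
    """Probability of the positive class under a toy logistic model (assumed)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def wachter_style_counterfactual(x, w, b, target=1.0, lam=0.1, lr=0.5, max_iter=1000):
    """Gradient descent on (f(x') - target)^2 + lam * ||x' - x||^2.

    Hypothetical simplification: the penalty weight is fixed rather than
    tuned, and the search stops once x' is classified as the target class.
    """
    x_cf = list(x)
    for _ in range(max_iter):
        p = predict(x_cf, w, b)
        # Stop as soon as the predicted class matches the target class.
        if (p >= 0.5) == (target >= 0.5):
            break
        # Gradient of the squared prediction error plus the distance penalty.
        grad = [
            2.0 * (p - target) * p * (1.0 - p) * wi + 2.0 * lam * (xc - xo)
            for wi, xc, xo in zip(w, x_cf, x)
        ]
        x_cf = [xc - lr * g for xc, g in zip(x_cf, grad)]
    return x_cf


if __name__ == "__main__":
    w, b = [1.0, 1.0], -3.0
    factual = [1.0, 1.0]
    counterfactual = wachter_style_counterfactual(factual, w, b)
    print(predict(factual, w, b), predict(counterfactual, w, b), counterfactual)
```

Because the search stops at the first point that crosses the decision boundary, the result stays close to the factual input, which is the "minimal change" property the abstracts on this page refer to.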

Finding Recourse for Algorithmic Recourse

Actionable Recommendations in Real-World Contexts

Master thesis (2024) - A.J. Buszydlik, C.C.S. Liem, R.I.J. Dobbe, P. Altmeyer, P.K. Murukannaiah

The aim of algorithmic recourse (AR) is generally understood to be the provision of "actionable" recommendations to individuals affected by algorithmic decision-making systems in an attempt to present them with the capacity to take actions that would guarantee more desirable outcomes in the future. Over the past few years, the literature has predominantly focused on the development of solutions to generate "actionable" counterfactual explanations that further satisfy various desiderata, such as diversity or robustness. We believe that algorithmic recourse, by its nature, should be seen as a practical challenge: real-world decision-making systems are complex dynamic entities involving various actors – end users, domain experts, system owners, etc. – engaging in social and technical processes. Thus, research on algorithmic recourse should account for the characteristics of systems where such mechanisms could be implemented. This necessitates a rich understanding of the problem space of AR but, as we observe, it remains largely uncharted in the existing literature.

We focus on algorithmic recourse in real-world contexts, applying Design Science Research methods to bridge the gap between its technical affordances and the social constraints of real-world decision-making systems where it could be applied. First, we conduct a systematized literature review of 127 publications to learn about the authors' perception of the problem. Next, we consider a case study of a risk profiling model developed to support the authorities of a major Dutch city in the detection of welfare fraud. We employ a desk research approach to learn about the system, reinforce our understanding of the requirements for algorithms in public administration settings through interviews with experts, and make use of accident analysis methodologies to theorize about the value of AR interventions in this setting. We draw on these insights to propose a conceptual framework for the evaluation of AR in real-world contexts and provide its proof-of-concept instantiation as a simulation tool that facilitates the study of such mechanisms within decision-making processes. Finally, we design and prove an algorithm to generate actionable recommendations in expert systems. These are commonly used in public administration systems but overlooked in existing research.

On the example of our endeavor, we learn about the ways to strengthen the connections between the problem space and the solution space of algorithmic recourse. We argue that AR can be discussed on three levels of complexity: (1) as actionable recommendations, (2) as the process of improving outcomes, or (3) as the task of developing mechanisms to support end-users in this process. We advocate for computer science authors to focus on the final, broadest meaning of the challenge to improve the applicability of their solutions in real-world contexts. We also encourage researchers from other fields to contribute their perspectives and for practitioners to support further research by building upon our approach to reason about the place for AR solutions in their domains of expertise. ...

A counterfactual-based evaluation framework for machine learning models that use gene expression data

Master thesis (2024) - M.E. Radder, C.C.S. Liem, P. Altmeyer, Thomas Abeel

The evaluation metrics commonly used for machine learning models often fail to adequately reveal the inner workings of the models, which is particularly necessary in critical fields like healthcare. Explainable AI techniques, such as counterfactual explanations, offer a way to uncover a model’s internal process. However, in the literature these explanations are often used for recourse actions rather than for testing a model’s internal mechanism. In this paper, we propose a proof of concept for a framework which uses counterfactual explanations to evaluate the inner workings of biological machine learning models that use gene expression data. Our approach involves comparing the change of gene expression observed in the original data to the change of gene expression observed between the factual and counterfactual data. The change of gene expression is quantified using the log fold change. Additionally, we expand the definition of faithfulness and introduce a new metric that measures how faithfully the generated counterfactual explanations represent the model. This metric should ensure that the explanations accurately reflect the model’s true internal process. ...
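
As a rough illustration of the comparison described above, the sketch below computes a per-gene log2 fold change for two pairs of expression vectors and checks how often the direction of change agrees. The pseudocount, the function names, and the sign-agreement summary are assumptions for illustration only; the abstract does not specify how the two sets of fold changes are compared, and this is not the thesis's metric.

```python
import math


def log2_fold_change(reference, comparison, pseudocount=1.0):
    """Per-gene log2 fold change of `comparison` relative to `reference`.

    The pseudocount (an assumption here) avoids division by zero for
    genes with zero expression.
    """
    if len(reference) != len(comparison):
        raise ValueError("expression vectors must have the same length")
    return [
        math.log2((c + pseudocount) / (r + pseudocount))
        for r, c in zip(reference, comparison)
    ]


def sign_agreement(observed_lfc, counterfactual_lfc):
    """Fraction of genes whose change has the same direction in both lists.

    Hypothetical summary statistic, used only to make the comparison concrete.
    """
    same = sum(1 for o, c in zip(observed_lfc, counterfactual_lfc) if o * c > 0)
    return same / len(observed_lfc)


if __name__ == "__main__":
    # Toy numbers: mean expression per gene in two classes of the original
    # data, and one factual sample with its counterfactual.
    observed = log2_fold_change([1.0, 3.0, 7.0], [3.0, 1.0, 15.0])
    explained = log2_fold_change([1.0, 3.0, 7.0], [2.0, 4.0, 9.0])
    print(observed, explained, sign_agreement(observed, explained))
```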

Quantifying the Endogenous Domain and Model Shifts Induced by the CLUE Recourse Generator

Bachelor thesis (2022) - K.T. Dobiczek, C.C.S. Liem, P. Altmeyer, M.A. Migut

Employing counterfactual explanations in a recourse process gives a positive outcome to an individual, but it also shifts their corresponding data point. For systems where models are updated frequently, a change might be seen when recourse is applied, and after multiple rounds, severe shifts in both model and domain may occur. Algorithmic recourse frameworks such as CARLA compare the counterfactual generators based on the effectiveness and cost of employing recourse, but little to no previous work has been done on analyzing the shifts in dynamics of the systems. In this paper, we propose a set of metrics aimed at measuring shifts in the domains and models employed in those systems, as well as an experiment framework built on top of CARLA. These metrics allow us to analyze experimentally the characteristics of shifts in dynamics induced by the CLUE and Wachter generators. ...

The endogenous dynamics induced by Algorithmic Recourse

Bachelor thesis (2022) - G.J.A. Angela, C.C.S. Liem, P. Altmeyer, M.A. Migut

Machine learning classifiers have become a household tool for banks, companies, and government institutes for automated decision-making. In order to help explain why a person was classified a certain way, counterfactual explanations were proposed as a solution. Several generators of such explanations have been introduced and tested, but they come with several side effects. One of these side effects is making it easier to be classified incorrectly after sufficient recourse has been applied. Dynamics, a.k.a. shifts in both the domain and model, cause these side effects. We aimed to quantify these dynamics induced by two generators, Wachter et al. and REVISE, and compare them against each other. We performed three experiments with both generators and looked at the effect a different dataset, model, or hyper-parameter may have had on the dynamics. We found that REVISE induces a slight model shift while the domain shifts increase with each round of recourse. ...

Quantifying the Endogenous Domain and Model Shifts Induced by the DiCE Generator

Bachelor thesis (2022) - A.J. Buszydlik, C.C.S. Liem, P. Altmeyer, M.A. Migut

Algorithmic recourse aims to provide individuals affected by a negative classification outcome with actions which, if applied, would flip this outcome. Various approaches to the generation of recourse have been proposed in the literature; these are typically assessed on statistical measures such as the validity of generated explanations or their proximity to the training data. However, little to no attention has been paid to the underlying dynamics of recourse. If a group of individuals applies the suggested actions, they may over time induce a shift in the domain or model. We propose a framework for the measurement of such intrinsic shifts, and conduct an analysis of the dynamics of recourse implemented by the generators proposed by Mothilal et al. and Wachter et al. Our results suggest that the application of recourse is likely to introduce statistically significant shifts in the system, and that the underlying dataset and model impact the behavior of the generators. ...
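
The notion of endogenous shifts described above can be made concrete with a toy simulation. The sketch below is not the framework from any of these theses: the one-dimensional data, the midpoint-threshold "model", the rule that individuals who implement recourse receive the positive label, and all function names are assumptions chosen to keep the example small. It tracks the decision threshold (a stand-in for model shift) and the mean of the positive class (a stand-in for domain shift) over several rounds.

```python
from statistics import mean


def fit_threshold(xs, ys):
    """Toy 'model' (assumed): the midpoint between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (mean(pos) + mean(neg)) / 2


def recourse_round(xs, ys, threshold, k=1, margin=0.5):
    """Move the k negatives closest to the threshold just across it.

    Assumption: individuals who implement recourse are relabelled positive.
    """
    xs, ys = list(xs), list(ys)
    negatives = sorted(
        (i for i, y in enumerate(ys) if y == 0),
        key=lambda i: abs(threshold - xs[i]),
    )
    for i in negatives[:k]:
        xs[i] = threshold + margin
        ys[i] = 1
    return xs, ys


def simulate(xs, ys, rounds=3, k=1, margin=0.5):
    """Return (threshold, positive-class mean) before and after each round."""
    history = []
    threshold = fit_threshold(xs, ys)
    history.append((threshold, mean(x for x, y in zip(xs, ys) if y == 1)))
    for _ in range(rounds):
        xs, ys = recourse_round(xs, ys, threshold, k, margin)
        threshold = fit_threshold(xs, ys)  # retrain on the shifted data
        history.append((threshold, mean(x for x, y in zip(xs, ys) if y == 1)))
    return history


if __name__ == "__main__":
    features = [0.0, 1.0, 2.0, 3.0, 7.0, 8.0, 9.0, 10.0]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    for threshold, positive_mean in simulate(features, labels):
        print(round(threshold, 3), round(positive_mean, 3))
```

In this toy setting both quantities drift downward with every round, because each retraining step sees a positive class that now includes points placed just past the previous boundary.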