A Step towards Interpretable Multimodal AI Models with MultiFIX

Conference Paper (2025)
Author(s)

Mafalda Malafaia (Centrum Wiskunde & Informatica (CWI))

Thalea Schlender (TU Delft - Algorithmics, Leiden University Medical Center)

Tanja Alderliesten (TU Delft - Algorithmics, Leiden University Medical Center)

Peter Bosman (TU Delft - Algorithmics)

Research Group
Algorithmics
DOI
https://doi.org/10.1145/3712255.3734292
Publication Year
2025
Language
English
Pages (from-to)
2001-2009
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Real-world problems often depend on multiple data modalities, making multimodal fusion essential for leveraging diverse information sources. In high-stakes domains such as healthcare, understanding how each modality contributes to a prediction is critical for trustworthy and interpretable AI models. We present MultiFIX, an interpretability-driven multimodal data-fusion pipeline that explicitly engineers distinct features from different modalities and combines them to make the final prediction. Initially, only deep learning components are used to train a model from data. The black-box (deep learning) components are subsequently either explained using post-hoc methods such as Grad-CAM for images or fully replaced by interpretable blocks, namely symbolic expressions for tabular data, yielding an explainable model. We study MultiFIX using several training strategies for feature extraction and predictive modeling. Besides highlighting strengths and weaknesses of MultiFIX, experiments on a variety of synthetic datasets with varying degrees of interaction between modalities demonstrate that MultiFIX can generate multimodal models whose extracted features and their integration can be accurately explained without compromising predictive performance.
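The abstract mentions Grad-CAM as the post-hoc method for explaining image-based components. As a minimal sketch of the core Grad-CAM computation only (not of the MultiFIX pipeline or its code), the heatmap is obtained by global-average-pooling the gradients of the target score with respect to a convolutional layer's feature maps to get per-channel weights, forming the weighted sum of those maps, and applying a ReLU. The function name and the toy random tensors below are illustrative assumptions, standing in for activations and gradients captured from a real CNN:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap.

    activations: (K, H, W) feature maps A^k of a conv layer
    gradients:   (K, H, W) gradients of the target score w.r.t. A^k
    """
    # Per-channel weights alpha_k: global average pool of the gradients.
    weights = gradients.mean(axis=(1, 2))             # shape (K,)
    # Weighted combination of the feature maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=1)  # shape (H, W)
    # ReLU: keep only features with a positive influence on the score.
    return np.maximum(cam, 0.0)

# Toy tensors standing in for values hooked out of a trained CNN.
rng = np.random.default_rng(0)
acts = rng.standard_normal((8, 4, 4))
grads = rng.standard_normal((8, 4, 4))
heatmap = grad_cam(acts, grads)
print(heatmap.shape)         # (4, 4)
print((heatmap >= 0).all())  # True
```

In practice the heatmap would be upsampled to the input image's resolution and overlaid on it, which is how such explanations are typically visualized.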