Medical applications often involve several data modalities, particularly medical images and clinical information, which can be combined to improve the accuracy of decision-making. Multimodal learning approaches can leverage all available data to make the resulting models more robust, and can consequently outperform unimodal approaches. Furthermore, AI frameworks must be human-verifiable and interpretable before they can be deployed in real-world settings, given legal and privacy considerations. Deep Learning (DL) methods achieve state-of-the-art performance in many tasks, but their opaque nature often limits interpretability. Genetic Programming (GP) can provide compact and interpretable symbolic expressions for tabular data but is less effective for image analysis. We introduce MultiFIX: a new interpretability-focused pipeline for multimodal learning that leverages the strengths of DL and GP to explicitly engineer features from different data types and combine them to make the final prediction. The MultiFIX pipeline comprises two stages: the training stage, where a DL (black-box) model is trained using different training procedures to extract relevant features from each modality; and the inference stage, where the resulting model is transformed to be interpretable. Image features are explained with attention maps produced by Grad-CAM, while inherently interpretable symbolic expressions evolved with GP fully replace both the tabular feature engineering block and the fusion of the extracted features used to predict the target label. To show the application potential of the pipeline, we demonstrate MultiFIX on a Melanoma Risk Assessment dataset. Results show that MultiFIX outperforms unimodal models while offering explanations that can be analysed straightforwardly and are consistent with expectations.
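The structure of the inference stage can be illustrated with a minimal sketch. It is not the MultiFIX implementation: the image block is a stub standing in for a trained DL feature extractor (which the pipeline would explain with Grad-CAM), and the two symbolic expressions, the column names `x1` and `x2`, the coefficients, and the 0.5 threshold are all hypothetical placeholders for what GP might evolve.

```python
import math
from typing import Dict, Sequence


def image_feature(pixels: Sequence[float]) -> float:
    """Stub for the DL image block (assumption: one scalar feature per image).

    A real pipeline would use a trained network here; mean intensity is
    used only so that the sketch runs without any dependencies.
    """
    return sum(pixels) / len(pixels)


def tabular_feature(row: Dict[str, float]) -> float:
    """Hypothetical GP-style symbolic expression over tabular columns."""
    return row["x1"] + 0.5 * row["x2"]


def fusion(f_img: float, f_tab: float) -> float:
    """Hypothetical symbolic fusion expression mapping both features to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(4.0 * f_img * f_tab - 1.0)))


def predict(pixels: Sequence[float], row: Dict[str, float],
            threshold: float = 0.5) -> Dict[str, float]:
    """Run the interpretable model and expose every intermediate value."""
    f_img = image_feature(pixels)
    f_tab = tabular_feature(row)
    score = fusion(f_img, f_tab)
    return {
        "image_feature": f_img,
        "tabular_feature": f_tab,
        "score": score,
        "label": int(score >= threshold),
    }


if __name__ == "__main__":
    print(predict([0.2, 0.4, 0.6], {"x1": 0.5, "x2": 1.0}))
```

Because each block returns a named intermediate value, a reader can trace how the image feature and the tabular feature contribute to the final score, which is the property the symbolic expressions are meant to provide.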