Multimodal machine learning models can exploit complementary information from multiple data modalities. MultiFIX (Multimodal Feature engIneering eXplainable artificial intelligence) is a framework designed to construct partially interpretable multimodal models, providing explanat
...
Multimodal machine learning models can exploit complementary information from multiple data modalities. MultiFIX (Multimodal Feature engIneering eXplainable artificial intelligence) is a framework designed to construct partially interpretable multimodal models, providing explanations for both modality-specific features and each modality its contribution to the final prediction. However, it was shown to not scale effectively for tasks with extreme joint-modality dependence.
This thesis proposes an alternative training strategy that integrates knowledge of the features to be engineered, expressed as feature targets that guide the learning process. The strategy improves upon baseline performance, even when the feature targets are non-ideal. Since ground-truth feature targets are typically unavailable in real-world settings, the feature targets are optimised using the Gene-pool Optimal Mixing Evolutionary Algorithm. The optimised feature targets, though only loosely aligned with the ground-truth features, enables the alternative training method to surpass baseline MultiFIX performance on a three-gated XOR task.
The same approach was evaluated on simpler tasks, such as the single XOR and AND problems, where it achieved slightly lower but still comparable performance to the already strong baselines. Results indicate that this computationally intensive approach is most beneficial for problems characterised by high joint-modality dependence and complex feature interactions. Interestingly, closer alignment between the optimised and ground-truth feature targets did not consistently lead to higher MultiFIX performance. Consequently, future improvements are likely to stem from refining how feature targets are integrated into the training process, rather than from further optimisation of the targets themselves.