A Human-in-the-Loop Framework to Assess Multimodal Machine Learning Models