This paper presents a unified evaluation framework for assessing multimodal storytelling robots used in dementia care. Dementia increasingly affects the quality of life of older adults, and co-creative storytelling with social robots has shown promise in supporting social engagement and emotional well-being. However, existing evaluations often overlook whether generated content fairly reflects the contributions of people with dementia (PwD). To address this, a framework is proposed that jointly evaluates the fidelity of textual, visual, and audio outputs to the original input and their emotional coherence. The method incorporates alignment metrics (AlignScore and BERTScore) for text, image relevance (VQAScore), and audio emotion analysis (valence-arousal), as well as speaker attribution to ensure equitable representation. Results from experimental sessions show that data biases can be quantitatively identified and correlated with user enjoyment indicators. These findings offer a scalable approach to evaluating storytelling robots, ensuring both therapeutic benefit and respect for user identity in sensitive care contexts.
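As a minimal illustration of the text-alignment component named above, the following sketch computes BERTScore between a participant's original utterance and the generated story text using the publicly available `bert_score` package. It is an assumption about how such a score could be obtained, not the authors' released code, and the example strings and variable names are hypothetical placeholders.

```python
# Minimal sketch: measuring how closely generated story text aligns with a
# participant's original input using BERTScore (bert_score package).
# The example sentences below are illustrative placeholders, not study data.
from bert_score import score

generated_story = ["The robot and I remembered the summer fair by the lake."]
participant_input = ["I told it about the summer fair we held by the lake."]

# BERTScore returns precision, recall, and F1 tensors, one value per candidate/reference pair.
P, R, F1 = score(generated_story, participant_input, lang="en", verbose=False)
print(f"BERTScore F1: {F1.mean().item():.3f}")
```

A higher F1 would indicate that the generated story stays semantically close to what the person with dementia actually contributed, which is the property the framework's text-alignment metrics are intended to capture.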