Pose estimation models predict multiple interdependent body keypoints, making them a prototypical example of multi-target tasks in machine learning. While existing explainable AI (XAI) techniques have advanced our ability to interpret model outputs in single-target domains, their application to structured outputs remains underdeveloped. This work investigates how XAI methods can be adapted to explain pose estimation models, particularly in the context of cricket shot analysis. Guided by three research questions, we identify key challenges such as capturing inter-keypoint dependencies and providing interpretable explanations for structured outputs. We analyze both the geometric and the heatmap-level behavior of a pretrained pose estimation model to distinguish between two cricket shots: the pull and the cover drive. Using techniques such as cosine similarity on keypoint heatmaps and polynomial trajectory modeling, we reveal how the model internally differentiates between similar motion patterns. Our framework introduces novel techniques for inter-keypoint explanation, contributes domain-specific insights into model behavior, and demonstrates the feasibility of interpretable structured prediction in high-dimensional, real-world tasks.
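To make the two analysis techniques named above concrete, the following is a minimal, self-contained sketch of cosine similarity between keypoint heatmaps and polynomial trajectory fitting. It is an illustration under stated assumptions, not the paper's implementation: the COCO-style heatmap shape (17 keypoints on a 64x48 grid), the function names, and the degree-2 polynomial default are all hypothetical choices.

```python
import numpy as np

# Assumption: a pose model emits one heatmap per keypoint,
# e.g. 17 keypoints on a 64x48 grid (COCO-style), per video frame.
NUM_KEYPOINTS, H, W = 17, 64, 48


def heatmap_cosine_similarity(hm_a: np.ndarray, hm_b: np.ndarray) -> np.ndarray:
    """Per-keypoint cosine similarity between two sets of heatmaps.

    hm_a, hm_b: arrays of shape (num_keypoints, H, W).
    Returns an array of shape (num_keypoints,), one similarity per keypoint.
    """
    a = hm_a.reshape(hm_a.shape[0], -1)  # flatten each heatmap to a vector
    b = hm_b.reshape(hm_b.shape[0], -1)
    num = (a * b).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-12
    return num / den


def fit_keypoint_trajectory(xs: np.ndarray, ys: np.ndarray, degree: int = 2):
    """Least-squares polynomial fit of one keypoint's x(t) and y(t) trajectory.

    xs, ys: per-frame coordinates of a single keypoint, shape (num_frames,).
    Returns (coeffs_x, coeffs_y), highest-degree coefficients first.
    """
    t = np.arange(len(xs))
    return np.polyfit(t, xs, degree), np.polyfit(t, ys, degree)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder heatmaps standing in for two shots (pull vs. cover drive).
    pull = rng.random((NUM_KEYPOINTS, H, W))
    cover_drive = rng.random((NUM_KEYPOINTS, H, W))
    sims = heatmap_cosine_similarity(pull, cover_drive)
    print("least similar keypoint index:", sims.argmin())

    # Placeholder per-frame coordinates for a single keypoint (e.g. a wrist).
    wrist_x, wrist_y = rng.random(30), rng.random(30)
    coeffs_x, coeffs_y = fit_keypoint_trajectory(wrist_x, wrist_y)
    print("x(t) coefficients:", coeffs_x)
```

In this sketch, low per-keypoint similarity flags the heatmaps where the model most strongly separates the two shots, and the fitted polynomial coefficients give a compact, comparable description of each keypoint's motion.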