Evaluating Theory-of-Mind in Large Language Models Through Opponent Modeling
Emre Kuru (Özyeğin University)
Anıl Doğru (Özyeğin University)
Merve Doğan (Özyeğin University)
Reyhan Aydoğan (Özyeğin University, TU Delft - Interactive Intelligence)
Abstract
Theory-of-Mind (ToM), the ability to infer the mental states, goals, and preferences of others, is a core component of human social intelligence. In this work, we investigate whether Large Language Models (LLMs) exhibit ToM capabilities in the context of strategic interaction. We frame opponent modeling in negotiation as a grounded and interpretable ToM task, in which a model must infer an agent's preferences by observing the offers exchanged during a negotiation. We guide LLMs to interpret offer histories and infer latent utility representations, including issue and value weights. We conduct a comprehensive evaluation of state-of-the-art LLMs across multiple negotiation domains. Our results show that LLMs can successfully recover opponents' unknown preferences and, in some cases, even outperform classical opponent modeling baselines without any task-specific training. These findings offer new evidence of LLMs' emerging capacity for social reasoning and position opponent modeling as a practical benchmark for evaluating Theory-of-Mind in foundation models.
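For concreteness: in the automated negotiation literature this setup draws on, an agent's preferences over a multi-issue outcome are commonly modeled as a linear additive utility function. The sketch below uses standard notation to illustrate what "issue and value weights" denote; it is an assumed textbook form, not a formula quoted from the paper:

    % Linear additive utility over an outcome \omega = (\omega_1, \dots, \omega_n)
    % w_j : weight of issue j (issue weights), normalized so that \sum_j w_j = 1
    % e_j : evaluation mapping the chosen value of issue j into [0, 1] (value weights)
    U(\omega) = \sum_{j=1}^{n} w_j \, e_j(\omega_j)

Under this model, opponent modeling amounts to estimating the opponent's w_j and e_j from the sequence of offers it proposes.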