Performance of Large Language Models in Prediction Markets
K.H. Halldórsson (TU Delft - Technology, Policy and Management)
A.Y. Ding – Mentor
S. Renes – Mentor (TU Delft - Economics of Technology and Innovation)
R. van Bergem – Mentor (TU Delft - Economics of Technology and Innovation)
Abstract
Decision-making under uncertainty often relies on access to accurate probabilistic forecasts, yet in many contexts such forecasts are scarce or difficult to obtain. Decentralised prediction markets are widely regarded as effective tools for aggregating dispersed information into probabilities that decision-makers can use to plan for an uncertain future. Recent advances in large language models (LLMs) have prompted claims that they could complement or replace market-based forecasting by synthesising information without the need for incentive-driven markets. However, there is limited empirical evidence comparing LLM forecasts to market-based aggregation under real-world conditions. This thesis puts these claims to the test by examining the extent to which LLMs can replicate or complement human forecasting as reflected in decentralised prediction markets. Using Polymarket as a benchmark for collective human forecasting, probability forecasts generated by LLMs are compared to live market probabilities. Forecasting accuracy is evaluated across different market conditions, and the decision-making relevance of LLM forecasts is assessed through trading simulations. The results show that market probabilities are consistently more accurate than LLM forecasts, and this finding holds across all evaluated models, model combinations, prompting strategies, market stages, and liquidity levels. A regression-based aggregation model that mixes market probabilities with LLM forecasts achieves predictive performance comparable to that of the market in some cases, but it fails to generalise under realistic conditions. These findings suggest that LLMs, at their current stage, cannot substitute for prediction markets as information aggregation mechanisms; they challenge claims that LLMs can match markets in producing accurate probabilistic forecasts, and they highlight the need for caution when deploying LLMs in high-stakes decision-making.
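The abstract describes two evaluation steps: scoring LLM forecasts against live market probabilities, and fitting a regression-based blend of the two. The sketch below illustrates one plausible way to do this in Python; it is not the thesis code, and the file name, column names, and the choice of a logistic regression on log-odds features are assumptions for illustration only.

```python
# Minimal sketch (not the thesis implementation): score market and LLM
# probability forecasts with the Brier score, then fit a simple
# regression-based blend of the two. Data layout below is hypothetical.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("forecasts.csv")        # hypothetical: one row per market snapshot
y = df["outcome"].to_numpy()             # realised outcome: 1 = Yes, 0 = No
p_market = df["market_prob"].to_numpy()  # live market (e.g. Polymarket) probability
p_llm = df["llm_prob"].to_numpy()        # probability elicited from the LLM

def brier(p, y):
    """Mean squared error between probability forecasts and realised outcomes."""
    return float(np.mean((p - y) ** 2))

print("Brier (market):", brier(p_market, y))
print("Brier (LLM):   ", brier(p_llm, y))

# Regression-based aggregation: learn weights that mix the two forecasts.
# Logistic regression on log-odds features is one common choice; the thesis
# may use a different specification.
eps = 1e-6
logit = lambda p: np.log(np.clip(p, eps, 1 - eps) / np.clip(1 - p, eps, 1 - eps))
X = np.column_stack([logit(p_market), logit(p_llm)])
blend = LogisticRegression().fit(X, y)
p_blend = blend.predict_proba(X)[:, 1]
print("Brier (blend): ", brier(p_blend, y))
```

In practice such a blend would need to be fit and evaluated on separate time periods or markets; in-sample scores like the one above tend to overstate how well the aggregation generalises, which is consistent with the abstract's observation that the blended model fails under realistic conditions.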
https://github.com/KetillHafdal/llm-vs-prediction-markets