Analysis of the Prediction Tournament Paradox

Bachelor Thesis (2025)

Author(s)

V.M. van der Eng (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J. Söhl – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

H.M. Schuttelaars – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Paradox, Proper scoring rules, Prediction tournament

To reference this document use

https://resolver.tudelft.nl/uuid:512ec4c2-dd44-4dfb-9a89-cb413a5bbee3

Publication Year

2025

Language

English

Graduation Date

25-06-2025

Awarding Institution

Delft University of Technology

Programme

Applied Mathematics

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In a prediction tournament, contestants are tasked with predicting the distributions of random variables. To determine which contestant makes the most accurate predictions, scores are assigned based on the outcomes of the random variables. The scoring rules are designed such that a contestant’s expected score decreases as their predictions approach the true distribution. This suggests that the contestant with the lowest score should be the most accurate predictor. However, simulation results show that this is not necessarily the case. In this report, we find that for the common case of Bernoulli random variables, the true success probabilities affect the distribution of winners: the effect is positive when the probabilities are close to 0 or 1 and negative when they are near 0.5. We also find that this distribution is not affected by whether contestant errors are drawn from a continuous distribution with fixed variance σ² or are simply +σ or −σ. Furthermore, contestants who make extreme predictions (always predicting 0 or 1) do not outperform those who predict values close to the true success probability. While the choice of scoring rule does influence the distribution of winners, it does not eliminate the paradox. We find that the Pseudospherical and Power scores with parameter β close to 1, and the Logarithmic score, perform best.

We extend our analysis to random variables with multiple categories. To support this extension, we introduce a new sampling method that builds on the one used in the earlier simulations. In the binary model, one success probability per random variable suffices; in the multi-category model, each random variable needs several probabilities that sum to exactly 1. Using a statistical distance, the total variation distance, we determine how to model contestant predictions. For these random variables, we also analyze various scoring rules and find that the Pseudospherical and Power scores with β slightly larger than 1, and the Logarithmic score, perform best across various numbers of categories.

Similarly, we extend our analysis to continuous random variables. Because of time constraints, we only consider Normal distributions with known variance. We use the same statistical distance as for the multi-category random variables, the total variation distance, to determine how to model contestant predictions. We again compare several scoring rules and find that the Power and Pseudospherical scores with β close to 1, and the Logarithmic score, perform best in this scenario.
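The binary (Bernoulli) setting described in the abstract can be illustrated with a small simulation. The sketch below is not the thesis's code: the function names, the uniform range for the true success probabilities, the clipping constant, and the use of the quadratic (Brier) score alongside the Logarithmic score are all assumptions made for illustration. Each contestant predicts the true probability plus or minus a fixed error σ, and the contestant with the lowest total score wins the tournament.

```python
import math
import random


def brier_score(p, outcome):
    """Quadratic (Brier) score for a binary outcome; lower is better."""
    return (p - outcome) ** 2


def log_score(p, outcome):
    """Negative log-likelihood of a binary outcome; lower is better."""
    return -math.log(p if outcome == 1 else 1.0 - p)


def clip(p, eps=1e-6):
    """Keep a prediction strictly inside (0, 1) so the log score stays finite."""
    return min(max(p, eps), 1.0 - eps)


def run_tournament(sigmas, n_questions, score, rng):
    """Return the index of the winning contestant (lowest total score).

    Contestant i predicts the true probability plus or minus sigmas[i],
    with the sign chosen by a fair coin on every question.
    """
    totals = [0.0] * len(sigmas)
    for _ in range(n_questions):
        q = rng.uniform(0.1, 0.9)  # assumed range of true success probabilities
        outcome = 1 if rng.random() < q else 0
        for i, sigma in enumerate(sigmas):
            sign = 1 if rng.random() < 0.5 else -1
            totals[i] += score(clip(q + sign * sigma), outcome)
    return min(range(len(sigmas)), key=totals.__getitem__)


def winner_distribution(sigmas, n_questions, n_tournaments, score, seed=0):
    """Estimate the share of tournaments won by each contestant."""
    rng = random.Random(seed)
    wins = [0] * len(sigmas)
    for _ in range(n_tournaments):
        wins[run_tournament(sigmas, n_questions, score, rng)] += 1
    return [w / n_tournaments for w in wins]
```

For example, `winner_distribution([0.05, 0.06, 0.07, 0.08], 20, 2000, brier_score)` returns the share of tournaments won by each of four contestants with similar error sizes. Comparing the first entry with the others gives a rough indication of how often the most accurate contestant actually comes out on top, and swapping in `log_score` shows how the choice of scoring rule changes that share.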

Files

Final_report.pdf

(pdf | 9.32 MB)

License info not available