Continuous Distributions and Measures of Statistical Accuracy for Structured Expert Judgment

Journal Article (2025)
Author(s)

Guus Rongen (Pattle Delamore Partners Ltd.)

Gabriela F. Nane (TU Delft - Applied Probability)

Oswaldo Morales-Napoles (TU Delft - Hydraulic Structures and Flood Risk)

Roger M. Cooke (TU Delft - Applied Probability, Resources for the Future)

DOI
https://doi.org/10.1002/ffo2.70009 (final published version)
Publication Year
2025
Language
English
Journal title
Futures and Foresight Science
Issue number
2
Volume number
7
Article number
e70009
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This study evaluates five scoring rules, or measures of statistical accuracy, for assessing uncertainty estimates from expert judgment studies and model forecasts. These rules, the Continuously Ranked Probability Score (CRPS), Kolmogorov-Smirnov (KS), Cramér-von Mises (CvM), Anderson-Darling (AD), and chi-square test, were applied to 6864 expert uncertainty estimates from 49 Classical Model (CM) studies. We compared their sensitivity to various biases and their ability to serve as performance-based weights for expert estimates. Additionally, the piecewise-uniform and Metalog distributions were evaluated for their representation of expert estimates, because four of the five rules require interpolating the experts' quantile estimates. Simulating biased estimates reveals varying sensitivity of the considered test statistics to these biases. Expert weights derived using one measure of statistical accuracy were evaluated with the other measures to assess their performance. The main conclusions are: (1) the CRPS overlooks important biases, while the chi-square and AD statistics behave similarly, as do KS and CvM; (2) all measures except the CRPS agree that performance weighting is superior to equal weighting with respect to statistical accuracy; and (3) neither distribution can effectively predict the position of a removed quantile estimate. These insights show the behavior of different scoring rules for combining uncertainty estimates from experts or models, and extend the knowledge of best practices.
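To make the notion of "statistical accuracy" concrete, the following is a minimal, hypothetical sketch (not code from the paper) of two binned measures for a single expert in the Classical Model setting. The expert states 5%, 50%, and 95% quantiles for seed questions whose true values are later observed; realizations are counted per interquantile bin and compared with the expected bin probabilities. The function names and the binned KS variant are illustrative assumptions.

```python
# Expected probability mass of the four interquantile bins for
# 5%/50%/95% quantile assessments (Classical Model convention).
EXPECTED = (0.05, 0.45, 0.45, 0.05)

def bin_counts(assessments, realizations):
    """Count how many realizations fall in each interquantile bin.

    assessments: list of (q05, q50, q95) tuples, one per seed question.
    realizations: list of observed true values, same order.
    """
    counts = [0, 0, 0, 0]
    for (q05, q50, q95), x in zip(assessments, realizations):
        if x < q05:
            counts[0] += 1
        elif x < q50:
            counts[1] += 1
        elif x < q95:
            counts[2] += 1
        else:
            counts[3] += 1
    return counts

def chi_square_stat(counts):
    """Pearson chi-square statistic of observed vs. expected bin counts."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, EXPECTED))

def ks_stat(counts):
    """Kolmogorov-Smirnov distance between empirical and expected
    cumulative bin probabilities (a coarse, binned variant)."""
    n = sum(counts)
    emp = exp = d = 0.0
    for c, p in zip(counts, EXPECTED):
        emp += c / n
        exp += p
        d = max(d, abs(emp - exp))
    return d
```

For a well-calibrated expert over 20 seed questions, the counts would be close to [1, 9, 9, 1] and both statistics close to zero; an overconfident expert with counts like [5, 5, 5, 5] scores much worse under both. The CRPS differs in kind from these measures: it scores each estimate against its realization individually rather than testing the pooled bin counts, which is one reason the paper finds it behaves differently.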