Continuous Distributions and Measures of Statistical Accuracy for Structured Expert Judgment

Journal Article (2025)
Author(s)

Guus Rongen (Pattle Delamore Partners Ltd.)

Gabriela Florentina Nane (TU Delft - Applied Probability)

NO Morales-Nápoles (TU Delft - Hydraulic Structures and Flood Risk)

RM Cooke (TU Delft - Applied Probability, Resources for the Future)

Research Group
Applied Probability
DOI
https://doi.org/10.1002/ffo2.70009
Publication Year
2025
Language
English
Issue number
2
Volume number
7
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This study evaluates five scoring rules, or measures of statistical accuracy, for assessing uncertainty estimates from expert judgment studies and model forecasts. These rules, the Continuous Ranked Probability Score (CRPS), Kolmogorov-Smirnov (KS), Cramér-von Mises (CvM), Anderson-Darling (AD), and chi-square test, were applied to 6864 expert uncertainty estimates from 49 Classical Model (CM) studies. We compared their sensitivity to various biases and their ability to serve as performance-based weights for expert estimates. Additionally, the piecewise uniform and Metalog distributions were evaluated for their representation of expert estimates, because four of the five rules require interpolating the experts' estimates. Simulating biased estimates reveals varying sensitivity of the considered test statistics to these biases. Expert weights derived using one measure of statistical accuracy were evaluated with the other measures to assess their performance. The main conclusions are: (1) CRPS overlooks important biases, while chi-square and AD behave similarly, as do KS and CvM. (2) All measures except CRPS agree that performance weighting is superior to equal weighting with respect to statistical accuracy. (3) Neither distribution can effectively predict the position of a removed quantile estimate. These insights show the behavior of different scoring rules for combining uncertainty estimates from experts or models, and extend the knowledge of best practices.
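To illustrate the kind of statistical-accuracy measure the abstract discusses, the sketch below maps each realization to its probability under an expert's piecewise-uniform distribution (a probability integral transform, PIT) and then computes a Kolmogorov-Smirnov distance from the Uniform(0, 1) distribution expected of a well-calibrated expert. This is a minimal illustration, not the paper's implementation: the quantile levels, the handling of values outside the stated quantiles (clipped here, whereas CM studies typically extend the support with an overshoot), and the function names are all assumptions.

```python
import numpy as np

def pit_values(quantile_estimates, realizations, p=(0.05, 0.50, 0.95)):
    """Probability integral transform under a piecewise-uniform expert
    distribution: the CDF is linearly interpolated between the stated
    quantiles; realizations outside them are clipped to p[0] / p[-1]
    (a simplification; CM studies usually add an overshoot range)."""
    return np.array([np.interp(x, q, p)
                     for q, x in zip(quantile_estimates, realizations)])

def ks_statistic(pits):
    """Kolmogorov-Smirnov distance between the empirical CDF of the
    PIT values and the Uniform(0, 1) CDF. A calibrated expert yields
    PITs that look like a uniform sample, so small KS is better."""
    u = np.sort(pits)
    n = len(u)
    ecdf_above = np.arange(1, n + 1) / n  # ECDF just after each point
    ecdf_below = np.arange(0, n) / n      # ECDF just before each point
    return max(np.max(ecdf_above - u), np.max(u - ecdf_below))

# Hypothetical example: two calibration questions, each with an
# expert's (5%, 50%, 95%) estimates, and the observed realizations.
quantiles = [[0.0, 5.0, 10.0], [1.0, 4.0, 9.0]]
observed = [5.0, 4.0]  # both fall exactly on the medians
pits = pit_values(quantiles, observed)
print(pits)                 # [0.5 0.5]
print(ks_statistic(pits))   # 0.5
```

The CvM and AD statistics mentioned in the abstract replace the maximum ECDF deviation with (differently weighted) integrated squared deviations, while the CRPS scores each estimate against its realization directly rather than pooling PITs.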