Quantifying Uncertainty due to Ties in Rank Correlation Coefficients

An algorithmic approach to computing the bounds of uncertainty

Bachelor Thesis (2025)
Author(s)

A. Tsatsanis (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Julián Urbano – Mentor (TU Delft - Multimedia Computing)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
expand_more
Publication Year
2025
Language
English
Graduation Date
24-06-2025
Awarding Institution
Delft University of Technology
Project
['CSE3000 Research Project']
Programme
['Computer Science and Engineering']
Faculty
Electrical Engineering, Mathematics and Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Rank correlation coefficients are a common tool for describing similarity between ordered data. This study examines the use of the popular coefficient Kendall's τ, specifically in the case where the rankings contain tied items that should not be tied.  Ties in this case represent uncertainty in the ranking, induced by the system that produced it, usually due to effects such as missing information or loss of precision (rounding). We propose two variants, τmin and τmax, which represent the lowest and highest possible correlation over all ways of arbitrating tied items. Our contribution is a novel quadratic-time algorithm for computing an arbitration of ties which yields the extremal correlation values τmin, τmax. We formally prove the correctness of the algorithm for the original Kendall's τ, and we suggest an adaptation for weighted variants of τ, such as τAP by Yilmaz et al. and τh by Vigna. Empirical evaluation on both synthetic ranking pairs and TREC ad-hoc system outputs demonstrates that ties often induce wide intervals [τmin, τmax], indicating that no single value can fully encapsulate the uncertainty in correlation. These wide intervals also appear in rankings where current methods of computing τ correlation in presence of ties, namely τa and τb,  have values large enough (≥0.9) for researchers to use as evidence of strong correlation. This indicates that currently used methods may yield false positive results. By reporting τ alongside its uncertainty bounds τmin and τmax, researchers are able to make more informed decisions,
by demonstrating the reliability of correlation in presence of uncertainty-induced ties.

Files

License info not available