Quantifying Uncertainty due to Ties in Rank Correlation Coefficients

None, None

Quantifying Uncertainty due to Ties in Rank Correlation Coefficients

An algorithmic approach to computing the bounds of uncertainty

Bachelor Thesis (2025)

Author(s)

A. Tsatsanis (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Julián Urbano – Mentor (TU Delft - Multimedia Computing)

Faculty

Electrical Engineering, Mathematics and Computer Science

Ties Uncertainty Rank Correlation Uncertainty quantification Rank similarity measure

To reference this document use:

https://resolver.tudelft.nl/uuid:c02837e4-e809-48a0-91f8-a7ec6f389a25

More Info

expand_more

Publication Year

2025

Language

English

Graduation Date

24-06-2025

Awarding Institution

Delft University of Technology

Project

['CSE3000 Research Project']

Programme

['Computer Science and Engineering']

Faculty

Electrical Engineering, Mathematics and Computer Science

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Rank correlation coefficients are a common tool for describing similarity between ordered data. This study examines the use of the popular coefficient Kendall's τ, specifically in the case where the rankings contain tied items that should not be tied. Ties in this case represent uncertainty in the ranking, induced by the system that produced it, usually due to effects such as missing information or loss of precision (rounding). We propose two variants, τ^min and τ^max, which represent the lowest and highest possible correlation over all ways of arbitrating tied items. Our contribution is a novel quadratic-time algorithm for computing an arbitration of ties which yields the extremal correlation values τ^min, τ^max. We formally prove the correctness of the algorithm for the original Kendall's τ, and we suggest an adaptation for weighted variants of τ, such as τ_AP by Yilmaz et al. and τ_h by Vigna. Empirical evaluation on both synthetic ranking pairs and TREC ad-hoc system outputs demonstrates that ties often induce wide intervals [τ^min, τ^max], indicating that no single value can fully encapsulate the uncertainty in correlation. These wide intervals also appear in rankings where current methods of computing τ correlation in presence of ties, namely τ^a and τ^b, have values large enough (≥0.9) for researchers to use as evidence of strong correlation. This indicates that currently used methods may yield false positive results. By reporting τ alongside its uncertainty bounds τ^min and τ^max, researchers are able to make more informed decisions,

by demonstrating the reliability of correlation in presence of uncertainty-induced ties.

Files

Bsc_thesis_andreas_tsatsanis.p... (pdf)

(pdf | 0.808 Mb)

License info not available