The Definition of a New Correlation Variant for Rankings With Ties

Exploratory Definitions of the w-variant in τ, τ_AP, τ_h

Bachelor Thesis (2025)

Author(s)

M.J. Gazeel (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Julián Urbano – Graduation committee member (TU Delft - Multimedia Computing)

E.A. Markatou – Graduation committee member (TU Delft - Cyber Security)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Ties, Rankings, Variant, Rank similarity, Kendall Tau, Sports Rankings, Rank Correlation Coefficient

To reference this document use:

https://resolver.tudelft.nl/uuid:70a02fa8-3db6-4740-90a9-7402f7153ba0

Publication Year

2025

Language

English

Graduation Date

24-06-2025

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Rankings are orderings of a set of elements and are widely used in information retrieval, which creates the need for a means of comparing them. Rank similarity measures serve exactly this purpose, and they constitute a large research area in which many such measures have been defined. A ranking may contain ties, which raises the question of what these ties represent and how to treat them when calculating a measure. Current theory handles ties through the a and b variants of the measures. Both stem from a statistical approach to ties: they take tied elements to represent uncertainty about their real order in the ranking. There is, however, a different interpretation, namely that the tied elements really occur at the same place in the ranking, that is, there is no intrinsic order in which they should appear. This interpretation has been considered for one of the nonconjoint measures and has been coined the w-variant after Weber et al. In this work, we consider the problem of defining this variant for a family of three commonly used rank similarity measures: τ defined by Kendall, τ_AP defined by Yilmaz et al., and τ_h defined by Vigna. We approach the problem by establishing what the variant should represent and defining a set of axioms that any definition of w has to satisfy. We then show that, with a small exception, only one definition can satisfy these axioms, and that this definition coincides with the distance considered by Kemeny in 1959. We use this to define the w-variant for all three measures. We also investigate the behaviour of the new variant in relation to the existing a and b variants, identify the shortcomings of our definition, and evaluate it on real-world data. Finally, we lay the groundwork for rigorously proving parts of our definition and other measures that may consider ties to represent occurrence at the same rank.
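
As a rough illustration of the quantities named in the abstract, the Python sketch below computes Kendall's τ_a and τ_b and a Kemeny-style pairwise distance for rankings with ties. It is not the thesis's w-variant definition, which the abstract does not spell out. The function names, the dict-of-ranks representation, and the unnormalised form of the distance (1 for a pair tied in exactly one ranking, 2 for a pair in opposite order) are assumptions made for this example.

```python
from itertools import combinations
from math import sqrt


def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def pair_signs(ranks):
    """Map each unordered item pair to -1, 0 or +1 (0 means the pair is tied).

    `ranks` is a dict item -> rank position; equal positions denote a tie.
    """
    items = sorted(ranks)
    return {(i, j): sign(ranks[j] - ranks[i]) for i, j in combinations(items, 2)}


def kendall_tau_a(x, y):
    """Kendall's tau-a: concordant minus discordant pairs over all pairs."""
    sx, sy = pair_signs(x), pair_signs(y)
    return sum(sx[p] * sy[p] for p in sx) / len(sx)


def kendall_tau_b(x, y):
    """Kendall's tau-b: same numerator, normalised by untied pairs in each ranking.

    Raises ZeroDivisionError if either ranking ties every pair.
    """
    sx, sy = pair_signs(x), pair_signs(y)
    untied_x = sum(1 for p in sx if sx[p] != 0)
    untied_y = sum(1 for p in sy if sy[p] != 0)
    return sum(sx[p] * sy[p] for p in sx) / sqrt(untied_x * untied_y)


def kemeny_distance(x, y):
    """Sum of |sign_x - sign_y| over unordered pairs (assumed, unnormalised form)."""
    sx, sy = pair_signs(x), pair_signs(y)
    return sum(abs(sx[p] - sy[p]) for p in sx)


if __name__ == "__main__":
    x = {"a": 1, "b": 2, "c": 2, "d": 4}  # b and c are tied
    y = {"a": 1, "b": 2, "c": 3, "d": 4}  # no ties
    print(kendall_tau_a(x, x), kendall_tau_b(x, x), kemeny_distance(x, x))
    print(kendall_tau_a(x, y), kendall_tau_b(x, y), kemeny_distance(x, y))
```

In this example, comparing the tied ranking x with itself gives τ_a = 5/6 but τ_b = 1 and a distance of 0, which shows how the treatment of tied pairs changes the result even for identical rankings.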

Files

Paper.pdf

(pdf | 0.701 Mb)

License info not available
