The Definition of a New Correlation Variant for Rankings With Ties

Exploratory Definitions of the w-variant in τ, τAP, τh

Bachelor Thesis (2025)
Author(s)

M.J. Gazeel (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Julián Urbano – Graduation committee member (TU Delft - Multimedia Computing)

E.A. Markatou – Graduation committee member (TU Delft - Cyber Security)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
24-06-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Rankings are orderings of a set of elements and are widely used in information retrieval, which creates the need for a means of comparing them. Rank similarity measures serve exactly this purpose, and they constitute a large research area in which many such measures have been defined. A ranking may contain ties, which raises the question of what these ties represent and how to treat them when calculating a measure. Current theory handles ties through the a and b variants of the measures. Both variants stem from a statistical approach to ties: they take tied elements to represent uncertainty about their real order in the ranking. There is, however, a different interpretation, namely that the tied elements really occur at the same place in the ranking, so that there is no intrinsic order in which they should appear. This interpretation has been considered for one of the nonconjoint measures, where it was coined the w-variant after Weber et al. In this work, we consider the problem of defining this variant for a family of three commonly used rank similarity measures: τ, defined by Kendall; τAP, defined by Yilmaz et al.; and τh, defined by Vigna. We approach the problem by establishing what the variant should represent and by defining a set of axioms that any definition of w has to satisfy. We then show that, with a small exception, only one definition can satisfy these axioms, and that this definition coincides with the distance considered by Kemeny in 1959. We use this to create a definition of the w-variant for all three measures. We also investigate the behaviour of the new variant in relation to the existing a and b variants, identify the shortcomings of our definition, and evaluate it on real-world data. Finally, we lay the groundwork for rigorously proving parts of our definition and for other measures that may consider ties to represent occurrence at the same rank.
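
To make the contrast between the two interpretations of ties concrete, the following is a minimal sketch and not the definition developed in the thesis. It assumes rankings are given as lists of rank positions (equal numbers mean tied elements) and uses one common pairwise formulation of Kemeny's distance alongside the textbook τ-a and τ-b; the function names are illustrative only.

```python
from itertools import combinations
from math import sqrt


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def pair_signs(ranks):
    """Relative order of every unordered pair (i, j), i < j; 0 marks a tie."""
    return [sign(ranks[i] - ranks[j])
            for i, j in combinations(range(len(ranks)), 2)]


def kemeny_distance(x, y):
    """Pairwise Kemeny-style distance (assumed formulation):
    a reversed pair costs 2, a pair tied in only one ranking costs 1."""
    return sum(abs(a - b) for a, b in zip(pair_signs(x), pair_signs(y)))


def tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    sx, sy = pair_signs(x), pair_signs(y)
    return sum(a * b for a, b in zip(sx, sy)) / len(sx)


def tau_b(x, y):
    """Kendall tau-b: normalised by the untied pairs in each ranking.
    Undefined (division by zero) if either ranking ties all elements."""
    sx, sy = pair_signs(x), pair_signs(y)
    untied_x = sum(a * a for a in sx)
    untied_y = sum(b * b for b in sy)
    return sum(a * b for a, b in zip(sx, sy)) / sqrt(untied_x * untied_y)


tied = [1, 2, 2, 3]    # second and third elements share a rank
strict = [1, 2, 3, 4]  # no ties

print(kemeny_distance(tied, strict))  # 1: one pair is tied in only one ranking
print(kemeny_distance(tied, tied))    # 0: identical rankings, ties included
print(tau_a(tied, tied))              # 5/6: tau-a does not reach 1 on a tied ranking
print(tau_b(tied, strict))
```

In this sketch a ranking with ties is at distance 0 from itself, which matches the reading of ties as genuine occurrence at the same rank, whereas τ-a of a tied ranking with itself stays below 1, in line with the reading of ties as uncertainty.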

Files

Paper.pdf
(pdf | 0.701 Mb)
License info not available