A Probabilistic Account of the Uncertainty Due to Ties in Rank-Biased Overlap

Efficient Estimation of the Uncertainty Distribution for Tied Data

Bachelor Thesis (2025)
Author(s)

L. Chládek (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J. Urbano Merino – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

E.A. Markatou – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
expand_more
Publication Year
2025
Language
English
Graduation Date
27-06-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Faculty
Electrical Engineering, Mathematics and Computer Science
Downloads counter
117
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Rank similarity quantifies the difference between two ordered sets of items. Rank-Biased Overlap (RBO) is a top-weighted measure of rank similarity that can be used for a pair of indefinite rankings, such that only a prefix is known and that items need not be present in both rankings. This method is frequently used in Information Retrieval (IR), such as to compare search engine results. RBO defines tight lower and upper bounds, RBO_min and RBO_max, which give the uncertainty due to items in the unseen suffix. Another source of uncertainty are ties: two items are tied in a ranking if their true order is not known. Recent work on the treatment of ties in RBO has made it a tie-aware measure. However, unlike the uncertainty due to unseen items, uncertainty due to ties does not disappear for longer prefixes. Determining the distribution of possible scores is O((n!)^2) if all arrangements of ties are considered, and existing methods only find the lower and upper bound for RBO with respect to ties. We investigate whether a probabilistic estimator for the uncertainty distribution can be constructed. We use an iterative convolution method to compose the marginal PMFs of each item. By evaluating against synthetic data, we show that this estimate distribution can be used to reliably compute confidence intervals, mean, and variance. We conclude that a probabilistic method is a viable solution when seeking deterministic results with fast computation.

Files

RP.pdf
(pdf | 0.556 Mb)
License info not available