Evaluating Metric Sensitivity to Offline–Online Alignment in Information Retrieval

Bachelor Thesis (2026)
Author(s)

S. Udagawa (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A. Anand – Mentor (TU Delft - Web Information Systems)

J. Urbano Merino – Mentor (TU Delft - Multimedia Computing)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2026
Language
English
Graduation Date
30-01-2026
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This study examines how effectively widely used offline information retrieval (IR) metrics reflect changes in online performance. As offline evaluation plays a central role in model development, understanding its alignment with user-oriented signals is essential. Using 52 diverse ranking pipelines and approximately 2,000 queries from the MS MARCO DL19 and DL20 benchmarks, we analyze the sensitivity of five offline metrics (Precision@10, Recall@10, MAP, MRR, and NDCG@10) to five simulated online metrics (CTR, SSR, ZRR, ADT, and SAR). Sensitivity is quantified through slope-based analysis, and alignment is assessed using the Pearson correlation coefficient. Our results show that NDCG@10 and Recall@10 are the most sensitive offline metrics across multiple online behaviors, while Precision@10 consistently exhibits low sensitivity. Furthermore, we demonstrate that sensitivity and alignment capture complementary aspects of offline–online relationships: some metric pairs show strong responsiveness but weak linear consistency. Overall, this study provides a detailed and reproducible evaluation of how offline metrics behave in relation to simulated online performance, offering practical guidance for selecting offline metrics that better reflect user-centric outcomes.

https://github.com/AinzOoalGown123/Metric-Sensitivity-Analysis
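The slope-and-correlation approach described in the abstract can be sketched in a few lines. This is a minimal illustration, not code from the linked repository: the metric values below are invented placeholders, and the exact fitting procedure used in the thesis may differ.

```python
import numpy as np

# Hypothetical example: offline NDCG@10 scores and a simulated online
# metric (e.g. CTR) for a handful of ranking pipelines. These values are
# illustrative only and do not come from the study's data.
ndcg_at_10 = np.array([0.42, 0.48, 0.51, 0.55, 0.60, 0.63])
ctr = np.array([0.11, 0.13, 0.12, 0.16, 0.18, 0.19])

# Sensitivity: the slope of a least-squares linear fit of the online
# metric against the offline metric, i.e. how much the online signal
# moves per unit change in the offline score.
slope, intercept = np.polyfit(ndcg_at_10, ctr, deg=1)

# Alignment: the Pearson correlation coefficient between the two
# metric series, measuring the strength of their linear relationship.
pearson_r = np.corrcoef(ndcg_at_10, ctr)[0, 1]

print(f"slope = {slope:.3f}, r = {pearson_r:.3f}")
```

A metric pair can score high on one quantity and low on the other: a steep slope with scattered points gives high sensitivity but weak alignment, which is the complementarity the abstract highlights.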

Files

CSE3000_Final_Paper.pdf
(pdf | 4.59 MB)
License info not available