A Large-Scale Web Search Dataset for Federated Online Learning to Rank

Conference Paper (2025)
Author(s)

M.I. Gregoriadis (TU Delft - Data-Intensive Systems)

Jingwei Kang (Universiteit van Amsterdam)

J.A. Pouwelse (TU Delft - Data-Intensive Systems)

Research Group
Data-Intensive Systems
DOI
https://doi.org/10.1145/3746252.3761651
Publication Year
2025
Language
English
Pages (from-to)
6387-6391
ISBN (electronic)
9798400720406
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The centralized collection of search interaction logs for training ranking models raises significant privacy concerns. Federated Online Learning to Rank (FOLTR) offers a privacy-preserving alternative by enabling collaborative model training without sharing raw user data. However, benchmarks in FOLTR are largely based on random partitioning of classical learning-to-rank datasets, simulated user clicks, and the assumption of synchronous client participation. This oversimplifies real-world dynamics and undermines the realism of experimental results. We present AOL4FOLTR, a large-scale web search dataset with ≈ 2.6 million queries from 10,000 users. Our dataset addresses key limitations of existing benchmarks by including user identifiers, real click data, and query timestamps, enabling realistic user partitioning, behavior modeling, and asynchronous federated learning scenarios.
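The abstract notes that user identifiers and query timestamps enable realistic, per-user partitioning for federated clients. A minimal sketch of that idea, assuming hypothetical column names (`user_id`, `query`, `timestamp`) that may not match the released schema:

```python
import pandas as pd

# Toy query log; column names are illustrative assumptions, not the
# actual AOL4FOLTR schema.
log = pd.DataFrame({
    "user_id":   [1, 1, 2, 3, 3, 3],
    "query":     ["a", "b", "c", "d", "e", "f"],
    "timestamp": [10, 20, 15, 5, 25, 30],
})

# Realistic (non-random) partitioning: each real user becomes one
# federated client, with queries kept in temporal order so that
# asynchronous participation can be simulated from the timestamps.
clients = {
    uid: group.sort_values("timestamp")
    for uid, group in log.groupby("user_id")
}

print(len(clients))                          # one client per user
print(list(clients[3]["timestamp"]))         # that client's query timeline
```

This contrasts with the random partitioning of classical learning-to-rank datasets criticized above: here client boundaries follow real users, and the per-client timelines support asynchronous scenarios.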