Understanding the User

An Intent-Based Ranking Dataset

Conference Paper (2024)
Author(s)

Abhijit Anand (L3S)

L.J.L. Leonhardt (TU Delft - Web Information Systems)

V. Viswanathan (TU Delft - Web Information Systems)

Avishek Anand (TU Delft - Web Information Systems)

Research Group
Web Information Systems
DOI (related publication)
https://doi.org/10.1145/3627673.3679166
Publication Year
2024
Language
English
Pages (from-to)
5323-5327
ISBN (electronic)
979-8-4007-0436-9
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

As information retrieval systems continue to evolve, accurate evaluation and benchmarking of these systems become pivotal. Web search datasets such as MS MARCO primarily provide short keyword queries without any accompanying intent or description, which makes the underlying information need hard to interpret. This paper proposes an approach to augmenting such datasets with informative query descriptions, focusing on two prominent benchmark datasets: TREC-DL-21 and TREC-DL-22. We use state-of-the-art large language models (LLMs) to analyze the implicit intent behind individual benchmark queries. By extracting key semantic elements, we construct detailed, contextually rich descriptions for these queries. To validate the generated descriptions, we use crowdsourcing to collect diverse human judgments of their accuracy and informativeness. The resulting descriptions can serve as an evaluation set for tasks such as ranking and query rewriting.