The Utility of Query Expansion for Semantic Re-ranking Models

An empirical analysis on the performance impact for ad-hoc retrieval

Abstract

In recent years, data has become increasingly important across many domains, enabling more efficient decision-making. As the amount of collected data grows, so does the need for tools that support Information Retrieval (IR) tasks. One of the most widespread IR tasks is ad-hoc retrieval, which, given a search query, returns a list of documents from a large corpus ordered by their relevance to the query. Early ad-hoc retrieval models were based on exact term matching and therefore could not overcome vocabulary mismatch, where a query and a relevant document express the same concept with different words. Early strategies to address this limitation relied on query expansion, augmenting the original query with new terms to capture more relevant documents; more recent strategies instead rely on Natural Language Processing (NLP) to rank documents by semantic similarity. One such example is the retrieve-and-re-rank approach, which first retrieves documents by lexical similarity and then re-ranks them by semantic similarity using NLP embedding models. This research analyses the performance of combining RM3, a pseudo-relevance feedback query expansion strategy, with the semantic re-ranking model TCT-ColBERT. The combined model is compared against the lexical retrieval model BM25, which serves as a baseline, as well as against its individual components, RM3 and TCT-ColBERT. Results indicate that the combined model outperforms these baselines on some tasks (by up to 7%), while underperforming on others (by up to 3%).
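
To make the retrieve-and-re-rank approach concrete, the sketch below stubs out the lexical retrieval stage and re-ranks its candidates with a publicly available TCT-ColBERT checkpoint. This is a minimal illustrative sketch, not the thesis's actual implementation: the `embed` and `rerank` helpers are hypothetical names, and mean-pooled dot-product scoring is a simplification of TCT-ColBERT's exact scoring scheme.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Illustrative re-ranking stage of a retrieve-and-re-rank pipeline.
# The candidate documents are assumed to come from a lexical
# BM25 (+RM3 query expansion) retriever, which is not shown here.
tokenizer = AutoTokenizer.from_pretrained("castorini/tct_colbert-msmarco")
model = AutoModel.from_pretrained("castorini/tct_colbert-msmarco")

def embed(texts):
    """Mean-pool the last hidden states into one vector per text.
    (A simplification of TCT-ColBERT's actual pooling scheme.)"""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state   # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)    # ignore padding tokens
    return (hidden * mask).sum(1) / mask.sum(1)

def rerank(query, docs):
    """Re-order lexically retrieved candidates by semantic similarity."""
    q_vec = embed([query])              # (1, dim)
    d_vecs = embed(docs)                # (n_docs, dim)
    scores = (q_vec @ d_vecs.T).squeeze(0)
    order = scores.argsort(descending=True)
    return [(docs[i], scores[i].item()) for i in order]
```

In a full pipeline, `docs` would be the candidate list returned by BM25 after RM3 expands the query, so the semantic stage only re-orders a small pool rather than scoring the whole corpus.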