The Utility of Query Expansion for Semantic Re-ranking Models

An empirical analysis on the performance impact for ad-hoc retrieval

Abstract

In recent years, data has become increasingly important across many domains, enabling more efficient decision-making. As the amount of collected data grows, so does the need for tools that support Information Retrieval (IR) tasks. One of the most widespread IR tasks is ad-hoc retrieval, which, given a search query, returns a list of documents from a large corpus ordered by their relevance to the query. Early ad-hoc retrieval models were based on exact term matching and therefore could not overcome vocabulary mismatch, where a query and a relevant document express the same concept with different words. Early strategies to address this limitation relied on query expansion, augmenting the original query with new terms to capture more relevant documents; more recent strategies instead rely on Natural Language Processing (NLP) to rank documents by semantic similarity. One such example is the retrieve-and-re-rank approach, which first retrieves documents by lexical similarity and then re-ranks them by semantic similarity using NLP embedding models. This research analyses the performance of combining RM3, a pseudo-relevance feedback query expansion strategy, with the semantic re-ranking model TCT-ColBERT. The combined model is compared against the lexical retrieval model BM25, which serves as a baseline, as well as against its individual components, RM3 and TCT-ColBERT. Results indicate that the combined model outperforms these baselines on some tasks (by up to 7%), while underperforming on others (by up to 3%).
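
To make the retrieve-and-re-rank approach concrete, the sketch below stubs out the lexical retrieval stage and re-ranks its candidates with a publicly available TCT-ColBERT checkpoint. This is a minimal illustrative sketch, not the thesis's actual implementation: the `embed` and `rerank` helpers are hypothetical names, and mean-pooled dot-product scoring is a simplification of TCT-ColBERT's exact scoring scheme.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Illustrative re-ranking stage of a retrieve-and-re-rank pipeline.
# The candidate documents are assumed to come from a lexical
# BM25 (+RM3 query expansion) retriever, which is not shown here.
tokenizer = AutoTokenizer.from_pretrained("castorini/tct_colbert-msmarco")
model = AutoModel.from_pretrained("castorini/tct_colbert-msmarco")

def embed(texts):
    """Mean-pool the last hidden states into one vector per text.
    (A simplification of TCT-ColBERT's actual pooling scheme.)"""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state   # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)    # ignore padding tokens
    return (hidden * mask).sum(1) / mask.sum(1)

def rerank(query, docs):
    """Re-order lexically retrieved candidates by semantic similarity."""
    q_vec = embed([query])              # (1, dim)
    d_vecs = embed(docs)                # (n_docs, dim)
    scores = (q_vec @ d_vecs.T).squeeze(0)
    order = scores.argsort(descending=True)
    return [(docs[i], scores[i].item()) for i in order]
```

In a full pipeline, `docs` would be the candidate list returned by BM25 after RM3 expands the query, so the semantic stage only re-orders a small pool rather than scoring the whole corpus.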