The impact of the semantic matching within interpolation-based re-ranking

Abstract

Information retrieval (IR) plays a crucial role across a wide range of tasks, such as web search and fact-checking, and domains, including finance and healthcare. Effective and efficient IR systems are critical for finding relevant information in vast amounts of data. Traditional sparse retrieval methods such as BM25 are efficient but often fail to capture context, while more recent dense retrieval models are considerably more expensive in terms of computational resources and latency.
In our research, we evaluate multiple Transformer-based models to understand the impact of the semantic re-ranking phase within interpolation-based re-ranking, using the FAST-FORWARD indexes framework, a retrieve-and-re-rank approach that combines the benefits of both lexical and semantic matching. We focus on identifying specific scenarios in which particular models excel in terms of ranking performance or latency, aiming to provide model recommendations tailored to different settings. Our evaluations reveal that no single model outperforms the others across all datasets. We hypothesise that the main factors influencing encoder performance are the datasets used for fine-tuning and the method employed for computing the contextualised vector embedding. However, ablation studies are needed to validate these hypotheses.
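
For readers unfamiliar with interpolation-based re-ranking, the idea can be illustrated with a minimal sketch. It assumes the commonly used linear interpolation, in which the final score is alpha times the lexical score plus (1 - alpha) times the semantic score. The abstract does not state the exact formula, so the function name `interpolate_scores`, the parameter `alpha`, and the toy scores below are illustrative assumptions rather than the configuration used in this work.

```python
from typing import Dict, List, Tuple


def interpolate_scores(
    lexical: Dict[str, float],
    semantic: Dict[str, float],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank first-stage candidates by a linear interpolation of scores.

    Assumption: final = alpha * lexical + (1 - alpha) * semantic.
    `lexical` holds first-stage scores (e.g. from BM25) per document ID;
    `semantic` holds query-document similarity scores from an encoder.
    Every candidate in `lexical` must have a semantic score.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    combined = {
        doc_id: alpha * lex_score + (1.0 - alpha) * semantic[doc_id]
        for doc_id, lex_score in lexical.items()
    }
    # Highest combined score first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for three candidate documents.
lexical_scores = {"d1": 10.0, "d2": 8.0, "d3": 6.0}
semantic_scores = {"d1": 0.0, "d2": 4.0, "d3": 10.0}

ranking = interpolate_scores(lexical_scores, semantic_scores, alpha=0.5)
```

With `alpha=1.0` the ranking reduces to the lexical ordering, and with `alpha=0.0` to the purely semantic one, so `alpha` controls how much the semantic matching phase influences the final ranking.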