The impact of the semantic matching within interpolation-based re-ranking

Bachelor Thesis (2024)
Author(s)

A. Nistor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A. Anand – Mentor (TU Delft - Web Information Systems)

L.J.L. Leonhardt – Mentor (TU Delft - Web Information Systems)

A. Hanjalic – Graduation committee member (TU Delft - Intelligent Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2024
Language
English
Graduation Date
28-06-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Information retrieval (IR) plays a crucial role across a wide range of tasks, such as web search and fact-checking, and domains, including finance and healthcare. Effective and efficient IR systems are critical for finding relevant information in vast amounts of data. Traditional sparse retrieval methods such as BM25 are efficient but often fail to capture context, while more recent dense retrieval models are costly in terms of both resources and latency.
In our research, we evaluate multiple Transformer-based models to understand the impact of the semantic re-ranking phase within interpolation-based re-ranking, using the Fast-Forward indexes framework, a retrieve-and-re-rank approach that combines the benefits of lexical and semantic matching. We focus on identifying specific scenarios in which particular models excel in terms of ranking performance or latency, aiming to provide model recommendations tailored to different settings. Our evaluations reveal that no single model outperforms the others across all datasets. We hypothesise that the main factors influencing encoder performance are the datasets used for fine-tuning and the method employed for computing the contextualised vector embedding. However, ablation studies would be needed to validate these observations.
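
As an illustration of interpolation-based re-ranking, the sketch below combines a lexical score (for example from BM25) with a semantic score (for example a query-document embedding similarity) through a weighted sum. The abstract does not state the exact formula or parameters used in the thesis, so the convex combination, the parameter name alpha, the function name and the toy scores are all assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


def interpolate_scores(
    lexical: Dict[str, float],
    semantic: Dict[str, float],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank documents by alpha * lexical + (1 - alpha) * semantic.

    Hypothetical helper: only documents with both scores are ranked,
    and alpha is assumed to lie in [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    combined = {
        doc_id: alpha * lexical[doc_id] + (1.0 - alpha) * semantic[doc_id]
        for doc_id in lexical
        if doc_id in semantic
    }
    # Highest combined score first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Toy scores, invented for illustration (not results from the thesis).
lexical_scores = {"d1": 12.0, "d2": 9.0, "d3": 7.0}
semantic_scores = {"d1": 0.2, "d2": 0.9, "d3": 0.6}

ranking = interpolate_scores(lexical_scores, semantic_scores, alpha=0.1)
lexical_only = interpolate_scores(lexical_scores, semantic_scores, alpha=1.0)
semantic_only = interpolate_scores(lexical_scores, semantic_scores, alpha=0.0)
```

With alpha set to 1.0 the ranking falls back to the lexical order, and with alpha set to 0.0 it follows the semantic scores alone. Intermediate values trade the two signals off, which is where the choice of semantic encoder affects the final ranking.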

Files

License info not available