The impact of the semantic matching within interpolation-based re-ranking

Bachelor Thesis (2024)

Author(s)

A. Nistor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A. Anand – Mentor (TU Delft - Web Information Systems)

L.J.L. Leonhardt – Mentor (TU Delft - Web Information Systems)

A Hanjalic – Graduation committee member (TU Delft - Intelligent Systems)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Efficiency, Information retrieval, Latency, Ranking, IR, Dual-encoders

To reference this document use:

https://resolver.tudelft.nl/uuid:bca3254c-88e1-4285-9222-ab81afa4daac

Publication Year

2024

Language

English

Graduation Date

28-06-2024

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Information retrieval (IR) plays a crucial role across a wide range of tasks, such as web search and fact-checking, and domains, including finance and healthcare. Effective and efficient IR systems are critical for finding relevant information in vast amounts of data. Traditional sparse retrieval methods such as BM25 are efficient but often fail to capture context, because they rely on exact term matches, while more recent dense retrieval models are highly inefficient in terms of computational resources and latency.
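To illustrate why a purely lexical scorer can miss context, here is a minimal sketch of Okapi BM25 on a made-up toy corpus. It assumes whitespace tokenisation, the common defaults k1 = 1.2 and b = 0.75, and the Lucene-style IDF variant; the function name `bm25_scores` is hypothetical and not taken from the thesis.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenised document against a tokenised query with Okapi BM25.

    Assumption: Lucene-style IDF, ln((N - n + 0.5) / (n + 0.5) + 1).
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # exact lexical match only: synonyms earn nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus (made up for illustration).
docs = [
    "the physician treated the patient".split(),
    "the doctor examined the patient".split(),
    "stock prices fell sharply".split(),
]
print(bm25_scores("doctor".split(), docs))
```

The document about a physician receives a score of zero for the query "doctor", even though it is relevant: this vocabulary mismatch is the kind of gap that semantic (dense) matching is meant to close.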
In our research, we evaluate multiple Transformer-based models to understand the impact of the semantic re-ranking phase within interpolation-based re-ranking. We use the FAST-FORWARD indexes framework, a retrieve-and-re-rank approach that combines the benefits of lexical and semantic matching. We focus on identifying the scenarios in which particular models excel in terms of ranking performance or latency, with the aim of providing model recommendations tailored to different settings. Our evaluations reveal that no single model outperforms the others across all datasets. We hypothesise that the main factors influencing encoder performance are the datasets used for fine-tuning and the method used to compute the contextualised vector embedding. However, ablation studies would help validate these observations.
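Interpolation-based re-ranking can be sketched as follows, assuming a linear combination of the first-stage lexical score and a semantic dot-product score, alpha * lexical + (1 - alpha) * semantic, with document vectors pre-computed and looked up at query time rather than encoded on the fly. Two pooling choices (first-token/[CLS] vs. mean over tokens) are included because the abstract names the embedding computation method as a likely factor. All function names, vectors, and scores here are hypothetical toy values; this is not the API of the FAST-FORWARD framework or any library.

```python
# Hypothetical helpers illustrating interpolation-based re-ranking;
# not the API of the FAST-FORWARD framework or any library.


def cls_pooling(token_vectors):
    """Take the first ([CLS]) token vector as the passage embedding."""
    return list(token_vectors[0])


def mean_pooling(token_vectors):
    """Average all token vectors into one passage embedding."""
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def interpolate_rerank(lexical_scores, query_vec, ff_index, alpha=0.5):
    """Re-rank first-stage candidates by alpha * lexical + (1 - alpha) * semantic.

    lexical_scores: {doc_id: sparse score, e.g. from BM25}
    ff_index: {doc_id: pre-computed document vector}; looking a vector up
    replaces running the document encoder at query time.
    """
    combined = {
        doc_id: alpha * score + (1 - alpha) * dot(query_vec, ff_index[doc_id])
        for doc_id, score in lexical_scores.items()
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Toy data (made up): token vectors stand in for a Transformer encoder's output.
ff_index = {
    "d1": mean_pooling([[0.0, 1.0], [0.0, 1.0]]),
    "d2": mean_pooling([[1.0, 0.0], [1.0, 0.0]]),
}
lexical = {"d1": 2.0, "d2": 1.5}
query_vec = [1.0, 0.0]

ranking = interpolate_rerank(lexical, query_vec, ff_index, alpha=0.5)
print(ranking)  # [('d2', 1.25), ('d1', 1.0)]
```

With alpha = 1.0 the ranking falls back to the lexical order (d1 first); lowering alpha lets the semantic score reorder the candidates. Swapping `mean_pooling` for `cls_pooling` changes the document vectors, and therefore possibly the ranking, which is one way the embedding computation method can affect results.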

Files

CSE3000_FinalPaper_AlexandruNi... (pdf)

(pdf | 1.81 Mb)

License info not available