Exploring methods to improve effectiveness of ad-hoc retrieval systems for long and complex queries

Bachelor Thesis (2024)
Author(s)

D. Erhan (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

L.J.L. Leonhardt – Mentor (TU Delft - Web Information Systems)

A. Anand – Mentor (TU Delft - Web Information Systems)

A. Hanjalic – Graduation committee member (TU Delft - Intelligent Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2024
Language
English
Graduation Date
28-06-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Ad-hoc retrieval involves ranking documents from a large collection by their relevance to a given input query. These retrieval systems often perform worse on longer and more complex queries. This paper explores methods for improving retrieval effectiveness on such queries across different information retrieval (IR) tasks, within the context of Fast-Forward indexes. An analysis is conducted to determine the actual impact of query length and complexity. Interestingly, the hypothesis that longer queries are more challenging does not hold in all cases, and on some datasets the opposite is true. To improve performance on long and complex queries, two approaches are explored: utilising multiple dense models during the re-ranking stage instead of the traditional single model, and reducing the queries via large language models (LLMs). The use of multiple dense models for re-ranking proves to be effective, with two models providing the best balance between performance and ranking quality. Utilising LLMs for query reduction achieves performance similar to the original queries but fails to improve their ranking scores.
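
As a rough illustration of the multi-model re-ranking idea, the sketch below interpolates a first-stage lexical score with scores from several dense models. It is not taken from the thesis: the function name interpolate_scores, the equal-weight average over dense models, the default alpha, and the zero score for missing documents are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def interpolate_scores(
    sparse: Dict[str, float],
    dense_runs: List[Dict[str, float]],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank first-stage candidates by mixing sparse and dense scores.

    sparse     -- document id -> first-stage (lexical) score
    dense_runs -- one dict per dense model, document id -> dense score
    alpha      -- weight of the sparse score (hypothetical default)
    """
    if not dense_runs:
        raise ValueError("at least one dense run is required")

    ranked = []
    for doc_id, sparse_score in sparse.items():
        # Assumption: dense models are combined by an equal-weight average;
        # a document missing from a dense run contributes 0.0 for that model.
        dense_avg = sum(run.get(doc_id, 0.0) for run in dense_runs) / len(dense_runs)
        ranked.append((doc_id, alpha * sparse_score + (1 - alpha) * dense_avg))

    # Highest interpolated score first.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


if __name__ == "__main__":
    # Toy scores, invented for illustration only.
    sparse = {"d1": 10.0, "d2": 8.0}
    dense_a = {"d1": 0.2, "d2": 0.9}
    dense_b = {"d1": 0.4, "d2": 0.7}
    for doc_id, score in interpolate_scores(sparse, [dense_a, dense_b], alpha=0.1):
        print(doc_id, round(score, 3))
```

Passing a single dictionary in dense_runs gives the single-model case, so the same function can be used to compare one dense model against several on the same candidate list.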

Files

Research_paper_final.pdf
(pdf | 0.343 Mb)
License info not available