Exploring methods to improve effectiveness of ad-hoc retrieval systems for long and complex queries

Bachelor Thesis (2024)

Author(s)

D. Erhan (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

L.J.L. Leonhardt – Mentor (TU Delft - Web Information Systems)

A. Anand – Mentor (TU Delft - Web Information Systems)

A. Hanjalic – Graduation committee member (TU Delft - Intelligent Systems)

Faculty

Electrical Engineering, Mathematics and Computer Science

Information Retrieval, Ad-hoc Retrieval, Fast-Forward Index, Long queries

To reference this document use:

https://resolver.tudelft.nl/uuid:1c94eeda-9c0e-4122-9652-418ac3127a9c

More Info

Publication Year

2024

Language

English

Graduation Date

28-06-2024

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Ad-hoc retrieval involves ranking documents from a large collection by their relevance to a given input query. Retrieval systems often perform worse on longer and more complex queries. This paper explores methods for improving retrieval effectiveness on such queries across different information retrieval (IR) tasks, within the context of Fast-Forward indexes. An analysis is conducted to determine the actual impact of query length and complexity. Interestingly, the hypothesis that longer queries are more challenging does not hold in all cases, and on some datasets the opposite is true. To improve effectiveness on long and complex queries, two approaches are explored: using multiple dense models during the re-ranking stage instead of the traditional single model, and reducing the queries with large language models (LLMs). Using multiple dense models for re-ranking proves effective, with two models providing the best balance between performance and ranking quality. Using LLMs for query reduction achieves performance similar to the original queries but does not improve their ranking scores.
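
As a rough illustration of re-ranking with more than one dense model, the sketch below combines a first-stage sparse score with the scores of several dense models through a weighted sum. This is an assumption made for illustration only: the function name `interpolate_scores`, the weights, the toy scores, and the choice of a plain weighted sum are hypothetical and are not taken from the thesis, which may combine the models differently.

```python
from typing import Dict, List, Sequence, Tuple


def interpolate_scores(
    sparse_scores: Dict[str, float],
    dense_scores: Sequence[Dict[str, float]],
    weights: Sequence[float],
) -> List[Tuple[str, float]]:
    """Re-rank first-stage candidates by a weighted sum of scores.

    weights[0] applies to the sparse score; weights[1:] apply to the
    dense models, in the same order as dense_scores.
    """
    if len(weights) != len(dense_scores) + 1:
        raise ValueError("need one weight for the sparse score plus one per dense model")

    reranked = []
    for doc_id, sparse in sparse_scores.items():
        score = weights[0] * sparse
        for weight, model_scores in zip(weights[1:], dense_scores):
            # Assumption: a document missing from a dense model's lookup scores 0.0.
            score += weight * model_scores.get(doc_id, 0.0)
        reranked.append((doc_id, score))

    # Highest combined score first.
    reranked.sort(key=lambda pair: pair[1], reverse=True)
    return reranked


# Toy scores (hypothetical): three candidates, one sparse and two dense models.
sparse = {"d1": 10.0, "d2": 8.0, "d3": 6.0}
dense_a = {"d1": 0.2, "d2": 0.9, "d3": 0.5}
dense_b = {"d1": 0.1, "d2": 0.8, "d3": 0.9}

ranking = interpolate_scores(sparse, [dense_a, dense_b], [0.1, 0.45, 0.45])
ranked_ids = [doc_id for doc_id, _ in ranking]
```

In this toy example the sparse ranking alone would place `d1` first, while the combined scores move `d2` and `d3` ahead of it. In practice, scores from different models would likely need normalising before they are summed.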

Files

Research_paper_final.pdf

(pdf | 0.343 Mb)

License info not available