Performance Comparison of Different Query Expansion and Pseudo-Relevance Feedback Methods

A comparison of Bo1, KL, RM3, and Axiomatic Query Expansion against BM25

Bachelor Thesis (2024)
Author(s)

L.J.P. de Swart (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

L.J.L. Leonhardt – Mentor (TU Delft - Web Information Systems)

A. Anand – Mentor (TU Delft - Web Information Systems)

A. Hanjalic – Graduation committee member (TU Delft - Intelligent Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2024
Language
English
Graduation Date
28-06-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper analyses the performance of, and the logic behind, different query expansion models. Query expansion and pseudo-relevance feedback are techniques that add terms to a query based on the results of an initial query and the contents of the document collection. Four query expansion models provided by the PyTerrier Python library and its extensions were analysed: Bo1, KL, RM3, and Axiomatic query expansion. Axiomatic query expansion often performs no expansion at all, and when it does, it yields no increase in performance. Bo1 and KL, although they differ in their exact logic, produce similar results most of the time. The most significant difference between them is execution time: Bo1 is faster on larger datasets, while KL is faster with many documents on smaller datasets. Lastly, RM3 does not dominate in performance, but it has considerable potential for good results with the right combination of parameters.
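
The general idea of pseudo-relevance feedback described above can be illustrated with a minimal sketch. It is a toy example and is not Bo1, KL, RM3, or Axiomatic query expansion, nor the PyTerrier API: the first-pass score is a plain term count, expansion terms are picked by raw frequency, and the function names and the parameters `fb_docs` and `fb_terms` are hypothetical names chosen for illustration.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def score(query_terms, doc_terms):
    """Toy first-pass score: total occurrences of the query terms in the document."""
    counts = Counter(doc_terms)
    return sum(counts[t] for t in query_terms)


def expand_query(query, docs, fb_docs=2, fb_terms=2):
    """Return the query terms plus the most frequent new terms from the top documents.

    fb_docs and fb_terms are hypothetical parameter names: the number of
    top-ranked feedback documents and the number of terms to add.
    """
    q_terms = tokenize(query)
    tokenized = [tokenize(d) for d in docs]

    # First pass: rank all documents against the original query.
    ranked = sorted(
        range(len(docs)),
        key=lambda i: score(q_terms, tokenized[i]),
        reverse=True,
    )

    # Assume the top fb_docs documents are relevant and pool their new terms.
    pool = Counter()
    for i in ranked[:fb_docs]:
        pool.update(t for t in tokenized[i] if t not in q_terms)

    # Pick the most frequent terms; ties are broken alphabetically.
    best = sorted(pool.items(), key=lambda kv: (-kv[1], kv[0]))[:fb_terms]
    return q_terms + [term for term, _ in best]


docs = [
    "query expansion adds feedback terms",
    "feedback terms improve query recall",
    "cats sleep all day",
]
print(expand_query("query expansion", docs))
# ['query', 'expansion', 'feedback', 'terms']
```

The models compared in the thesis differ mainly in how they weight candidate terms from the feedback documents; the raw frequency count used here is only a stand-in for that weighting step.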

Files

Research_paper-3.pdf
(pdf | 0.191 Mb)
License info not available