Quantization for compact neural passage re-ranking

Abstract

Passage re-ranking is a fundamental problem in information retrieval: the task of reordering a small set of candidate passages by their relevance to a query. It is a crucial component of many web information systems, such as search engines and question-answering systems. Modern re-ranking systems rely on neural language models such as BERT and its derivatives to create dense indexes for the target document corpus. While such approaches bring significant effectiveness gains over classical lexical re-rankers, they come with the disadvantage of a much larger memory footprint.
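As a concrete, purely illustrative example of this setup, the sketch below re-ranks candidate passages with an off-the-shelf single-vector dual encoder via the sentence-transformers library; the model name is an assumption chosen for illustration, not one of the encoders studied in this thesis.

```python
# Minimal sketch of dense re-ranking with a single-vector dual encoder.
# The model is illustrative; any BERT-derived dual encoder works the same way.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/msmarco-distilbert-base-v4")

# Candidate passages, e.g. the top results from a first-stage retriever.
passages = [
    "Product quantization compresses vectors into short codes.",
    "BM25 is a classical lexical ranking function.",
]
index = model.encode(passages)          # dense index: one float32 vector per passage

query_vec = model.encode(["how does vector compression work"])[0]
scores = index @ query_vec              # dot-product relevance scores
reranked = np.argsort(-scores)          # passage order, best first
```

Storing one float32 vector per passage is exactly what makes such indexes memory-hungry, which motivates the quantization methods discussed next.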

Vector quantization is a family of methods for reducing the memory footprint of a dense index. Vector quantization algorithms usually combine clustering with space-manipulation operations to compress the dense index lossily, at the expense of retrieval effectiveness. While vector quantization is widely used for first-stage retrieval, its use in the context of re-ranking remains underexplored. To address this gap, this thesis evaluates the effectiveness of product quantization, a well-known vector quantization method, on single-vector dual-encoders, specifically TCT-ColBERT and Aggretriever. In addition, we show how linear interpolation with sparse scores can be leveraged to improve the effectiveness of quantized dense indexes at negligible cost in memory footprint or speed. Finally, we propose WolfPQ, a learnable quantization method aimed at further improving quantization for re-ranking by bridging the gap between the objective functions used to train the product quantizer and the re-ranking system.
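For intuition, here is a minimal sketch of product quantization (the subspace count and codebook size are hypothetical choices, not the configuration evaluated in the thesis): each vector is split into subvectors, a k-means codebook is learned per subspace, and each subvector is then stored as a one-byte centroid index.

```python
# Minimal product-quantization sketch; parameters are illustrative.
import numpy as np
from sklearn.cluster import KMeans

def train_pq(vectors, num_subspaces=8, codebook_size=256):
    """Learn one k-means codebook per subspace of the vector."""
    d = vectors.shape[1]
    assert d % num_subspaces == 0
    sub_dim = d // num_subspaces
    codebooks = []
    for m in range(num_subspaces):
        sub = vectors[:, m * sub_dim:(m + 1) * sub_dim]
        km = KMeans(n_clusters=codebook_size, n_init=4).fit(sub)
        codebooks.append(km.cluster_centers_)
    return codebooks

def encode_pq(vectors, codebooks):
    """Replace each subvector with the index of its nearest centroid,
    compressing d float32 values down to num_subspaces uint8 codes."""
    sub_dim = codebooks[0].shape[1]
    codes = []
    for m, cb in enumerate(codebooks):
        sub = vectors[:, m * sub_dim:(m + 1) * sub_dim]
        dists = np.linalg.norm(sub[:, None, :] - cb[None, :, :], axis=-1)
        codes.append(dists.argmin(axis=1).astype(np.uint8))
    return np.stack(codes, axis=1)

def decode_pq(codes, codebooks):
    """Approximate reconstruction: concatenate the selected centroids."""
    return np.hstack([cb[codes[:, m]] for m, cb in enumerate(codebooks)])
```

The linear interpolation mentioned above can likewise be sketched as a convex combination of a sparse score (e.g., from BM25) and the score from the quantized dense index; the weight alpha and its placement are assumptions here and would be tuned on held-out queries.

```python
def interpolated_score(dense_score, sparse_score, alpha=0.1):
    # alpha is a hypothetical value; tune it on a development set.
    return alpha * sparse_score + (1.0 - alpha) * dense_score
```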