Causal Probing for Dual Encoders

Conference Paper (2024)

Author(s)

Jonas Wallat (L3S)

Hauke Hinrichs (L3S)

A. Anand (TU Delft - Web Information Systems)

Research Group

Web Information Systems

DOI related publication

https://doi.org/10.1145/3627673.3679556

Interpretability, Information retrieval, Probing, Language models

To reference this document use:

https://resolver.tudelft.nl/uuid:98d7840e-a477-4e8f-90ae-9ccb74246f3b

Publication Year

2024

Language

English

Pages (from-to)

2292-2303

ISBN (electronic)

9798400704369

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Dual encoders are highly effective and widely deployed in the retrieval phase of passage and document ranking, question answering, and retrieval-augmented generation (RAG) setups. Most dual-encoder models use transformer models like BERT to map input queries and output targets to a common vector space in which proximity encodes semantic similarity. Despite their prevalence and impressive performance, little is known about the inner workings of dense encoders for retrieval. We investigate neural retrievers using the probing paradigm to identify well-understood IR properties that causally contribute to ranking performance. Unlike existing works that have probed cross-encoders to show query-document interactions, we provide a principled approach to probing dual encoders. Importantly, we employ causal probing to avoid correlation effects that might be artefacts of vanilla probing. We conduct extensive experiments on one such dual encoder (TCT-ColBERT) to check for the existence and relevance of six properties: term importance, lexical matching (BM25), semantic matching, question classification, and the two linguistic properties of named entity recognition and coreference resolution. Our layer-wise analysis shows important differences between re-rankers and dual encoders, establishing which tasks are not only understood by the model but also used for inference.
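As a rough illustration of the dual-encoder setup the abstract describes, the sketch below encodes queries and passages independently into one vector space and ranks passages by dot product. It is not the paper's model: `encode` is a hypothetical hashed bag-of-words stand-in for a transformer encoder such as BERT, and `DIM`, `score`, and `rank` are names assumed for this example only.

```python
import math
import zlib
from typing import List, Tuple

DIM = 64  # hypothetical embedding size, chosen only for this toy example


def encode(text: str) -> List[float]:
    """Toy stand-in for a transformer encoder (assumption, not the paper's model).

    Hashes each whitespace token into one of DIM buckets and L2-normalizes,
    so queries and passages land in the same vector space.
    """
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def score(query_vec: List[float], passage_vec: List[float]) -> float:
    """Similarity is a plain dot product between independently encoded vectors."""
    return sum(q * p for q, p in zip(query_vec, passage_vec))


def rank(query: str, passages: List[str]) -> List[Tuple[float, str]]:
    """Encode the query once, score every passage, and sort by descending score."""
    q = encode(query)
    scored = [(score(q, encode(p)), p) for p in passages]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    for s, passage in rank("causal probing", ["weather in delft today", "causal probing"]):
        print(f"{s:.3f}  {passage}")
```

Because the query and each passage are encoded separately, passage vectors can be computed ahead of time and only the query needs encoding at search time. This separation is what distinguishes dual encoders from cross-encoders, which process the query and document together.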

Files

3627673.3679556.pdf (PDF, 2.82 MB)

Embargo expired on 15-05-2025

License info not available