Instance Attribution in Information Retrieval

Identifying and Selecting Influential Instances with Instance Attribution for Passage Re-Ranking

Abstract

The complexity of deep neural rankers and the scale of their training datasets make it increasingly challenging to understand why a document is predicted to be relevant to a given query. A growing body of work focuses on interpreting ranking models with explainable AI methods. Instance attribution methods explain individual predictions of machine learning models by identifying influential training data. Despite their popularity, however, instance attribution methods remain largely unexplored in the information retrieval context, particularly in text ranking. This thesis applies TracInCP, an instance attribution method, to infer the influence of query-passage training examples on ranking model predictions. We propose and evaluate influence-based approaches for selecting subsets of the training data. By analyzing patterns in influential examples, we find common query and passage characteristics in the training data that affect the model’s ranking decisions. Finally, we demonstrate challenges that arise when using instance attribution to create smaller datasets for text ranking tasks.
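To make the attribution idea concrete: TracInCP scores the influence of a training example on a test prediction as the learning-rate-weighted sum, over saved model checkpoints, of the dot product between the loss gradients of the two examples. The sketch below illustrates only this scoring formula with small hypothetical gradient vectors; the function name and toy values are illustrative, not taken from the thesis.

```python
import numpy as np

def tracin_cp_influence(train_grads, test_grads, lrs):
    """TracInCP influence score: sum over checkpoints of
    lr_c * <grad_train_c, grad_test_c>, where grad_*_c is the
    loss gradient of that example at checkpoint c."""
    return sum(
        lr * float(np.dot(g_tr, g_te))
        for lr, g_tr, g_te in zip(lrs, train_grads, test_grads)
    )

# Hypothetical per-example loss gradients at three checkpoints.
train_grads = [np.array([1.0, 0.0]), np.array([0.5, 0.5]), np.array([0.2, 0.1])]
test_grads  = [np.array([2.0, 1.0]), np.array([1.0, 1.0]), np.array([0.4, 0.2])]
lrs = [0.1, 0.1, 0.1]  # learning rate in effect at each checkpoint

score = tracin_cp_influence(train_grads, test_grads, lrs)
# A positive score marks the training example as a "proponent" of the
# test prediction; a negative score marks it as an "opponent".
```

In a ranking setting, the gradients would come from the query-passage loss of the re-ranker at each saved checkpoint, and training examples could then be ranked by this score to select influential subsets.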