Instance Attribution in Information Retrieval

Identifying and Selecting Influential Instances with Instance Attribution for Passage Re-Ranking

Master Thesis (2023)
Author(s)

I.S. Hacipoglu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Avishek Anand – Mentor (TU Delft - Web Information Systems)

Max Idahl – Coach (Leibniz University of Hannover)

SE Verwer – Graduation committee member (TU Delft - Cyber Security)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Sara Hacipoğlu
Publication Year
2023
Language
English
Graduation Date
24-10-2023
Awarding Institution
Delft University of Technology
Programme
Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The complexity of deep neural rankers, combined with large datasets, makes it increasingly challenging to understand why a document is predicted as relevant to a given query. A growing body of work focuses on interpreting ranking models with different explainable AI methods. Instance attribution methods aim to explain individual predictions of machine learning models by identifying influential training data. However, despite their popularity, instance attribution methods remain largely unexplored in information retrieval, particularly in text ranking. This thesis introduces an application of TracInCP, an instance attribution method, to infer the influence of query-passage training data on ranking model predictions. We propose and evaluate approaches for selecting training data subsets based on influence. By analyzing patterns in influential examples, we find common query and passage characteristics in the training data that affect the model’s ranking decisions. Finally, we demonstrate possible challenges in using instance attribution to create smaller datasets for text ranking tasks.
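For readers unfamiliar with TracInCP, the sketch below illustrates the general idea: the influence of a training example on a test prediction is approximated by summing, over saved checkpoints, the learning rate times the dot product of the two loss gradients. This is a minimal illustration and not the thesis's implementation. It assumes a toy linear relevance scorer with a pointwise binary cross-entropy loss in place of a deep neural ranker, and all names (`loss_gradient`, `tracin_cp`, `select_top_k`) are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_gradient(weights, features, label):
    """Gradient of binary cross-entropy w.r.t. the weights of a linear scorer.

    Assumption: `features` is a hypothetical vector for a query-passage pair
    and `label` is 1 (relevant) or 0 (not relevant).
    """
    score = sum(w * x for w, x in zip(weights, features))
    error = sigmoid(score) - label
    return [error * x for x in features]


def tracin_cp(checkpoints, learning_rates, train_example, test_example):
    """TracInCP-style score: sum over checkpoints of lr * <grad_train, grad_test>."""
    influence = 0.0
    for weights, lr in zip(checkpoints, learning_rates):
        g_train = loss_gradient(weights, *train_example)
        g_test = loss_gradient(weights, *test_example)
        influence += lr * sum(a * b for a, b in zip(g_train, g_test))
    return influence


def select_top_k(checkpoints, learning_rates, train_set, test_example, k):
    """Indices of the k training examples with the highest influence score."""
    scores = [
        tracin_cp(checkpoints, learning_rates, example, test_example)
        for example in train_set
    ]
    order = sorted(range(len(train_set)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy data: one checkpoint with zero weights and a learning rate of 1.0.
checkpoints = [[0.0, 0.0]]
learning_rates = [1.0]
train_set = [
    ([1.0, 0.0], 0),  # same features as the test pair, opposite label
    ([1.0, 0.0], 1),  # same features and same label as the test pair
    ([0.0, 1.0], 1),  # orthogonal features
]
test_example = ([1.0, 0.0], 1)
top_two = select_top_k(checkpoints, learning_rates, train_set, test_example, k=2)
```

A positive score marks a training example whose gradient step would also reduce the test loss, and a negative score marks one that would increase it. Ranking by this score is one simple way to pick an influence-based subset; the selection approaches evaluated in the thesis may differ.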
