Instance Attribution in Information Retrieval
Identifying and Selecting Influential Instances with Instance Attribution for Passage Re-Ranking
I.S. Hacipoglu (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Avishek Anand – Mentor (TU Delft - Web Information Systems)
Max Idahl – Coach (Leibniz University of Hannover)
SE Verwer – Graduation committee member (TU Delft - Cyber Security)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
The complexity of deep neural rankers and the size of modern datasets make it increasingly challenging to understand why a document is predicted as relevant to a given query. A growing body of work focuses on interpreting ranking models with explainable AI methods. Instance attribution methods aim to explain individual predictions of machine learning models by identifying influential training data. Despite their popularity, however, these methods remain largely unexplored in information retrieval, particularly in text ranking. This thesis applies TracInCP, an instance attribution method, to infer the influence of query-passage training data on ranking model predictions. We propose and evaluate influence-based approaches for selecting subsets of the training data. By analyzing patterns in influential examples, we identify common query and passage characteristics in the training data that affect the model's ranking decisions. Finally, we demonstrate challenges that can arise when instance attribution is used to create smaller datasets for text ranking tasks.
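To make the idea behind TracInCP concrete, here is a minimal sketch, not the thesis implementation. It scores a training example by summing, over saved checkpoints, the learning rate times the dot product of the training-loss gradient and the test-loss gradient. The linear logistic scorer, the feature vectors standing in for query-passage pairs, and all function names are assumptions made for illustration; the thesis applies the method to a neural ranker.

```python
import math


def grad_logistic_loss(w, x, y):
    """Gradient w.r.t. w of the logistic loss for a linear scorer w.x, label y in {0, 1}."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-score))
    return [(p - y) * xi for xi in x]


def tracin_cp(checkpoints, learning_rates, train_example, test_example):
    """Sum over checkpoints of lr * <grad loss(train), grad loss(test)>."""
    x_tr, y_tr = train_example
    x_te, y_te = test_example
    influence = 0.0
    for w, lr in zip(checkpoints, learning_rates):
        g_tr = grad_logistic_loss(w, x_tr, y_tr)
        g_te = grad_logistic_loss(w, x_te, y_te)
        influence += lr * sum(a * b for a, b in zip(g_tr, g_te))
    return influence


def rank_training_examples(checkpoints, learning_rates, train_set, test_example):
    """Return training indices sorted from most positive to most negative influence."""
    scores = [
        tracin_cp(checkpoints, learning_rates, z, test_example) for z in train_set
    ]
    return sorted(range(len(train_set)), key=lambda i: scores[i], reverse=True)


# Hypothetical toy data: (feature vector, relevance label) pairs.
checkpoints = [[0.0, 0.0]]      # one saved weight vector
learning_rates = [1.0]          # learning rate in effect at that checkpoint
train_set = [
    ([1.0, 0.0], 1),            # same features and label as the test example
    ([1.0, 0.0], 0),            # same features, opposite label
    ([0.0, 1.0], 1),            # orthogonal features
]
test_example = ([1.0, 0.0], 1)

ranking = rank_training_examples(checkpoints, learning_rates, train_set, test_example)
```

In this toy setup, the training example that matches the test example receives a positive score, the one with the opposite label receives a negative score, and the one with orthogonal features receives zero. Selecting a training subset by influence, as the abstract describes, would then amount to keeping examples according to such scores, under whatever selection rule is chosen.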