Instance Attribution in Information Retrieval

Identifying and Selecting Influential Instances with Instance Attribution for Passage Re-Ranking

Abstract

The complexity of deep neural rankers and the scale of their training datasets make it increasingly challenging to understand why a document is predicted to be relevant to a given query. A growing body of work focuses on interpreting ranking models with explainable AI methods. Instance attribution methods explain individual predictions of machine learning models by identifying influential training data. Despite their popularity, however, instance attribution methods remain largely unexplored in the information retrieval context, particularly in text ranking. This thesis applies TracInCP, an instance attribution method, to infer the influence of query-passage training examples on ranking model predictions. We propose and evaluate influence-based approaches for selecting subsets of the training data. By analyzing patterns in influential examples, we find common query and passage characteristics in the training data that affect the model’s ranking decisions. Finally, we demonstrate challenges that arise when using instance attribution to create smaller datasets for text ranking tasks.
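To make the attribution idea concrete: TracInCP scores the influence of a training example on a test prediction as the learning-rate-weighted sum, over saved model checkpoints, of the dot product between the loss gradients of the two examples. The sketch below illustrates only this scoring formula with small hypothetical gradient vectors; the function name and toy values are illustrative, not taken from the thesis.

```python
import numpy as np

def tracin_cp_influence(train_grads, test_grads, lrs):
    """TracInCP influence score: sum over checkpoints of
    lr_c * <grad_train_c, grad_test_c>, where grad_*_c is the
    loss gradient of that example at checkpoint c."""
    return sum(
        lr * float(np.dot(g_tr, g_te))
        for lr, g_tr, g_te in zip(lrs, train_grads, test_grads)
    )

# Hypothetical per-example loss gradients at three checkpoints.
train_grads = [np.array([1.0, 0.0]), np.array([0.5, 0.5]), np.array([0.2, 0.1])]
test_grads  = [np.array([2.0, 1.0]), np.array([1.0, 1.0]), np.array([0.4, 0.2])]
lrs = [0.1, 0.1, 0.1]  # learning rate in effect at each checkpoint

score = tracin_cp_influence(train_grads, test_grads, lrs)
# A positive score marks the training example as a "proponent" of the
# test prediction; a negative score marks it as an "opponent".
```

In a ranking setting, the gradients would come from the query-passage loss of the re-ranker at each saved checkpoint, and training examples could then be ranked by this score to select influential subsets.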