Instance Attribution in Information Retrieval

Identifying and Selecting Influential Instances with Instance Attribution for Passage Re-Ranking

Master Thesis (2023)

Author(s)

I.S. Hacipoglu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Avishek Anand – Mentor (TU Delft - Web Information Systems)

Max Idahl – Coach (Leibniz University of Hannover)

SE Verwer – Graduation committee member (TU Delft - Cyber Security)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Explainable AI, Information Retrieval, Instance Attribution, Passage ranking, Document ranking

To reference this document use:

https://resolver.tudelft.nl/uuid:9ce89b5c-4b97-4ae0-bc57-cb9ef29b88bb

Publication Year

2023

Language

English

Graduation Date

24-10-2023

Awarding Institution

Delft University of Technology

Programme

Computer Science

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The complexity of deep neural rankers and large datasets makes it increasingly challenging to understand why a document is predicted as relevant to a given query. A growing body of work focuses on interpreting ranking models with explainable AI methods. Instance attribution methods aim to explain individual predictions of machine learning models by identifying influential training data. Despite their popularity, however, these methods remain largely unexplored in information retrieval, particularly in text ranking. This thesis applies TracInCP, an instance attribution method, to infer the influence of query-passage training data on ranking model predictions. We propose and evaluate influence-based approaches for selecting training data subsets. By analyzing patterns in influential examples, we find common query and passage characteristics in the training data that affect the model's ranking decisions. Finally, we demonstrate possible challenges in using instance attribution to create smaller datasets for text ranking tasks.
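To make the idea of checkpoint-based influence concrete, the sketch below illustrates how a TracInCP-style score is generally computed: for each saved checkpoint, the gradient of the loss on a training example is dotted with the gradient of the loss on a test example, scaled by the learning rate, and the results are summed. This is a toy illustration and not the thesis implementation. The linear sigmoid scorer, the hand-made feature vectors standing in for query-passage pairs, and the function names `grad_logloss`, `tracin_cp`, and `top_k_influential` are all assumptions made for the example.

```python
import math

def grad_logloss(w, x, y):
    """Gradient of binary cross-entropy w.r.t. w for the toy scorer sigmoid(w . x).

    x is a feature vector standing in for a query-passage pair (assumption),
    y is its relevance label (1 = relevant, 0 = not relevant).
    """
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * xi for xi in x]

def tracin_cp(checkpoints, lrs, train_ex, test_ex):
    """TracInCP-style influence: sum over checkpoints of lr * <grad(train), grad(test)>."""
    score = 0.0
    for w, lr in zip(checkpoints, lrs):
        g_train = grad_logloss(w, *train_ex)
        g_test = grad_logloss(w, *test_ex)
        score += lr * sum(a * b for a, b in zip(g_train, g_test))
    return score

def top_k_influential(checkpoints, lrs, train_set, test_ex, k):
    """Hypothetical subset selection: indices of the k highest-scoring training examples."""
    scores = [tracin_cp(checkpoints, lrs, ex, test_ex) for ex in train_set]
    order = sorted(range(len(train_set)), key=lambda i: scores[i], reverse=True)
    return order[:k], scores

# Toy data: one checkpoint with zero weights, learning rate 0.1.
checkpoints = [[0.0, 0.0]]
lrs = [0.1]
train_set = [
    ([1.0, 0.0], 0),  # same features as the test pair, opposite label
    ([0.0, 1.0], 1),  # orthogonal features
    ([1.0, 0.0], 1),  # same features and same label as the test pair
]
test_ex = ([1.0, 0.0], 1)

top, scores = top_k_influential(checkpoints, lrs, train_set, test_ex, k=1)
print(top, scores)
```

In this toy setting, a training example whose gradient points in the same direction as the test example's gradient receives a positive score (a proponent), one pointing in the opposite direction receives a negative score (an opponent), and an orthogonal one scores zero. Ranking training examples by such scores is one simple way to pick an influence-based subset. The selection strategies actually evaluated in the thesis may differ.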

Files

Shacipoglu_masters_thesis.pdf

(pdf | 3.26 Mb)

License info not available