A Comparison of Instance Attribution Methods

Comparing Instance Attribution Methods to Baseline k-Nearest Neighbors Method

Bachelor Thesis (2023)
Author(s)

E.J. de Kruif (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Avishek Anand – Mentor (Leibniz Universität)

L. Corti – Graduation committee member (TU Delft - Web Information Systems)

Lijun Lyu – Graduation committee member (L3S)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Evan de Kruif
Publication Year
2023
Language
English
Graduation Date
03-02-2023
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In this research, several Instance Attribution (IA) methods are compared with a k-Nearest Neighbors (kNN) baseline based on cosine similarity, applied to a Natural Language Processing (NLP) machine learning model. The comparison is made in two ways: through a human survey, and through automated similarity comparisons of representative vectors. The goal is to judge how effective each method's results are from the perspective of human language understanding, specifically a person's ability to determine whether a fact is true. The research found that, for results obtained on the same input, IA methods were preferred 32.5% more often than kNN. It also shows that this preference is not linked to the similarity between the IA results and the kNN results. These findings indicate that, when judged by human comprehension, IA methods are much more effective than kNN at identifying a set of influential training points from the model's training dataset.
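The kNN baseline mentioned above retrieves the training points whose vector representations are most similar to the representation of a test input, using cosine similarity. The sketch below illustrates only that retrieval step. It is a minimal sketch under stated assumptions: the record does not say how the representation vectors are obtained (for example, from which model layer), so the vectors here are plain lists of floats standing in for them. The function names `cosine_similarity` and `knn_cosine` are hypothetical and not taken from the thesis.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_cosine(query: Sequence[float],
               train_vectors: Sequence[Sequence[float]],
               k: int) -> List[int]:
    """Return the indices of the k training vectors most similar to the query.

    Hypothetical helper: the vectors are assumed to be precomputed
    representations of the test input and of each training example.
    Results are ordered from most to least similar; ties go to the lower index.
    """
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(train_vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


if __name__ == "__main__":
    # Toy 2-D "representations" standing in for real model embeddings.
    train = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
    print(knn_cosine([1.0, 0.1], train, k=2))  # → [0, 2]
```

Cosine similarity ignores vector magnitude, so two training examples are treated as equally relevant whenever their representations point in the same direction, regardless of their norms. IA methods, by contrast, estimate how much each training point influenced the model's prediction, which is the difference the survey set out to evaluate.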
