Use of LLMs to Improve Affiliation Disambiguation in Alexandria3k

Bachelor Thesis (2024)
Author(s)

D.T. Gupta (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

D. Spinellis – Mentor (TU Delft - Software Engineering)

G. Gousios – Mentor (TU Delft - Software Technology)

K.G. Langendoen – Graduation committee member (TU Delft - Embedded Systems)

More Info
Publication Year
2024
Language
English
Graduation Date
01-02-2024
Awarding Institution
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Collections
thesis
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The growth of academic publications, the heterogeneity of datasets, and the absence of a globally accepted organization identifier make affiliation disambiguation a challenge in bibliographic databases. In this paper, we establish a baseline by comparing the output of the algorithm currently implemented in Alexandria3k for author affiliation linkage against the ground truth. We then explore the use of LLMs (GPT-4) in the Alexandria3k environment to disambiguate author affiliations. The proposed approach extracts the research organization from the textual affiliations that researchers provide in their published works and cross-references it against the Research Organization Registry. Our process shows promising results and a significant improvement over the existing algorithm in terms of matching rate and identification of multiple affiliations. We discuss the margin of error in LLM results and the limitations of the ground truth, and we suggest future research directions.
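
The extract-then-cross-reference idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the toy extractor standing in for the GPT-4 call, the tiny in-memory registry, and the identifiers `org-A` and `org-B` are all assumptions made for illustration, and the lookup is a simple exact match on normalized names rather than whatever matching the thesis uses.

```python
import re
import unicodedata
from typing import Callable, Dict, List


def normalize(name: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def build_index(registry: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every normalized name or alias to its registry identifier."""
    index: Dict[str, str] = {}
    for org_id, names in registry.items():
        for name in names:
            index[normalize(name)] = org_id
    return index


def link_affiliation(
    text: str,
    extract: Callable[[str], List[str]],
    index: Dict[str, str],
) -> List[str]:
    """Return the identifiers of all organizations the extractor finds in text."""
    matched: List[str] = []
    for org in extract(text):
        org_id = index.get(normalize(org))
        if org_id is not None and org_id not in matched:
            matched.append(org_id)
    return matched


def toy_extract(text: str) -> List[str]:
    """Hypothetical stand-in for the LLM step: one organization per
    ';'-separated affiliation, taken as the last comma-separated field."""
    return [part.split(",")[-1].strip() for part in text.split(";") if part.strip()]


# Hypothetical registry with placeholder identifiers (not real ROR IDs).
REGISTRY = {
    "org-A": ["Delft University of Technology", "TU Delft"],
    "org-B": ["Athens University of Economics and Business"],
}
INDEX = build_index(REGISTRY)

affiliation = (
    "Dept. of Software Technology, TU Delft; "
    "Athens University of Economics and Business"
)
print(link_affiliation(affiliation, toy_extract, INDEX))
```

Passing the extractor in as a parameter keeps the registry lookup independent of how organization names are obtained, so a model-backed extractor could replace `toy_extract` without changing the matching code. Returning a list allows an affiliation string that names several organizations to yield several identifiers.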
