Author Name Disambiguation using Large Language Models

Contributions to a system for open reproducible publication research

Bachelor thesis (2024)

Authors

J. van Lieshout Electrical Engineering, Mathematics and Computer Science

Contributors

D. Spinellis Software Engineering (supervisor 1)

G. Gousios Software Technology (supervisor 1)

K.G. Langendoen Embedded Systems (supervisor 2)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use:

http://resolver.tudelft.nl/uuid:c7e98b04-b127-4c02-a6c1-e250ae5b0566

Published Date

29-01-2024

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Author name disambiguation, also described as (publication) record linking, is a problem that has received considerable research attention. Author attribution, the calculation of research metrics and literature reviews are among the processes made more difficult by ambiguous author names. In this study, a novel approach is presented to disambiguate authors of scientific publications, using Large Language Models (LLMs) in combination with the Alexandria3k software package. LLMs have shown great potential in processing, analysing and drawing conclusions from human-readable data. The approach presented in this study supplies an LLM with known attributes of publication records and authors, such as names, affiliations and co-authors, to determine whether records written by authors with ambiguous names can be linked to the same real-world person. Using Alexandria3k, a dataset of authors and publications with confirmed identities is created to test and validate the approach. Finally, the approach is measured against state-of-the-art author name disambiguation methods, and different configurations are presented and discussed.
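
The pairwise idea described in the abstract could look roughly like the sketch below. It is an illustration only, not the thesis implementation: the `AuthorRecord` schema, the prompt wording, and the `ask_llm` callable (a stand-in for whatever model client is used) are all assumptions, and no Alexandria3k calls are shown.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AuthorRecord:
    """One author mention on one publication (hypothetical schema)."""
    name: str
    title: str
    affiliation: str = ""
    coauthors: List[str] = field(default_factory=list)


def describe(record: AuthorRecord) -> str:
    """Render the known attributes of a record as human-readable text."""
    coauthors = ", ".join(record.coauthors) or "unknown"
    return (
        f"Name: {record.name}\n"
        f"Publication: {record.title}\n"
        f"Affiliation: {record.affiliation or 'unknown'}\n"
        f"Co-authors: {coauthors}"
    )


def build_prompt(a: AuthorRecord, b: AuthorRecord) -> str:
    """Build a yes/no question about two records (assumed wording)."""
    return (
        "Two publication records list authors with similar names.\n"
        "Decide whether both refer to the same real-world person.\n"
        "Answer with exactly one word: YES or NO.\n\n"
        f"Record A:\n{describe(a)}\n\nRecord B:\n{describe(b)}\n"
    )


def same_author(
    a: AuthorRecord,
    b: AuthorRecord,
    ask_llm: Callable[[str], str],
) -> bool:
    """Ask the model and treat any reply starting with YES as a link.

    `ask_llm` is a hypothetical callable that sends a prompt to a model
    and returns its text reply.
    """
    answer = ask_llm(build_prompt(a, b)).strip().upper()
    return answer.startswith("YES")
```

Passing the model call in as a plain function keeps the sketch independent of any particular LLM provider and lets the linking logic be checked with a stub before a real model is attached.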

Files

CSE3000_Jelle_van_Lieshout_Fin... (.pdf)

(.pdf | 0.817 MB)