Author Name Disambiguation using Large Language Models

Contributions to a system for open reproducible publication research


Abstract

Author name disambiguation, also described as (publication) record linking, is a problem to which considerable research has been dedicated. Attributing authorship, calculating research metrics and conducting literature reviews are among the processes made more difficult by ambiguous author names. In this study, a novel approach is presented to disambiguate authors of scientific publications, using Large Language Models (LLMs) in combination with the Alexandria3k software package. LLMs have shown great potential in processing, analysing and drawing conclusions from human-readable data. The approach presented in this study supplies an LLM with known attributes of publication records and authors, such as names, affiliations and co-authors, to determine whether records written by authors with ambiguous names can be linked to the same real-world person. Using Alexandria3k, a dataset of authors and publications with confirmed identities is created to test and validate the approach. Finally, the approach is measured against state-of-the-art author name disambiguation methods, and different configurations are presented and discussed.
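To illustrate the core idea of handing an LLM the known attributes of two publication records and asking whether they belong to the same person, the following is a minimal sketch. It assumes a hypothetical record schema (`AuthorRecord` with name, title, affiliations and co-authors), a hypothetical prompt wording, and a plain YES/NO reply format; it is not the prompt or data model used in this study, and the actual call to an LLM is deliberately left out so the sketch stays self-contained.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AuthorRecord:
    """One author occurrence on one publication (hypothetical schema)."""
    name: str
    title: str
    affiliations: List[str] = field(default_factory=list)
    co_authors: List[str] = field(default_factory=list)


def describe(record: AuthorRecord) -> str:
    """Render a record as human-readable lines for the prompt."""
    affiliations = "; ".join(record.affiliations) or "unknown"
    co_authors = ", ".join(record.co_authors) or "none listed"
    return (
        f"Name: {record.name}\n"
        f"Publication: {record.title}\n"
        f"Affiliations: {affiliations}\n"
        f"Co-authors: {co_authors}"
    )


def build_prompt(a: AuthorRecord, b: AuthorRecord) -> str:
    """Ask the model for a yes/no decision on whether two records share an author."""
    return (
        "Decide whether the following two publication records were written "
        "by the same real-world person. Answer only YES or NO.\n\n"
        f"Record A:\n{describe(a)}\n\n"
        f"Record B:\n{describe(b)}\n"
    )


def parse_decision(response: str) -> Optional[bool]:
    """Map the model's reply to True/False; None if it is neither YES nor NO."""
    answer = response.strip().upper()
    if answer.startswith("YES"):
        return True
    if answer.startswith("NO"):
        return False
    return None


# Usage: build a prompt for two records with the same ambiguous surname.
a = AuthorRecord("J. Smith", "Paper one", ["Example University"], ["A. Jones"])
b = AuthorRecord("John Smith", "Paper two", [], ["A. Jones", "B. Lee"])
prompt = build_prompt(a, b)
```

The prompt string would then be sent to whichever LLM is being evaluated, and `parse_decision` turns its reply into a link/no-link decision. Returning `None` for unexpected replies keeps malformed answers separate from genuine negatives, which matters when comparing configurations.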