Predicting the age of researchers using bibliometric data

Journal Article (2017)
Author(s)

G.F. Nane (TU Delft - Applied Probability)

Vincent Larivière (University of Quebec)

Rodrigo Costas (Universiteit Leiden)

Research Group
Applied Probability
Copyright
© 2017 G.F. Nane, Vincent Larivière, Rodrigo Costas
DOI related publication
https://doi.org/10.1016/j.joi.2017.05.002
Publication Year
2017
Language
English
Issue number
3
Volume number
11
Pages (from-to)
713-729
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The age of researchers is a critical factor in studying the bibliometric characteristics of the scholars who produce new knowledge. In bibliometric studies, the age of scientific authors is generally missing; however, the year of first publication is frequently used as a proxy for the age of researchers. In this article, we investigate which bibliometric factors are most important for predicting the age of researchers (birth and PhD age). Using a dataset of 3574 researchers from Québec for whom Web of Science publications, year of birth and year of PhD are known, our analysis is set in the linear regression framework and focuses on the predictive power of various regression models rather than on data fitting, with an additional breakdown by field. The year of first publication proves to be the best linear predictor of the age of researchers. With simple linear regression models, predicting birth and PhD years results in an error of about 3.7 years and 3.9 years, respectively. Including other bibliometric data marginally improves the predictive power of the regression models. A validation analysis for the field breakdown shows that the average length of the prediction intervals varies from 2.5 years for Basic Medical Sciences (for birth years) up to almost 10 years for Education (for PhD years). The average models perform significantly better than the models using individual observations. Nonetheless, the high variability of the data and the uncertainty inherited by the models call for caution when using linear regression models to predict the age of researchers.
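
As a rough illustration of the kind of simple linear regression the abstract describes (predicting birth year from year of first publication and judging the model by its out-of-sample error), a minimal Python sketch follows. It is not the authors' code. The data are synthetic, and the sample size, the 28-year gap between birth and first publication, the noise level, and the 80/20 split are all arbitrary assumptions chosen only to make the example run. The function names are hypothetical.

```python
import numpy as np


def fit_simple_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return float(intercept), float(slope)


def prediction_rmse(intercept, slope, x, y):
    """Root mean squared prediction error on the given observations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - (intercept + slope * x)
    return float(np.sqrt(np.mean(residuals ** 2)))


# Synthetic data for illustration only; this is NOT the Quebec dataset.
# Assumption: birth year = first publication year - 28 + Gaussian noise (sd 4).
rng = np.random.default_rng(0)
n = 1000
first_pub_year = rng.integers(1970, 2006, size=n)
birth_year = first_pub_year - 28 + rng.normal(0.0, 4.0, size=n)

# Hold out 20% of the observations so the error measures prediction, not fit.
order = rng.permutation(n)
train, test = order[:800], order[800:]

intercept, slope = fit_simple_regression(first_pub_year[train], birth_year[train])
rmse = prediction_rmse(intercept, slope, first_pub_year[test], birth_year[test])

print(f"slope={slope:.3f}, intercept={intercept:.1f}, hold-out RMSE={rmse:.2f} years")
```

The hold-out split reflects the abstract's emphasis on predictive power rather than data fitting. The error printed here depends entirely on the assumed noise level and says nothing about the errors reported in the article.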

Files

26197668_AGE_of_researchers_na... (pdf)
(pdf | 1.45 Mb)
- Embargo expired on 15-06-2019