Dependence Measures in Citation Analysis

The application of parametric copulas to capture the dependence structure between the publications of a researcher and the citations of those publications.

Bachelor Thesis (2018)
Author(s)

A.D.S. Bachasingh (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

G. F. Nane – Mentor

D.C. Gijswijt – Graduation committee member

E.M. van Elderen – Graduation committee member

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2018 Ashni Bachasingh
Publication Year
2018
Language
English
Graduation Date
20-12-2018
Awarding Institution
Delft University of Technology
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In this thesis we use copulas to capture the dependence structure between the publications of a scholar and the citations of those publications. To do so, we use a sample of Quebec researchers for whom both the number of publications and the number of citations are known. We are provided with multiple variables concerning citations. We study the dependence structure between these variables, with the aim of fitting copulas to it, by calculating correlation scores and visualising the structure. Copulas are functions that "join together" one-dimensional distribution functions with a dependence structure in order to represent joint distributions. The correlation scores are calculated across various ranges of the variables to give a deeper understanding of the dependence structure between them.
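The "joining together" can be made precise with Sklar's theorem. The bivariate statement below is a standard formulation, not a quotation from the thesis; the symbols H, F, G, C and the lowercase densities are notation chosen here for illustration.

```latex
% Sklar's theorem (bivariate case).
% H: joint distribution function of (X, Y); F, G: its marginal distribution functions.
% There exists a copula C : [0,1]^2 -> [0,1] such that, for all real x and y,
\[
  H(x, y) = C\bigl(F(x),\, G(y)\bigr).
\]
% If F and G are continuous, C is unique.
% When the densities exist, the joint density factorises as
\[
  h(x, y) = c\bigl(F(x),\, G(y)\bigr)\, f(x)\, g(y),
  \qquad
  c(u, v) = \frac{\partial^2 C(u, v)}{\partial u\, \partial v}.
\]
```

The factorisation shows why the margins and the dependence structure can be modelled separately: the copula density c carries all of the dependence.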
Using Sklar's theorem and functions from various packages in the statistical software R, we fit parametric copulas to the dependence structures of the various pairs of variables. Based on a goodness-of-fit test, certain parametric copula models are rejected at the 5% significance level. Unsurprisingly, there are also dependence structures that a parametric copula captures well.
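As a rough illustration of fitting a parametric copula, the sketch below estimates copula parameters from Kendall's tau. It makes several assumptions: the thesis used R, while this uses Python with NumPy and SciPy; the data are synthetic stand-ins, not the Quebec sample; the Clayton and Gaussian families and the tau-inversion estimator are chosen here for brevity and are not necessarily the families or estimator used in the thesis. All function names are hypothetical.

```python
import numpy as np
from scipy.stats import kendalltau, rankdata


def pseudo_observations(x):
    """Scale ranks into (0, 1) so that the marginal distribution drops out."""
    return rankdata(x) / (len(x) + 1)


def clayton_theta_from_tau(tau):
    """Clayton copula: tau = theta / (theta + 2), solved for theta (needs 0 < tau < 1)."""
    return 2.0 * tau / (1.0 - tau)


def gaussian_rho_from_tau(tau):
    """Gaussian copula: tau = (2 / pi) * arcsin(rho), solved for rho."""
    return np.sin(np.pi * tau / 2.0)


# Synthetic, positively dependent stand-ins for publication and citation counts
# (assumption: continuous log-normal values, so there are no ties).
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=n)
papers = np.exp(latent + 0.5 * rng.normal(size=n))
citations = np.exp(1.2 * latent + 0.5 * rng.normal(size=n))

# Kendall's tau depends only on ranks, so it is the same on raw data and pseudo-observations.
u = pseudo_observations(papers)
v = pseudo_observations(citations)
tau_raw, _ = kendalltau(papers, citations)
tau, _ = kendalltau(u, v)

theta_hat = clayton_theta_from_tau(tau)
rho_hat = gaussian_rho_from_tau(tau)
print(f"tau={tau:.3f}  Clayton theta={theta_hat:.3f}  Gaussian rho={rho_hat:.3f}")
```

Tau inversion is only one estimator; maximum pseudo-likelihood is a common alternative, and a goodness-of-fit test would still be needed to decide whether a fitted family should be rejected.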
Parametric copula families are used not only for fitting the data but also for prediction. Since a well-fitting model is not necessarily a good predictive model, we have also performed a validation analysis. The parametric copula models that are not rejected by the test at the 5% significance level are validated via k-fold cross-validation: part of the data is used to fit the model and the remainder is used to validate it. It turns out that the best-fitting copula model does not always perform best in terms of prediction; that is, it does not always score best during the cross-validation.
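A k-fold cross-validation of copula models can be sketched as follows. This is an illustration under stated assumptions, not the procedure from the thesis: the data are simulated from a Clayton copula with a hypothetical parameter of 3, parameters are fitted by Kendall's tau inversion on the training folds, and the mean held-out log copula density serves as the predictive score (the thesis's actual validation criterion is not stated in the abstract). All function names are hypothetical.

```python
import numpy as np
from scipy.stats import kendalltau, norm


def sample_clayton(n, theta, rng, eps=1e-12):
    """Draw n pairs from a Clayton copula by inverting the conditional distribution."""
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0) ** (-1.0 / theta)
    # Clip away from 0 and 1 so that logarithms and normal quantiles stay finite.
    return np.clip(u, eps, 1.0 - eps), np.clip(v, eps, 1.0 - eps)


def clayton_logpdf(u, v, theta):
    """Log density of the Clayton copula for theta > 0."""
    s = u ** (-theta) + v ** (-theta) - 1.0
    return (np.log1p(theta) - (1.0 + theta) * (np.log(u) + np.log(v))
            - (2.0 + 1.0 / theta) * np.log(s))


def gaussian_logpdf(u, v, rho):
    """Log density of the bivariate Gaussian copula with correlation rho."""
    x, y = norm.ppf(u), norm.ppf(v)
    r2 = rho ** 2
    return -0.5 * np.log1p(-r2) - (r2 * (x ** 2 + y ** 2) - 2.0 * rho * x * y) / (2.0 * (1.0 - r2))


def cv_score(u, v, fit, logpdf, k=5, seed=0):
    """Mean held-out log copula density over k folds (higher is better)."""
    idx = np.random.default_rng(seed).permutation(len(u))
    total = 0.0
    for test_idx in np.array_split(idx, k):
        train_idx = np.setdiff1d(idx, test_idx)
        tau, _ = kendalltau(u[train_idx], v[train_idx])  # fit on the training folds only
        total += logpdf(u[test_idx], v[test_idx], fit(tau)).sum()
    return total / len(u)


# Simulated copula-scale data (assumption: true model is Clayton with theta = 3).
u, v = sample_clayton(1000, theta=3.0, rng=np.random.default_rng(1))

scores = {
    "clayton": cv_score(u, v, lambda tau: 2.0 * tau / (1.0 - tau), clayton_logpdf),
    "gaussian": cv_score(u, v, lambda tau: np.sin(np.pi * tau / 2.0), gaussian_logpdf),
}
print(scores)
```

Because every fold is scored with parameters estimated without it, the ranking of models reflects out-of-sample behaviour, which need not coincide with the ranking from an in-sample goodness-of-fit test.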

Files

Verslag_BEP.pdf
(pdf | 10.4 Mb)
License info not available