Comparative Analysis of Recommendation Models on Scopus Data
Unveiling Patterns in Sparse Interactions for Academic Discovery
B.V. Yıldız (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Masoud Mansoury – Mentor (TU Delft - Multimedia Computing)
Amin Tabatabaei – Mentor (Elsevier)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
This thesis presents the design, implementation, and evaluation of a scalable, modular recommendation framework for academic article discovery on the Scopus platform. The research addresses limitations of Scopus’s existing “Related documents” module, which produces static, non-personalized suggestions based solely on metadata keyword overlap. To overcome these constraints, the proposed framework introduces a dual-mode retrieval strategy that generates both personalized recommendations, informed by historical user interactions, and non-personalized recommendations, based solely on the context of a target article.
The study begins with an extensive exploratory data analysis (EDA) of 2024 Scopus interaction logs, comprising over 31 million user-item events. A novel data transformation pipeline converts implicit feedback signals, such as downloads, views, and exports, into continuous-valued preference scores suitable for collaborative filtering models. This enables the application of state-of-the-art algorithms despite the absence of explicit ratings.
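As a rough illustration of how implicit feedback can be turned into continuous preference scores, the sketch below aggregates weighted events per user-item pair, dampens repeated interactions with a logarithm, and rescales to (0, 1]. The event weights, the log damping, and the function name `preference_scores` are assumptions made for this example; the abstract does not specify the actual transformation used in the thesis.

```python
import math
from collections import defaultdict

# Hypothetical event weights: an assumption for illustration only,
# not the values used in the thesis.
EVENT_WEIGHTS = {"view": 1.0, "export": 2.0, "download": 3.0}


def preference_scores(events, weights=EVENT_WEIGHTS):
    """Aggregate (user, item, event_type) logs into scores in (0, 1]."""
    raw = defaultdict(float)
    for user, item, event_type in events:
        # Unknown event types contribute nothing.
        raw[(user, item)] += weights.get(event_type, 0.0)

    # log1p dampens the effect of many repeated interactions.
    damped = {pair: math.log1p(total) for pair, total in raw.items() if total > 0}
    if not damped:
        return {}

    # Rescale so the strongest user-item pair has score 1.0.
    top = max(damped.values())
    return {pair: value / top for pair, value in damped.items()}


events = [
    ("u1", "article_a", "view"),
    ("u1", "article_a", "download"),
    ("u1", "article_b", "view"),
]
scores = preference_scores(events)
```

Here `article_a` receives the maximum score because it accumulated both a view and a download, while `article_b` receives a smaller positive score from a single view.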
Four recommendation models are implemented and compared: Bayesian Personalized Ranking (BPR), Factored Item Similarity Model (FISM), Light Graph Convolutional Network (LightGCN), and Knowledge Graph Attention Network (KGAT). The models are evaluated using both traditional offline ranking metrics (Recall@10, Precision@10, NDCG@10, MRR@10, Hit Rate@10) and a novel large language model (LLM)-based evaluation framework that uses GPT-4o to assess the relevance and serendipity of recommendations semantically.
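For readers unfamiliar with the offline metrics named above, the following minimal sketch computes them for a single user under the common binary-relevance definitions. The function name `ranking_metrics_at_k` and the exact conventions (for example, dividing precision by k and normalizing NDCG by the ideal ranking truncated at k) are assumptions; the thesis may define or average these metrics differently.

```python
import math


def ranking_metrics_at_k(ranked, relevant, k=10):
    """Binary-relevance ranking metrics for one user's ranked list."""
    top_k = ranked[:k]
    hits = [1 if item in relevant else 0 for item in top_k]
    n_hits = sum(hits)

    # DCG discounts each hit by log2 of its 1-based rank plus one.
    dcg = sum(h / math.log2(rank + 2) for rank, h in enumerate(hits))
    # Ideal DCG: all relevant items (up to k) placed at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1 / math.log2(rank + 2) for rank in range(ideal_hits))

    # 0-based position of the first relevant item, if any.
    first_hit = next((rank for rank, h in enumerate(hits) if h), None)

    return {
        "recall": n_hits / len(relevant) if relevant else 0.0,
        "precision": n_hits / k,
        "ndcg": dcg / idcg if idcg else 0.0,
        "mrr": 1.0 / (first_hit + 1) if first_hit is not None else 0.0,
        "hit_rate": 1.0 if n_hits else 0.0,
    }


metrics = ranking_metrics_at_k(["a", "b", "c", "d"], {"b", "d"}, k=3)
```

In practice these per-user values would be averaged over all test users to obtain figures such as Recall@10 or NDCG@10.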
Results show that LightGCN consistently outperforms the other models in both personalized and non-personalized scenarios, achieving the highest accuracy and scalability. Non-personalized recommendations remain valuable in cold-start and anonymous browsing contexts. The integration of LLM-based evaluation offers deeper qualitative insights into recommendation quality, capturing semantic alignment and novelty beyond what traditional metrics reflect. The proposed framework demonstrates that a unified embedding-based architecture can effectively serve heterogeneous recommendation needs on large-scale scholarly platforms. The methodology and findings have broader implications for the design of academic recommender systems in data-sparse and mixed-user environments.
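One way to picture how a single set of learned embeddings can serve both retrieval modes is sketched below: a personalized request scores items against a user vector, while a non-personalized request scores items against the target article's own vector. The dot-product scoring, the function names `top_k` and `recommend`, and the toy vectors are all assumptions for illustration, not the architecture described in the thesis.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k(query_vec, item_vecs, k, exclude=()):
    """Return the k items whose embeddings score highest against the query."""
    scored = [
        (dot(query_vec, vec), item)
        for item, vec in item_vecs.items()
        if item not in exclude
    ]
    # Sort by descending score; break ties by item id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in scored[:k]]


def recommend(item_vecs, k, user_vec=None, seen=(), target_item=None):
    """Dual-mode retrieval over one shared item embedding table."""
    if user_vec is not None:
        # Personalized mode: rank unseen items against the user embedding.
        return top_k(user_vec, item_vecs, k, exclude=set(seen))
    if target_item is not None:
        # Non-personalized mode: rank other items against the target article.
        return top_k(item_vecs[target_item], item_vecs, k, exclude={target_item})
    raise ValueError("Provide either user_vec or target_item")


# Toy embeddings, purely illustrative.
item_vecs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
personalized = recommend(item_vecs, k=2, user_vec=[0.0, 1.0], seen=["c"])
related = recommend(item_vecs, k=1, target_item="a")
```

The non-personalized path needs no user history at all, which is why it remains usable for cold-start and anonymous sessions.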