Topic Classification of Publications

Identifying publication topics based on existing journals

Bachelor Thesis (2024)
Author(s)

D. Lim (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Diomidis Spinellis – Mentor (TU Delft - Software Engineering)

G. Gousios – Mentor (TU Delft - Software Technology)

KG Langendoen – Graduation committee member (TU Delft - Embedded Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2024 Dayoung Lim
More Info
Publication Year
2024
Language
English
Graduation Date
01-02-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Accurate topic classification is crucial in the scientific community for finding relevant journals. However, the efficiency and accuracy of current topic classification of publications do not appear to be at their best, especially given the rapid growth in the number of research papers. Our research aims to address this problem by using state-of-the-art (SOTA) methods. We chose the 'April 2022 Crossref' data set for this research because Alexandria3k, the tool used to query this open data set, has been tested on the same data. We drew a stratified sample of 50,000 records that have a title, an abstract, and work names, which serve as roughly assigned topics. The SOTA methods chosen for feature extraction and classification are OpenAI Embeddings and XGBoost, respectively. Our research shows that this combination of SOTA methods has the potential to improve the performance of current topic classification of publications.
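The abstract does not describe how the 50,000 records were stratified. The sketch below is a minimal illustration that assumes stratification by topic label with proportional allocation, so each topic keeps roughly its share of the full data set. The function name `stratified_sample`, the `(title, label)` record layout, and the largest-remainder rounding rule are hypothetical choices made for illustration. They are not taken from the thesis.

```python
import random
from collections import defaultdict


def stratified_sample(records, label_of, n, seed=0):
    """Hypothetical helper: draw n records, keeping each label's share of the data.

    Quotas use proportional allocation with largest-remainder rounding,
    computed in integer arithmetic so they sum exactly to n.
    """
    if not 0 <= n <= len(records):
        raise ValueError("n must be between 0 and len(records)")

    # Group records by their (assumed) topic label.
    groups = defaultdict(list)
    for record in records:
        groups[label_of(record)].append(record)

    total = len(records)
    quotas, remainders = {}, {}
    for label, group in groups.items():
        # Floor of the exact share len(group) * n / total, plus its remainder.
        quotas[label], remainders[label] = divmod(len(group) * n, total)

    # Give the slots lost to flooring to the labels with the largest remainders.
    leftover = n - sum(quotas.values())
    for label in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        quotas[label] += 1

    # Sample without replacement inside each stratum; the seed makes it reproducible.
    rng = random.Random(seed)
    sample = []
    for label, group in groups.items():
        sample.extend(rng.sample(group, quotas[label]))
    return sample
```

For example, consider 100 records split 60/30/10 across three topics. Calling `stratified_sample(records, lambda r: r[1], 20)` would return 12, 6 and 2 records from those topics. Largest-remainder rounding is one reasonable way to keep the total exactly `n` while staying close to the original proportions.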

Files

Dayoung_Lim_Final_Paper.pdf
(pdf | 0.423 Mb)
License info not available