Optimising a Recommendation Model for Career Discovery

Bachelor thesis (2017)

Authors

B. Beker

R. Brugsma

J.F. Offerijns

Contributors

H. Wang (mentor)

T.E.P.M.F. Abeel (mentor)

Programme

Computer Science (TU Delft)

To reference this document use:

http://resolver.tudelft.nl/uuid:20d3b81e-c711-4f67-ba9f-4fc4399b5f35

Published Date

31-01-2017

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Recommendation systems are algorithms that aim to predict which items a user prefers, based on a recorded history of user activity. Magnet.me is a company that recommends companies and opportunities to students. The candidate algorithms considered for its recommendation system are memory-based and model-based collaborative filtering, graph-based approaches, support vector machines, random forest classifiers and wide & deep learning. Based on a qualitative comparison of these algorithms, model-based collaborative filtering, which Magnet.me already uses, was chosen as the best fit, because it scored highly on the three factors most important to Magnet.me: potential performance, compatibility with the dataset and scalability. Of several well-known benchmarking metrics, the F1 measure was found to be the most suitable for testing the performance of the recommender. To benchmark the model, the connection status of user-company relations should be used as the label, but it must then be excluded from the calculation of the implicit ratings, so that the label does not leak into the model's input. Logarithmically scaling the view counts before using them as a factor in the implicit ratings was found to have a negligible effect. Five additional signals were identified that could be used to improve the recommendations. Hyperparameter optimization with cross-validation was implemented and has successfully been deployed into the Magnet.me technology stack. A clustering algorithm was considered as a way to solve the cold-start problem, but we could not determine the numeric distances between features, which are required to train an accurate clustering algorithm. An unexpected challenge of this project was setting up the development infrastructure, which consisted of an Elasticsearch cluster and a Spark cluster, running on Google Cloud services, that can interact with each other. Another challenge was benchmarking the recommender with proper metrics.
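The benchmarking setup described in the abstract can be illustrated with a small sketch. It is not the thesis implementation: the signal names (`views`, `follows`, `connected`), the weights and the example ranking are hypothetical, chosen only for illustration. It shows two points from the abstract: the connection status is kept out of the implicit rating because it serves as the benchmark label, and the F1 measure is computed over the top-k recommendations against the companies the user is connected to. Optional logarithmic scaling of view counts is included as a flag.

```python
import math
from typing import Dict, List, Set

# Hypothetical signal weights for illustration only; these are NOT the
# weights used in the thesis or by Magnet.me.
WEIGHTS = {"views": 1.0, "follows": 3.0}


def implicit_rating(signals: Dict[str, float], log_scale_views: bool = False) -> float:
    """Weighted sum of interaction signals for one user-company pair.

    The "connected" signal is skipped on purpose: it is the benchmark
    label, so using it as an input would leak the label into the model.
    """
    rating = 0.0
    for name, value in signals.items():
        if name == "connected":
            continue  # label, never an input
        if name == "views" and log_scale_views:
            value = math.log1p(value)  # log(1 + v), so 0 views stays 0
        rating += WEIGHTS.get(name, 0.0) * value
    return rating


def f1_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """F1 of the top-k recommendations against the relevant (connected) items."""
    top_k = ranked[:k]
    hits = sum(1 for item in top_k if item in relevant)
    if hits == 0:
        return 0.0  # also covers k == 0 and an empty relevant set
    precision = hits / len(top_k)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical interactions of one user with three companies.
    interactions = {
        "acme": {"views": 12, "follows": 1, "connected": 1},
        "globex": {"views": 3, "follows": 0, "connected": 0},
        "initech": {"views": 0, "follows": 1, "connected": 1},
    }
    ratings = {c: implicit_rating(s) for c, s in interactions.items()}
    connected = {c for c, s in interactions.items() if s.get("connected")}
    # Hypothetical ranking, e.g. as produced by a collaborative filtering model.
    ranked = ["acme", "globex", "initech"]
    print(ratings)
    print(f1_at_k(ranked, connected, k=2))
```

With this toy data the top two recommendations contain one of the two connected companies, so precision and recall are both 0.5 and the F1 measure is 0.5. Changing a pair's `connected` value leaves its implicit rating unchanged, which is the property the abstract requires of the benchmark.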

Files

FinalReport.pdf (PDF, 3.29 Mb)