Title: Exploiting Embedding in Content-Based Recommender Systems
Author: Huang, Y.
Contributor: Larson, M.A. (mentor)
Faculty: Electrical Engineering, Mathematics and Computer Science
Department: Intelligent Systems
Date: 2016-11-24

Abstract: XING is a leading career-oriented social networking site in Europe that recommends job ads to its users. One widely used method in recommender systems is content-based filtering, which analyzes descriptions of item characteristics together with a user profile that captures the user's preferences. Because its dataset is sparse, i.e. many job postings receive few interactions, XING uses a content-based recommender system to improve the quality of its recommendations. Recent word embedding techniques learn semantically meaningful representations for words from their co-occurrence in sentences, enabling effective comparison between words. Building on the Word2Vec technique, XING represents each job posting by the average of the embeddings of the words it contains. This study explores three alternative methods of representing job postings for the task of recommending jobs to users. In the first experiment, we explore whether using only a subset of the words represents job postings more effectively. In the second experiment, instead of averaging word embeddings, we learn document embeddings directly with Paragraph2Vec. Finally, the third experiment uses Word Mover's Distance to estimate the similarity between job postings. Our experiments show that embeddings learned with Paragraph2Vec yield a better estimate of which job postings are similar, but only in high-dimensional settings. Because the Word Mover's Distance algorithm is computationally expensive, we use existing lower bounds, which allowed us to complete a small-scale experiment within the available time.
The results indicate that Word Mover's Distance is not as effective as the average over word embeddings or Paragraph2Vec. In the final part of this thesis, we present Link2Vec, a novel item representation method based on Word2Vec that learns semantic representations for items from the context surrounding hyperlinks referring to the item, e.g. hyperlinks to the item's Wikipedia page. Our experiments show that the effectiveness of the embeddings learned with Link2Vec improves with the amount of training data. For the evaluation on the MovieLens dataset, we obtained only a limited set of hyperlinks, yielding results that approximate a baseline using the average over word embeddings.

To reference this document use: http://resolver.tudelft.nl/uuid:cbec7bdd-4bab-4132-93cd-359587b9bf46
Part of collection: Student theses
Document type: master thesis
Rights: (c) 2016 Huang, Y.
Files: Master_Thesis_Yanbo_Huang.pdf (PDF, 2.78 MB)
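The baseline the abstract refers to, representing a document by the average of its word embeddings and comparing documents by cosine similarity, can be sketched as follows. This is a minimal illustration only: the word vectors and posting texts below are invented, not XING's actual Word2Vec model or data.

```python
import math

# Toy word embeddings (invented for illustration; a real system would load
# vectors trained with Word2Vec on the job-posting corpus).
EMBEDDINGS = {
    "python":    [0.9, 0.1, 0.0],
    "developer": [0.8, 0.2, 0.1],
    "software":  [0.7, 0.3, 0.1],
    "nurse":     [0.1, 0.9, 0.2],
    "hospital":  [0.0, 0.8, 0.3],
}

def average_embedding(text):
    """Represent a document by the mean of its known word vectors."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

job_a = average_embedding("Python software developer")
job_b = average_embedding("software developer")
job_c = average_embedding("hospital nurse")

# Semantically related postings end up closer in the embedding space.
print(cosine_similarity(job_a, job_b) > cosine_similarity(job_a, job_c))
```

Averaging discards word order, which is one motivation the thesis gives for trying Paragraph2Vec and Word Mover's Distance as alternatives.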
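On the lower bounds mentioned for Word Mover's Distance: one cheap bound from the WMD literature (Kusner et al., 2015) is the Word Centroid Distance, the Euclidean distance between the centroids of two documents' word vectors, which lower-bounds the true WMD and so can prune candidate pairs before the expensive computation. A minimal sketch with invented toy vectors follows; the thesis's actual choice of bounds and data is not reproduced here.

```python
import math

# Toy word vectors (invented for illustration).
VECS = {
    "python": (0.9, 0.1), "developer": (0.8, 0.2),
    "nurse": (0.1, 0.9), "hospital": (0.2, 0.8),
}

def centroid(words):
    """Uniform-weight centroid of a document's word vectors."""
    vs = [VECS[w] for w in words]
    return tuple(sum(c) / len(vs) for c in zip(*vs))

def word_centroid_distance(doc1, doc2):
    """Euclidean distance between document centroids. Since this
    lower-bounds the Word Mover's Distance, postings whose WCD already
    exceeds a threshold can be skipped without ever computing WMD."""
    return math.dist(centroid(doc1), centroid(doc2))

query = ["python", "developer"]
candidates = {"b": ["developer", "python"], "c": ["nurse", "hospital"]}
threshold = 0.5
survivors = [k for k, doc in candidates.items()
             if word_centroid_distance(query, doc) <= threshold]
print(survivors)  # only the near-identical posting survives pruning
```

Pruning with a lower bound is what makes even a small-scale WMD experiment feasible, since exact WMD requires solving an optimal-transport problem per document pair.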