Vertical Selection for Heterogeneous Search Engine Result Pages

Bachelor Thesis (2021)
Author(s)

A. Vilčinskas (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

C. Hauff – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

G. Iosifidis – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2021
Language
English
Graduation Date
01-07-2021
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Downloads counter
183
Collections
thesis
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The items a user sees on the general result page of a modern search engine can be grouped into verticals, such as images, videos, news, and shopping. A heterogeneous search engine result page contains results from several verticals. Such pages are widely used and have been shown to improve the user experience compared with result pages that contain only a list of websites. Different verticals are appropriate for different queries. We study how to define, develop, and evaluate a vertical selection model that, for a given query, selects and presents the appropriate verticals. We describe an approach for collecting a corpus of documents that represent the different verticals. The corpus documents are then used as training data for query result classification: features are extracted from the documents to train a classifier. The model that uses a Random Forest classifier and features extracted from the query itself achieved an F-score of 0.4921 on the TREC 2014 dataset. This score and the analysis of the results show that the proposed vertical selection methodology is viable. To better capture the differences between documents in different verticals, the corpus collection approach should be improved.
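
To make the reported metric concrete, the sketch below shows one common way an F-score for vertical selection can be computed: per query, the set of selected verticals is compared with the set of relevant verticals, and the per-query scores are averaged. The abstract does not state which F-score variant or averaging scheme the thesis uses, so the choice of F1 with a per-query mean is an assumption. The function names `f_score` and `mean_f_score` and the toy queries are hypothetical and are not taken from the thesis or from the TREC 2014 data.

```python
from typing import Dict, Iterable, Tuple


def f_score(predicted: Iterable[str], relevant: Iterable[str], beta: float = 1.0) -> float:
    """Set-based F-score for one query (assumed variant, not confirmed by the thesis)."""
    pred, rel = set(predicted), set(relevant)
    true_positives = len(pred & rel)
    # Assumption: a query with no correctly selected vertical scores 0.0,
    # including the edge case where either set is empty.
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(rel)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def mean_f_score(judgments: Dict[str, Tuple[Iterable[str], Iterable[str]]]) -> float:
    """Average the per-query F-scores; judgments maps query -> (predicted, relevant)."""
    scores = [f_score(pred, rel) for pred, rel in judgments.values()]
    return sum(scores) / len(scores) if scores else 0.0


# Toy, made-up example: two queries with selected and relevant verticals.
toy_judgments = {
    "cheap running shoes": ({"shopping"}, {"shopping"}),
    "eiffel tower": ({"images", "news"}, {"images", "videos"}),
}
print(mean_f_score(toy_judgments))
```

Under this convention a model is penalized both for selecting verticals that are not relevant (lower precision) and for missing relevant ones (lower recall), which is why a score around 0.5 can still indicate a usable selection signal when many verticals compete for each query.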

Files

Research_Paper.pdf
(pdf | 1.28 MB)
License info not available