Vertical Selection for Heterogeneous Search Engine Result Pages

Bachelor Thesis (2021)
Author(s)

A. Vilčinskas (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

C. Hauff – Mentor (TU Delft - Web Information Systems)

George Iosifidis – Graduation committee member (TU Delft - Embedded Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2021 Augustas Vilčinskas
Publication Year
2021
Language
English
Graduation Date
01-07-2021
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The items shown on the general result page of a modern search engine can be grouped into verticals, such as images, videos, news, and shopping. A heterogeneous search engine result page combines results from several verticals. Such pages are widely used and have been shown to improve the user experience compared with result pages that contain only a list of websites. Which verticals are appropriate depends on the query. We study how to define, develop, and evaluate a vertical selection model that, given a query, selects and presents the appropriate verticals. We describe an approach for collecting a corpus of documents that represent the different verticals. The corpus documents then serve as training data for query result classification: features are extracted from the documents and used to train a classifier. The model that uses a Random Forest classifier and features extracted from the query itself achieved an F-score of 0.4921 on the TREC 2014 dataset. This score and the analysis of the results show that the proposed vertical selection methodology is viable. To better capture the differences between documents in different verticals, the corpus collection approach should be improved.
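
For readers unfamiliar with the metric, the sketch below shows one common way an F-score can be computed for vertical selection: per query, the set of selected verticals is compared with the set of relevant verticals, and the per-query scores are averaged. The abstract does not state the exact evaluation protocol behind the reported 0.4921, so the averaging scheme, the function names `f_score` and `mean_f_score`, and the toy queries are all assumptions made for illustration.

```python
def f_score(predicted, relevant, beta=1.0):
    """F-beta for one query: predicted vs. relevant vertical sets."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    # No overlap (including an empty prediction) is scored as 0 here;
    # this convention is an assumption, not taken from the thesis.
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(relevant)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def mean_f_score(predictions, judgments):
    """Average the per-query F-score over all judged queries."""
    scores = [
        f_score(predictions.get(query, ()), relevant)
        for query, relevant in judgments.items()
    ]
    return sum(scores) / len(scores)


# Hypothetical toy data, not drawn from the TREC 2014 dataset.
judgments = {
    "jaguar": {"images", "news"},
    "buy running shoes": {"shopping"},
}
predictions = {
    "jaguar": {"images", "videos"},
    "buy running shoes": {"shopping"},
}

print(mean_f_score(predictions, judgments))
```

In this toy case the first query has one correct vertical out of two selected and two relevant, while the second query is selected perfectly, so the average rewards the model for partial matches without hiding the missed vertical.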

Files

Research_Paper.pdf
(pdf | 1.28 Mb)
License info not available