Proximity of Terms, Texts and Semantic Vectors in Information Retrieval

Doctoral thesis (2017)

Authors

J.B.P. Vuurens

DOI: https://doi.org/10.4233/uuid:2dcad546-6cbd-45ca-abe7-ffcf613b1376

Clustering, Retrieval algorithms, Recommender systems, Information retrieval

To reference this document use:

http://resolver.tudelft.nl/uuid:2dcad546-6cbd-45ca-abe7-ffcf613b1376

Published Date

2017

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Information Retrieval (IR) is the task of finding content of an unstructured nature with respect to an information need. A retrieval system typically uses a retrieval model to rank the available content by its estimated relevance to an information need. For decades, state-of-the-art retrieval models have assumed that terms appear independently in text documents. Chapter 1 of this thesis describes how the relevance likelihood of a document changes with the observed distance between co-occurring query terms in its text.

Nowadays, news is abundantly available online, allowing users to discover and follow news events. However, online news is often highly redundant: most sources base their stories on previously published work and add only limited new information. Thus, a user often ends up spending a significant amount of effort re-reading the same parts of a story before finding relevant and novel information. In Chapter 2 and Chapter 3, we present a novel approach to constructing an online news summary for a given topic. Salient sentences are identified by clustering the sentences in the news stream based on the relative proximity of the sentences and the temporal proximity of their publication times. To improve the coherence of a long summary that describes a news topic, we propose in Chapter 4 to automatically cluster sentences by subtopic. In Chapter 5, we show how new topics can be detected in the news stream using the same clustering technique.

In real-life decision making, people are often faced with an overload of choices. A recommender system aids the user by reducing the available choices to a shortlist of items that are of interest to the user. In Chapter 6, we learn high-dimensional representations for movies that make it possible to effectively recommend movies based on a user’s most recently rated movies.

Files

Dissertation_JVuurens_web.pdf

(.pdf | 1.37 MB)