Title: Social Web Data Analytics: Relevance, Redundancy, Diversity
Author: Tao, K.
Contributor: Houben, G.J.P.M. (promotor)
Faculty: Electrical Engineering, Mathematics and Computer Science
Department: Software and Computer Technology
Date: 2014-12-09
Abstract: In the past decade, the Social Web has evolved into both an essential channel for people to exchange information and a new type of mass media. The immense amount of data produced presents new possibilities and challenges: algorithms and technologies need to be developed to extract and infer useful information from the Social Web. One of the main issues on the (Social) Web is the impurity of the data: not all content produced is meaningful or useful. (1) How can we predict the relevance of messages on the Social Web to users' information needs? (2) How can we reduce the redundancy among a list of messages retrieved in response to a user query? (3) How can we boost the diversity of such a ranked list in order to provide more comprehensive coverage of the aspects pertinent to an information need? In this thesis, we answer these questions through Social Web data analytics on microblog data. The first part of the thesis introduces the Twitter Analytical Platform (TAP), an analytical platform for Twitter data. It aims to provide an easy-to-use platform on which data scientists and software developers can conduct analytical tasks efficiently. The tasks can be customized with the Twitter Analysis Language (TAL), a language for designing data analytical workflows. To conduct the research presented in this thesis, a number of tools and components were implemented in TAP; they are broadly applicable to typical Social Web analytics use cases. Taking search on Twitter as one of the main use cases for this research, the second part of the thesis presents our results for answering the aforementioned three questions.
We first propose a query expansion framework that utilizes information from external knowledge bases. We integrate our research findings into a relevance estimation framework, which analyzes the importance of different tweet-based features in predicting a tweet's relevance to an information need. Our second contribution is based on the insight that microblog search result rankings often contain a considerable amount of redundancy. We propose a near-duplicate detection framework designed to tackle this issue. Since a reduction in redundancy does not necessarily lead to increased diversity in the search result ranking, we also build a corpus specifically to investigate issues of novelty and diversity. Finally, we put the analytical results derived from investigating relevance, redundancy and diversity into practice and introduce Twinder, a search engine for Twitter streams. Twinder demonstrates the applicability of both our analytical platform TAP and our analytical findings. Inspired by real-life use cases, the last part of the thesis focuses on the development of Twitcident, an application aimed at fulfilling the information needs of the (semi-)public sector during emergencies or potentially dangerous circumstances. Based on TAP, we develop a semantic faceted search interface and multiple widgets for visualized analytics in Twitcident. These components allow users to explore Twitter messages more efficiently. The application and the evaluation results show the validity of TAP as well as the effectiveness of exploiting semantics for filtering Twitter messages.
Subject: Social Web; Data Analytics; Information Retrieval; Twitter
To reference this document use: https://doi.org/10.4233/uuid:1af94380-1414-4497-bfc6-a67b213de050
ISBN: 9789461863966
Part of collection: Institutional Repository
Document type: doctoral thesis
Rights: (c) 2014 Tao, K.
Files: phd-thesis-ktao-final.pdf (PDF, 4.81 MB)