Efficient Crawling of Community Structures in Online Social Networks

Master thesis (2011)

Authors

S. Van Kester

Contributors

C. Doerr (mentor)

Programme

Telecommunications () (TU Delft)

Online social networks Crawling

More Info

expand_more

To reference this document use:

http://resolver.tudelft.nl/uuid:18c3e965-8417-47c3-80f7-c658cc50c88c

Published Date

27-09-2011

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Programme

Telecommunications

Abstract

Online social networks showed an enormous growth in the last decade. With the rise of online social networks such as Twitter and Facebook, researchers got the opportunity to access the data of social behavior of millions of people, whereas in the past it was limited to hundreds of people. For these researchers and marketeers it is of great interest to find communities within these large networks, as this is one of the opportunities to see how people behave in groups on a large scale. The most common approach of analyzing community structures in online social networks is to gather the network by downloading the user profiles one by one (crawling) and afterwards partition the network into groups or communities by community detection algorithms. However, crawling an entire social network is very time consuming and analyzing the networks with community detection algorithms can be computationally expensive. To overcome these problems, in this thesis a method is proposed for crawling nodes using the community structure of a network. It enables the researcher to start the analysis before completing the crawl. This new method performs between 66% and 480% better than existing crawling techniques such as Breadth First Search (BFS) and Depth First Search (DFS), because a smaller portion of the networks has to be crawled in order to crawl entire communities. The computer-generated networks used in this thesis were created using a new network generator which uniquely combines three features; it creates networks with explicit community structure, arbitrary degree distributions and adaptable community strength.

Files

MscThesis.pdf

(pdf | 3.99 Mb)