Splash

Data synchronization in unmanaged, untrusted peer-to-peer networks

Master thesis (2010)

Authors

G. Logiotatidis

Contributors

J. Pouwelse (mentor)

Programme

Computer Engineering () (TU Delft)

Peer-to-peer Tribler Data synchronization Gossip

More Info

expand_more

To reference this document use:

http://resolver.tudelft.nl/uuid:52b586ea-6144-4b4e-a5a1-b05255ce493a

Published Date

18-08-2010

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Programme

Computer Engineering

Abstract

Peer-to-peer networks rely on gossip algorithms to spread information about the peer activity and the network status. State-of-the-art gossip algorithms are not sufficient to spread the information widely, as the size and the complexity of the unmanaged networks grow. They suffer from high bandwidth utilization and lack mechanisms to verify the validity of the transferred information. We set the following questions for research: How can we achieve high coverage ratio for all the peers in the network while using as little bandwidth as possible? How can such a mechanism scale to millions of nodes? How can we reduce the effects of high churn rates? And finally, how we secure that the data transmitted are genuine? We develop Splash, a data synchronization algorithm with emphasis on high data coverage and efficient bandwidth usage. Our solution is based on Bloom filters and allows both partial and full synchronizations while avoiding the transmission of duplicate information. We propose two schemes to synchronize current and historical data between peers, over unmanaged and untrusted peer-to-peer networks. We use a SyncLog to keep the state of the synchronization algorithm to a minimum. For scalability we categorize data based on their origin, as local or global, and we investigate the use of different synchronization policies. Finally we discuss mechanisms that validate data to prevent the spread to invalid information. Through a series of experiments we present the effectiveness of this solution in networks with different sizes and churn effects. In simulation Splash achieves 95% data coverage for all the peers in the network, which is 5 times more compared to BarterCast. Additionally Splash uses 107 times less bandwidth compared to BarterCast to achieve the same data coverage.g

Files

Thesis_final.pdf

(pdf | 0.533 Mb)