Splash

Data synchronization in unmanaged, untrusted peer-to-peer networks

More Info
expand_more

Abstract

Peer-to-peer networks rely on gossip algorithms to spread information about the peer activity and the network status. State-of-the-art gossip algorithms are not sufficient to spread the information widely, as the size and the complexity of the unmanaged networks grow. They suffer from high bandwidth utilization and lack mechanisms to verify the validity of the transferred information. We set the following questions for research: How can we achieve high coverage ratio for all the peers in the network while using as little bandwidth as possible? How can such a mechanism scale to millions of nodes? How can we reduce the effects of high churn rates? And finally, how we secure that the data transmitted are genuine? We develop Splash, a data synchronization algorithm with emphasis on high data coverage and efficient bandwidth usage. Our solution is based on Bloom filters and allows both partial and full synchronizations while avoiding the transmission of duplicate information. We propose two schemes to synchronize current and historical data between peers, over unmanaged and untrusted peer-to-peer networks. We use a SyncLog to keep the state of the synchronization algorithm to a minimum. For scalability we categorize data based on their origin, as local or global, and we investigate the use of different synchronization policies. Finally we discuss mechanisms that validate data to prevent the spread to invalid information. Through a series of experiments we present the effectiveness of this solution in networks with different sizes and churn effects. In simulation Splash achieves 95% data coverage for all the peers in the network, which is 5 times more compared to BarterCast. Additionally Splash uses 107 times less bandwidth compared to BarterCast to achieve the same data coverage.g

Files