Conflict Free R Tree

A CRDT-based index for P2P systems

More Info
expand_more

Abstract

CRDTs are data structures that allow conflict free replication and modification. In theory, CRDTs seem like a natural fit for open P2P networks, however there are obstacles to overcome. Firstly, many proposed CRDTs are grow-only because (i) CRDTs may track deletions in permanent tombstone values or (ii) they may gather permanent information on every peer in the system. As such, CRDTs are not well adapted to open P2P networks. Many peers may come and go over time and persistent accurate information about all peers will grow too large eventually. Secondly, some types of CRDT (mainly op-based CmRDT) require causal message delivery which is hard in open P2P networks. Thirdly, CRDTs are typically built with the assumption that all peers need all data in a replica and thus all data is fully replicated, even though a client may only be interested in a small subset.

To address these issues a new state-based CvRDT is proposed: BloomCRDT, which is a variation on the OR-set that replaces its 푅 grow-only set with Bloom filters. It does not need knowledge of other peers or their state and avoids tombstones. This makes BloomCRDT well suited for use in open P2P networks. The grow-only aspect is vastly reduced compared to the standard OR-set. However, the BloomCRDT itself is indivisible and can grow to be too large if it stores many elements. This limit is not high enough to accommodate demanding applications. Scalability can be dramatically improved by combining multiple BloomCRDTs to form a Conflict Free R-Tree (CFRT). Each node of the R-Tree is represented by a BloomCRDT. Concurrent modifications are allowed and, due to the characteristics of the R-tree, this results in a consistent data structure with overlapping ranges. Since an R-Tree’s efficiency is reduced when ranges overlap, a periodic optimization algorithm can be used to eliminate the overlapping ranges.

A proof-of-concept implementation was made of BloomCRDT and CFRT using Python and PyIpv8. Experiments where performed using the Gumby experimental framework and the DAS5 cluster. The experiments show that BloomCRDT performs as designed: it keeps a smaller state than other solutions while being invariant to the number of identities that have interacted with a BloomCRDT instance. The CFRT is shown to be able to balance key/value entries over multiple BloomCRDT instances using favorable messaging metrics. Moreover the CFRT, and by extension BloomCRDT, tolerate network faults and can work with up to 90% message loss. A final experiment uses real world data to compare the CFRT to Tribler’s channels implementation, and shows that even at scale a CFRT can outperform the current channels implementation by orders of magnitude in terms of user responsiveness.