Benchmarking Distributed Database Performance and Dependability under Partial System Failures

Abstract

Many types of database management systems (DBMSs) exist, but finding the right one for a specific use case is becoming increasingly difficult. Benchmarks allow one to compare systems, but in a world where distributed DBMSs are increasingly used for mission-critical purposes, we find that most existing benchmarks neglect fault tolerance and dependability. In this Master’s Thesis, we design a modular and highly extensible framework capable of introducing partial system failures into a distributed database deployment. We also implement a proof-of-concept version of the framework and use it to evaluate the performance of a CockroachDB cluster deployed through Kubernetes: we run the TPC-C benchmark while injecting faults and measure the resulting changes in performance. Using this proof-of-concept implementation, we demonstrate the faults our system can introduce and find that the impact of our high-level node failures depends strongly on how much time a node has to shut down gracefully and notify its peers or connected clients.