The cost of Byzantine Fault Tolerant (BFT) storage is the main concern preventing its adoption in practice. This cost stems from the need to maintain at least 3t+1 replicas on different storage servers in the asynchronous model, so that t Byzantine replica faults can be tolerated. In this paper, we present MDStore, the first fully asynchronous read/write BFT storage protocol that reduces the number of data replicas to as few as 2t+1, while maintaining 3t+1 replicas of metadata at (possibly) different servers. At the heart of MDStore is its metadata service, which is built upon a new abstraction we call timestamped storage. Timestamped storage both allows for conditional writes (facilitating the implementation of a metadata service) and has consensus number one (making it implementable wait-free in an asynchronous system despite faults). In addition to its low data replication factor, MDStore offers very strong guarantees: it implements multi-writer multi-reader atomic wait-free semantics and tolerates any number of Byzantine readers and crash-faulty writers. We further show that MDStore's data replication overhead is optimal; namely, we prove a lower bound of 2t+1 on the number of data replicas that applies even to crash-tolerant storage with a fault-free metadata service oracle. Finally, we prove that separating data from metadata to reduce the cost of BFT storage is not possible without cryptographic assumptions. However, our MDStore protocol uses only lightweight cryptographic hash functions.
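To make the replica counts in the abstract concrete, the following minimal sketch computes them for a given fault threshold t. The function name `replica_counts` and the dictionary keys are hypothetical and chosen only for illustration; the formulas 3t+1 and 2t+1 are the only parts taken from the abstract, and the code is not part of the MDStore protocol.

```python
def replica_counts(t: int) -> dict:
    """Replica counts quoted in the abstract for tolerating t Byzantine faults.

    Illustrative helper only (hypothetical name and keys, not from the paper).
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    return {
        # Conventional asynchronous BFT storage: 3t+1 full replicas.
        "classic_bft_replicas": 3 * t + 1,
        # MDStore: as few as 2t+1 replicas of the data itself ...
        "mdstore_data_replicas": 2 * t + 1,
        # ... plus 3t+1 replicas of the (much smaller) metadata.
        "mdstore_metadata_replicas": 3 * t + 1,
    }


if __name__ == "__main__":
    for t in (1, 2, 3):
        print(t, replica_counts(t))
```

Under these formulas the number of full data copies drops by t relative to the 3t+1 baseline, while the metadata service keeps the 3t+1 replication level.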
Asynchronous BFT storage with 2t + 1 data replicas
Technical Report / Also on Arxiv
© EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in Technical Report / Also on Arxiv and is available at:
PERMALINK : https://www.eurecom.fr/publication/4147