Distributed storage systems provide data availability by means of redundancy. To maintain a fixed level of availability when nodes fail, new redundant fragments must be created. Since node failures can be either transient or permanent, deciding when to generate new fragments is non-trivial. An additional difficulty is that the failure behavior, in terms of the rates of permanent and transient failures, may vary over time. To adapt to changes in the failure behavior, many systems adopt a reactive approach, in which new fragments are created as soon as a failure is detected. However, reactive approaches tend to produce spikes in bandwidth consumption. Proactive approaches create new fragments at a fixed rate that either depends on knowledge of the failure behavior or is given as a parameter by the system manager. However, existing proactive systems cannot adapt to a changing failure behavior, which is common in the real world. We propose a new technique based on an ongoing estimation of the failure behavior, modeled by a network of queues. This scheme combines the adaptiveness of reactive systems with the smooth bandwidth usage of proactive systems. It can be considered a generalization of the two previous approaches, in which the choice between reactive and proactive becomes a specific case of a wider approach that can be tuned to the dynamics of the failure behavior.
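As a rough illustration of how a repair rate might follow an ongoing estimate of the failure behavior, the sketch below smooths observed permanent-failure counts with an exponentially weighted moving average. This is not the queueing-network model of the paper. The class name `AdaptiveRepairRate`, the smoothing weight `alpha`, and both methods are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveRepairRate:
    """Hypothetical estimator (an assumption, not the paper's model).

    Tracks the permanent-failure rate with an exponentially weighted
    moving average and uses it as the fragment creation rate.
    """
    alpha: float = 0.2   # assumed smoothing weight, 0 <= alpha <= 1
    rate: float = 0.0    # estimated permanent failures per time unit

    def observe(self, permanent_failures: int, interval: float) -> float:
        """Fold the failures seen during one interval into the estimate."""
        if interval <= 0:
            raise ValueError("interval must be positive")
        sample = permanent_failures / interval
        self.rate = self.alpha * sample + (1.0 - self.alpha) * self.rate
        return self.rate

    def fragments_to_create(self, interval: float) -> float:
        """Fragments to create over the next interval at the estimated rate."""
        return self.rate * interval


# Usage: start from a prior of 2 failures per hour, then observe 4 in one hour.
est = AdaptiveRepairRate(alpha=0.5, rate=2.0)
est.observe(permanent_failures=4, interval=1.0)
planned = est.fragments_to_create(interval=2.0)
```

In this toy version, `alpha = 0` keeps the rate fixed, which resembles a purely proactive scheme, while `alpha = 1` follows only the latest observation, which is closer in spirit to a reactive one. Intermediate values trade responsiveness against smoothness of bandwidth usage.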
Proactive replication in distributed storage systems using machine availability estimation
CoNEXT 2007, 3rd International Conference on emerging Networking EXperiments and Technologies, December 10-13, 2007, New York, USA
© ACM, 2007. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in CoNEXT 2007, 3rd International Conference on emerging Networking EXperiments and Technologies, December 10-13, 2007, New York, USA
PERMALINK : https://www.eurecom.fr/publication/2355