
Hitachi


April 7, 2015

Report from Presenter

The 23rd Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP 2015) was held in Finland from March 4 to March 6, 2015. The conference covers a range of topics in parallel and distributed processing, and the presentations spanned a wide range of areas, including big data, cloud computing, and high-performance computing. We presented a full paper titled "Reliability Analysis of Highly Redundant Distributed Storage Systems with Dynamic Refuging", which analyzes the reliability of storage systems, a basic component of big data systems.


Fig. 1: Reliability Model of Storage Systems with Dynamic Refuging


The amount of digital data is increasing exponentially, and storage system capacity is increasing with it. Data reliability must be maintained as the amount of data grows, because the loss of even part of the data may affect applications such as big data analysis. Reliability is a concern in large-scale storage systems: as the number of drives increases, systems become more susceptible to multiple drive failures. Some large-scale storage systems protect data with "Erasure Coding" to prevent data loss. Erasure Coding adds redundancy to the storage system at a configurable level. As the redundancy level is raised, reliability increases, but so do the overhead of normal data write operations and the additional storage needed for coding. We therefore need to achieve high reliability at the lowest possible redundancy level.
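The trade-off between redundancy level and cost can be illustrated with a small calculation. The sketch below is not taken from the paper: it assumes that "redundancy level" means the number of parity blocks per stripe, that drives fail independently with the same probability, and that no rebuild takes place. The function names and the example values (6 data blocks, a 1% failure probability) are hypothetical.

```python
from math import comb


def storage_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Extra capacity spent on parity, as a fraction of the user data."""
    return parity_blocks / data_blocks


def stripe_loss_probability(data_blocks: int, parity_blocks: int, p_fail: float) -> float:
    """Probability that a stripe loses more blocks than its parity can cover.

    Assumption: each of the n = data + parity blocks sits on a different
    drive, and drives fail independently with probability p_fail.
    """
    n = data_blocks + parity_blocks
    return sum(
        comb(n, i) * p_fail**i * (1 - p_fail) ** (n - i)
        for i in range(parity_blocks + 1, n + 1)
    )


if __name__ == "__main__":
    # Compare redundancy levels 1 to 3 for a stripe with 6 data blocks.
    for parity in (1, 2, 3):
        overhead = storage_overhead(6, parity)
        loss = stripe_loss_probability(6, parity, 0.01)
        print(f"level {parity}: overhead {overhead:.0%}, stripe loss probability {loss:.2e}")
```

Under these simplifying assumptions, each additional parity block lowers the loss probability but raises the storage overhead, which is the tension described above.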

In this presentation, we modeled and analyzed the reliability of large-scale storage systems with Dynamic Refuging (Fig. 1). Dynamic Refuging is an efficient rebuild architecture and algorithm for large-scale storage systems. It distributes stripe blocks efficiently among many drives and strategically selects which stripe blocks to rebuild according to the redundancy level of each storage area, which changes dynamically as multiple drives fail. For storage systems with redundancy level 3 and Dynamic Refuging, we found that the data loss probability is maintained or even lowered as the system scales up. This was confirmed both by simulation and by closed-form formulas.
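The idea of choosing what to rebuild based on the current redundancy of each storage area can be sketched as a simple priority ordering. This is only an illustration of the general principle, not the Dynamic Refuging algorithm itself, whose details are not given here. The stripe layout, the function names, and the rule "rebuild the stripes with the least remaining redundancy first" are all assumptions made for the example.

```python
def remaining_redundancy(stripe_drives, failed_drives, parity_blocks):
    """Number of further block losses this stripe can still tolerate."""
    lost = sum(1 for drive in stripe_drives if drive in failed_drives)
    return parity_blocks - lost


def rebuild_order(stripes, failed_drives, parity_blocks):
    """Return the IDs of damaged stripes, most endangered first.

    stripes: dict mapping a stripe ID to the list of drives holding its blocks
    (hypothetical layout). Stripes that lost no block are skipped.
    """
    damaged = []
    for stripe_id, drives in stripes.items():
        left = remaining_redundancy(drives, failed_drives, parity_blocks)
        if left < parity_blocks:
            damaged.append((left, stripe_id))
    damaged.sort()  # least remaining redundancy first, ties broken by ID
    return [stripe_id for _, stripe_id in damaged]


if __name__ == "__main__":
    # Three stripes spread over eight drives, redundancy level 3 (assumed values).
    stripes = {"A": [0, 1, 2, 3], "B": [2, 3, 4, 5], "C": [4, 5, 6, 7]}
    print(rebuild_order(stripes, failed_drives={2, 4}, parity_blocks=3))
```

In this example, stripe B has lost two blocks while A and C have each lost one, so B is placed first in the rebuild order.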

(By AKUTSU Hiroaki)
