Load-Balanced Recovery Schemes for Single-Disk Failure in Storage Systems with Any Erasure Code

Author(s):  
Xianghong Luo ◽  
Jiwu Shu
2010 ◽  
Vol 38 (1) ◽  
pp. 119-130 ◽  
Author(s):  
Liping Xiang ◽  
Yinlong Xu ◽  
John C.S. Lui ◽  
Qian Chang

2014 ◽  
Vol 63 (4) ◽  
pp. 995-1007 ◽  
Author(s):  
Silei Xu ◽  
Runhui Li ◽  
Patrick P.C. Lee ◽  
Yunfeng Zhu ◽  
Liping Xiang ◽  
...  

2021 ◽  
Vol 17 (3) ◽  
pp. 1-24
Author(s):  
Duwon Hong ◽  
Keonsoo Ha ◽  
Minseok Ko ◽  
Myoungjun Chun ◽  
Yoona Kim ◽  
...  

Recent ultra-large SSDs (e.g., a 32-TB SSD) provide many benefits in building cost-efficient enterprise storage systems. Owing to their large capacity, however, when such an SSD fails in a RAID storage system, a long rebuild is inevitable because RAID reconstruction requires copying a huge amount of data among SSDs. Motivated by modern SSD failure characteristics, we propose a new recovery scheme, called reparo, for a RAID storage system with ultra-large SSDs. Unlike existing RAID recovery schemes, reparo repairs a failed SSD at the NAND die granularity without replacing it with a new SSD, thus avoiding most of the inter-SSD data copies during a RAID recovery step. When a NAND die of an SSD fails, reparo exploits the multi-core processor of the SSD controller to identify the failed LBAs on the failed NAND die and recover their data. Furthermore, reparo ensures no negative post-recovery impact on the performance and lifetime of the repaired SSD. Experimental results using 32-TB enterprise SSDs show that reparo can recover from a NAND die failure about 57 times faster than the existing rebuild method, with little degradation in SSD performance and lifetime observed after recovery.
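The die-granularity idea in the abstract can be illustrated with a minimal sketch. It is not the reparo implementation: it assumes a RAID-5-style XOR parity layout and a flat logical-to-physical map, and the names `failed_lbas`, `xor_blocks`, `recover`, `l2p`, and `peers` are hypothetical. It only shows why rebuilding the LBAs that lived on one failed die touches far less data than rebuilding a whole SSD.

```python
from functools import reduce


def failed_lbas(l2p, failed_die):
    """Return the LBAs whose physical location is on the failed die.

    l2p maps LBA -> (die, page). This flat dict is a hypothetical
    stand-in for an SSD's mapping table, not a real FTL structure.
    """
    return sorted(lba for lba, (die, _page) in l2p.items() if die == failed_die)


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks (assumed RAID-5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover(l2p, failed_die, peers):
    """Rebuild only the LBAs lost with the failed die.

    peers is a list of dicts (LBA -> bytes) holding the surviving stripe
    members: the other data SSDs plus the parity SSD.
    """
    return {
        lba: xor_blocks([peer[lba] for peer in peers])
        for lba in failed_lbas(l2p, failed_die)
    }
```

In this sketch, only the LBAs returned by `failed_lbas` are read from the peer SSDs, so the volume of inter-SSD traffic scales with the capacity of one die and not with the capacity of the whole drive.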

