A Flexible Fault-Tolerance Mechanism for the Integrade Grid Middleware

Author(s): Stanley Araujo de Sousa, Francisco Jose da Silva e Silva, Rafael Fernandes Lopes
Electronics, 2020, Vol 9 (12), pp. 2074
Author(s): J.-Carlos Baraza-Calvo, Joaquín Gracia-Morán, Luis-J. Saiz-Adalid, Daniel Gil-Tomás, Pedro-J. Gil-Vicente

Due to transistor shrinking, intermittent faults are a major concern in current digital systems. This work presents an adaptive fault tolerance mechanism based on error correction codes (ECC) that can modify its behavior when the error conditions change, without increasing the redundancy. As a case example, we have designed a mechanism that can detect intermittent faults and switch from an initial generic ECC to a specific ECC capable of tolerating one intermittent fault. We have inserted the mechanism in the memory system of a 32-bit RISC processor and validated it by using VHDL simulation-based fault injection. We have used two (39, 32) codes: a single error correction–double error detection (SEC–DED) code and a code developed by our research group, called EPB3932, capable of correcting single errors and double and triple adjacent errors that include a bit previously tagged as error-prone. The results of injecting transient faults, intermittent faults, and combinations of both show that the proposed mechanism works properly. As an example, the percentage of failures and latent errors is 0% when injecting a triple adjacent fault after an intermittent stuck-at fault. We have synthesized the proposed adaptive fault tolerance mechanism on two types of FPGAs: non-reconfigurable and partially reconfigurable. In both cases, the overhead introduced is affordable in terms of hardware, time, and power consumption.
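
To make the (39, 32) SEC–DED idea concrete, the following is a minimal sketch of a textbook extended Hamming code with 32 data bits, 6 Hamming parity bits, and 1 overall parity bit. It is an illustration under stated assumptions only: the bit layout, the function names `encode` and `decode`, and the status strings are hypothetical, the parity-check matrix is not necessarily the one the authors used, and the EPB3932 code is not reproduced here.

```python
# Textbook extended Hamming (39, 32) SEC-DED sketch.
# Assumed layout (not taken from the paper): index 0 holds the overall
# parity bit, indices 1..38 hold a Hamming code whose parity bits sit at
# the power-of-two positions.
PARITY_POS = [1, 2, 4, 8, 16, 32]
DATA_POS = [p for p in range(1, 39) if p not in PARITY_POS]  # 32 positions


def encode(data: int) -> list[int]:
    """Encode a 32-bit integer into a 39-bit codeword (list of 0/1)."""
    word = [0] * 39
    for i, pos in enumerate(DATA_POS):
        word[pos] = (data >> i) & 1
    # Each Hamming parity bit makes the parity of its covered positions even.
    for p in PARITY_POS:
        word[p] = sum(word[k] for k in range(1, 39) if k & p) % 2
    # Overall parity bit makes the whole 39-bit word even.
    word[0] = sum(word) % 2
    return word


def decode(received: list[int]) -> tuple[int, str]:
    """Return (data, status); status is 'ok', 'corrected',
    'double_error', or 'uncorrectable'."""
    word = list(received)
    # The syndrome is the XOR of the indices of all set bits in 1..38.
    syndrome = 0
    for k in range(1, 39):
        if word[k]:
            syndrome ^= k
    overall = sum(word) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flipped bits: assume one, located by the syndrome
        # (a syndrome of 0 points at the overall parity bit itself).
        if syndrome < 39:
            word[syndrome] ^= 1
            status = "corrected"
        else:
            status = "uncorrectable"
    else:
        # Even number of flipped bits with a nonzero syndrome: detect only.
        status = "double_error"
    data = sum(word[pos] << i for i, pos in enumerate(DATA_POS))
    return data, status
```

A single flipped bit is located by the syndrome and corrected, while two flipped bits leave the overall parity even and are only detected. An adaptive scheme like the one described would swap this decoder for a different code once a bit is tagged as error-prone; that switching logic is outside this sketch.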


Author(s): Gianni Pucciani, Flavia Donno, Andrea Domenici, Heinz Stockinger

Data replication is a well-known technique used in distributed systems to improve fault tolerance and speed up data access. Several copies of a dataset are created and placed at different nodes, so that users can access the replica closest to them while the data access load is distributed among the replicas. In today’s Grid middleware solutions, data management services allow users to replicate datasets (i.e., flat files or databases) among storage elements within a Grid, but replicas are often considered read-only because there are no mechanisms to propagate updates and enforce replica consistency. This entry analyzes the replica consistency problem and provides hints for the development of a Replica Consistency Service, highlighting the main issues and the pros and cons of several approaches.


IEEE Access, 2020, Vol 8, pp. 43277-43288
Author(s): Yinghua Tong, Liqin Tian, Lianhai Lin, Zhigang Wang
