NIRVANA: A Non-intrusive Black-Box Monitoring Framework for Rack-Level Fault Detection

Author(s):  
Claudio Ciccotelli ◽  
Leonardo Aniello ◽  
Federico Lombardi ◽  
Luca Montanari ◽  
Leonardo Querzoni ◽  
...  
2010 ◽  
Vol 3 (1) ◽  
pp. 53-62 ◽  
Author(s):  
Dirk Jacob ◽  
Sebastian Dietz ◽  
Susanne Komhard ◽  
Christian Neumann ◽  
Sebastian Herkel

2019 ◽  
Vol 3 (1) ◽  
pp. 42-51
Author(s):  
M. Abdullah Eissa ◽  
R. R. Darwish ◽  
A. M. Bassiuny

Author(s):  
Mirco Altenbernd ◽  
Dominik Göddeke

We introduce a novel algorithm-based fault-tolerance scheme to detect and repair soft transient faults (silent data corruption, bit flips) in multigrid solvers: by applying the full approximation scheme (FAS) variant of multigrid to linear systems, we prove invariants that enable fault detection and correction and ultimately lead to black-box protection of the smoothing stage. A statistical analysis across a wide range of prototypical problems demonstrates the efficiency of our approach, especially compared with full checksum protection. In particular, the overhead of our new method is negligible in the fault-free case, since we only employ readily available quantities.
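The abstract does not state the FAS invariants themselves, so the sketch below does not reproduce them. As a much simpler, swapped-in illustration of detecting a bit flip inside a smoother, it assumes damped Jacobi (omega = 2/3) on the unscaled 1D Poisson matrix tridiag(-1, 2, -1). For that pairing the residual obeys r_{k+1} = (I - omega D^-1 A) r_k with a symmetric iteration matrix of spectral radius below one, so its 2-norm cannot grow in exact arithmetic. A sweep whose residual grows or becomes non-finite is therefore flagged and redone. The function names, the bit-flip injection, and the discard-and-redo recovery are hypothetical choices made for this example.

```python
import struct

import numpy as np


def flip_bit(x: float, bit: int) -> float:
    """Flip one bit of an IEEE-754 double to simulate silent data corruption."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return flipped


def poisson_1d(n: int) -> np.ndarray:
    """Unscaled 1D Poisson matrix tridiag(-1, 2, -1) with Dirichlet boundaries."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


def checked_jacobi(A, f, u, sweeps, omega=2.0 / 3.0,
                   fault_at=None, fault_index=5, fault_bit=62):
    """Damped Jacobi with a residual-growth check (illustrative, not FAS-based).

    Returns the final iterate and the list of sweeps flagged as corrupted.
    A flagged sweep is discarded and recomputed from the last good iterate.
    """
    d = np.diag(A)
    r = f - A @ u                      # residual the smoother needs anyway
    res = np.linalg.norm(r)
    flagged = []
    for k in range(sweeps):
        u_new = u + omega * r / d
        if k == fault_at:              # hypothetical fault injection for the demo
            u_new[fault_index] = flip_bit(u_new[fault_index], fault_bit)
        with np.errstate(all="ignore"):  # a flipped exponent bit may overflow
            r_new = f - A @ u_new
            res_new = np.linalg.norm(r_new)
        # In exact arithmetic the residual 2-norm never grows for this smoother,
        # so growth (or inf/NaN) is treated as a detected fault.
        if not np.isfinite(res_new) or res_new > res * (1.0 + 1e-12):
            flagged.append(k)
            continue                   # drop the corrupted sweep, keep u and r
        u, r, res = u_new, r_new, res_new
    return u, flagged


n = 31
A = poisson_1d(n)
f = np.ones(n)
u_ok, flagged_ok = checked_jacobi(A, f, np.zeros(n), sweeps=20)
u_bad, flagged_bad = checked_jacobi(A, f, np.zeros(n), sweeps=20, fault_at=3)
print(flagged_ok, flagged_bad)
```

The check reuses the residual that the smoother computes anyway, in the spirit of relying only on readily available quantities, but it is tied to this smoother and model problem; settings where the residual 2-norm is not guaranteed to decrease every sweep would need a different criterion.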


2005 ◽  
Vol 38 (1) ◽  
pp. 101-106
Author(s):  
K. Kumamaru ◽  
K. Inoue ◽  
F. Tsubouchi ◽  
T. Söderström
