REPAIR: Control Flow Protection based on Register Pairing Updates for SW-Implemented HW Fault Tolerance

2021 ◽  
Vol 20 (5s) ◽  
pp. 1-22
Author(s):  
Uzair Sharif ◽  
Daniel Mueller-Gritschneder ◽  
Ulf Schlichtmann

Safety-critical embedded systems may either use specialized hardware or rely on Software-Implemented Hardware Fault Tolerance (SIHFT) to meet soft error resilience requirements. SIHFT has the advantage that it can be used with low-cost, off-the-shelf components such as standard Micro-Controller Units. For this, SIHFT methods apply redundancy in software computation and special checker codes to detect transient errors, so-called soft errors, that corrupt either the data flow or the control flow of the software and may lead to Silent Data Corruption (SDC). So far, this is done by applying separate SIHFT methods for data flow and control flow protection, which leads to large overheads in computation time. This work, in contrast, presents REPAIR, a method that exploits the checks of the SIHFT data flow protection to also detect control flow errors, thereby yielding higher SDC resilience with less computational overhead. The data flow protection methods duplicate the computation and place subsequent checks strategically throughout the program. These checks ensure that the two redundant computation paths, which work on two different parts of the register file, yield the same result. By updating the pairing between the registers used in the primary computation path and the registers in the duplicated computation path using the REPAIR method, these checks also fail with high coverage when a control flow error, which leads to an illegal jump, occurs. Extensive RTL fault injection simulations are carried out to accurately quantify soft error resilience while evaluating MiBench programs along with an embedded case study running on an OpenRISC processor. Our method performs slightly better on average in terms of soft error resilience than the best state-of-the-art method while requiring significantly lower overheads. These results show that REPAIR is a valuable addition to the set of known SIHFT methods.
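The duplicate-and-check idea behind the data flow protection described above can be illustrated with a minimal Python sketch. This is not the REPAIR method itself and does not model register pairing updates; the names `checked_sum`, `SoftErrorDetected`, and the `flip_bit_at` fault-injection hook are hypothetical and exist only for illustration.

```python
class SoftErrorDetected(Exception):
    """Raised when the primary and shadow computation paths disagree."""


def checked_sum(values, flip_bit_at=None):
    """Sum `values` twice (primary and shadow path) and compare the results.

    `flip_bit_at` is a hypothetical fault-injection hook: if set to an index,
    the lowest bit of the primary accumulator is flipped after that element.
    """
    primary = 0  # stands in for a register of the primary path
    shadow = 0   # stands in for its paired register in the duplicated path
    for i, v in enumerate(values):
        primary += v
        shadow += v
        if i == flip_bit_at:
            primary ^= 1  # simulated transient bit flip, primary path only
    # Check placed before the result leaves the protected region.
    if primary != shadow:
        raise SoftErrorDetected(f"primary={primary}, shadow={shadow}")
    return primary
```

Calling `checked_sum([1, 2, 3])` returns the sum, while `checked_sum([1, 2, 3], flip_bit_at=1)` raises `SoftErrorDetected`, because the simulated bit flip makes the two paths diverge before the check.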

Electronics ◽  
2021 ◽  
Vol 10 (23) ◽  
pp. 3028
Author(s):  
Hwisoo So ◽  
Moslem Didehban ◽  
Yohan Ko ◽  
Reiley Jeyapaul ◽  
Jongho Kim ◽  
...  

Aggressive technology scaling and near-threshold computing have made soft error reliability one of the leading design considerations in modern embedded microprocessors. Although traditional hardware/software redundancy-based schemes can provide a high level of protection, they incur significant overheads in terms of performance and hardware resources. The considerable overheads of such full redundancy-based techniques have motivated researchers to propose low-cost soft error protection schemes, such as symptom-based error protection schemes. The main idea behind a symptom-based error protection scheme is that soft errors in the system will quickly generate some symptoms, such as exceptions, branch mispredictions, cache or TLB misses, or unpredictable variable values. Therefore, monitoring such infrequent symptoms makes it possible to cover the manifestation of failures caused by soft errors. Symptom-based protection schemes have been suggested as shortcuts to achieve acceptable reliability with comparable overheads. Since symptom-based protection schemes seem attractive due to their generality and simplicity, even state-of-the-art protection schemes exploit them as baseline protections. However, our detailed analysis of the fault coverage and performance overheads of such schemes reveals that the user-visible failure coverage, particularly of ReStore, is limited (29% on average). By contrast, the runtime overheads are significant (40% on average) because the majority of the fault injection experiments that were counted as failures detected or recovered through low-level symptoms are actually benign faults, masked by program-level effects.


1999 ◽  
Vol 7 (3-4) ◽  
pp. 247-260
Author(s):  
Sungdo Moon ◽  
Byoungro So ◽  
Mary W. Hall

This paper demonstrates that significant improvements to automatic parallelization technology require that existing systems be extended in two ways: (1) they must combine high‐quality compile‐time analysis with low‐cost run‐time testing; and (2) they must take control flow into account during analysis. We support this claim with the results of an experiment that measures the safety of parallelization at run time for loops left unparallelized by the Stanford SUIF compiler’s automatic parallelization system. We present results of measurements on programs from two benchmark suites – SPECFP95 and NAS sample benchmarks – which identify inherently parallel loops in these programs that are missed by the compiler. We characterize remaining parallelization opportunities, and find that most of the loops require run‐time testing, analysis of control flow, or some combination of the two. We present a new compile‐time analysis technique that can be used to parallelize most of these remaining loops. This technique is designed to not only improve the results of compile‐time parallelization, but also to produce low‐cost, directed run‐time tests that allow the system to defer binding of parallelization until run‐time when safety cannot be proven statically. We call this approach predicated array data‐flow analysis. We augment array data‐flow analysis, which the compiler uses to identify independent and privatizable arrays, by associating predicates with array data‐flow values. Predicated array data‐flow analysis allows the compiler to derive “optimistic” data‐flow values guarded by predicates; these predicates can be used to derive a run‐time test guaranteeing the safety of parallelization.
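As a rough illustration of a predicate-guarded run-time test, the Python sketch below picks between an order-independent and a sequential version of one loop. It is not the paper's analysis or the SUIF implementation: the loop, the predicate `k == 0 or k >= n`, and all function names are assumptions chosen for this example, and `k` is assumed to be non-negative with `len(a) >= n + k`.

```python
def update_sequential(a, n, k):
    """Reference loop: iteration i reads a[i] and writes a[i + k]."""
    for i in range(n):
        a[i + k] = a[i] + 1


def update_any_order(a, n, k):
    """Stand-in for a parallel loop: all reads happen before any write."""
    reads = [a[i] for i in range(n)]
    for i in range(n):
        a[i + k] = reads[i] + 1


def safe_to_parallelize(n, k):
    """Run-time predicate: no iteration writes an element that a different
    iteration reads when k == 0 or k >= n (assumes k >= 0)."""
    return k == 0 or k >= n


def update(a, n, k):
    """Choose the loop version at run time based on the predicate."""
    if safe_to_parallelize(n, k):
        update_any_order(a, n, k)
    else:
        update_sequential(a, n, k)
```

When the predicate holds, both loop versions produce the same array, so the order-independent version can be used safely; when it fails (for example `n = 3`, `k = 1`), the two versions can differ and `update` falls back to the sequential loop.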


2015 ◽  
Vol 28 (3) ◽  
pp. 309-323
Author(s):  
Navid Khoshavi ◽  
Hamid Zarandi ◽  
Mohammad Maghsoudloo

This paper presents two control-flow error (CFE) recovery techniques: CFE Recovery using Data-flow graph Consideration and CFE Recovery using Macro block-level Checkpointing. These techniques are proposed with regard to thread interactions in programs, and they aim to moderate the high memory and performance overheads of conventional control-flow checking techniques. Each technique consists of two phases: control-flow error detection and recovery. These phases are implemented by inserting additional instructions into the program at compile time, guided by a dependency graph extracted from the control-flow and data-flow dependencies among basic blocks and from the thread interactions in the program. To evaluate the proposed techniques, five multithreaded benchmarks are run on a multi-core processor, and a total of 10000 transient faults are injected into several executable points of each program. Fault injection experiments show that the proposed techniques recover the detected errors in at least 91% of the cases.


2017 ◽  
Vol 26 (08) ◽  
pp. 1740009
Author(s):  
Aitzan Sari ◽  
Mihalis Psarakis

Due to the high vulnerability of SRAM-based FPGAs to single-event upsets (SEUs), effective fault-tolerant soft processor architectures must be considered when FPGAs are used to build embedded systems for critical applications. In the past, the detection of symptoms of soft errors in the behavior of microprocessors has been used to implement low-budget error detection techniques instead of costly hardware redundancy techniques. To enable the development of such low-cost error detection techniques for FPGA soft processors, we propose an in-depth analysis of the symptoms of SEUs in the FPGA configuration memory. To this end, we present a flexible fault injection platform based on an open-source CAD framework (RapidSmith) for the soft error sensitivity analysis of soft processors in Xilinx SRAM-based FPGAs. Our platform supports the estimation of soft error sensitivity per configuration bit/frame, processor component, and benchmark. The fault injection is performed on-chip by a dedicated microcontroller, which also monitors processor behavior to identify specific symptoms as consequences of soft errors. The analysis showed that these symptoms can be used to build an efficient, low-cost error detection scheme. The proposed platform is demonstrated through an extensive fault injection campaign on the Leon3 soft processor.


Electronics ◽  
2021 ◽  
Vol 10 (10) ◽  
pp. 1179
Author(s):  
Jonatan Sánchez ◽  
Antonio da Silva ◽  
Pablo Parra ◽  
Óscar R. Polo ◽  
Agustín Martínez Hellín ◽  
...  

Multicore hardware platforms are being incorporated into spacecraft on-board systems to achieve faster and more efficient data processing. However, such systems lead to increased complexity in software development and represent a considerable challenge, especially concerning the runtime verification of fault-tolerance requirements. To address the ever-challenging verification of this kind of requirement, we introduce a LEON4 multicore virtual platform called LeonViP-MC. LeonViP-MC is an evolution of a previous development called Leon2ViP, carried out by the Space Research Group of the University of Alcalá (SRG-UAH), which has been successfully used in the development and testing of the flight software of the instrument control unit (ICU) of the energetic particle detector (EPD) on board the Solar Orbiter. This paper describes the LeonViP-MC architectural design decisions oriented towards fault-injection campaigns to verify software fault-tolerance mechanisms. To validate the simulator, we developed an ARINC653 communications channel that incorporates fault-tolerance mechanisms and is currently being used to develop a hypervisor level for the GR740 platform.

