REPAIR: Control Flow Protection based on Register Pairing Updates for SW-Implemented HW Fault Tolerance

Uzair Sharif; Daniel Mueller-Gritschneder; Ulf Schlichtmann

doi:10.1145/3477001

ACM Transactions on Embedded Computing Systems ◽

10.1145/3477001 ◽

2021 ◽

Vol 20 (5s) ◽

pp. 1-22

Author(s):

Uzair Sharif ◽

Daniel Mueller-Gritschneder ◽

Ulf Schlichtmann

Keyword(s):

Fault Tolerance ◽

Data Flow ◽

Fault Injection ◽

Low Cost ◽

Computation Time ◽

Error Resilience ◽

Control Flow ◽

Soft Error ◽

High Coverage ◽

Computation Path

Safety-critical embedded systems may either use specialized hardware or rely on Software-Implemented Hardware Fault Tolerance (SIHFT) to meet soft error resilience requirements. SIHFT has the advantage that it can be used with low-cost, off-the-shelf components such as standard Micro-Controller Units. To this end, SIHFT methods apply redundancy in software computation and special checker codes to detect transient errors, so-called soft errors, that corrupt either the data flow or the control flow of the software and may lead to Silent Data Corruption (SDC). So far, this has been done by applying separate SIHFT methods for data flow and control flow protection, which leads to large overheads in computation time. This work, in contrast, presents REPAIR, a method that exploits the checks of the SIHFT data flow protection to also detect control flow errors, thereby yielding higher SDC resilience with less computational overhead. The data flow protection methods duplicate the computation and place subsequent checks strategically throughout the program. These checks ensure that the two redundant computation paths, which work on two different parts of the register file, yield the same result. By updating the pairing between the registers used in the primary computation path and the registers in the duplicated computation path using the REPAIR method, these checks also fail with high coverage when a control flow error, which leads to an illegal jump, occurs. Extensive RTL fault injection simulations are carried out to accurately quantify soft error resilience while evaluating MiBench programs along with an embedded case study running on an OpenRISC processor. Our method performs slightly better on average in terms of soft error resilience than the best state-of-the-art method while requiring significantly lower overheads. These results show that REPAIR is a valuable addition to the set of known SIHFT methods.
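The idea in the abstract can be illustrated with a toy model. The following is only a sketch under stated assumptions, not the authors' implementation: the register numbers, the block structure, and the helper names (`block_a`, `edge_a_to_b`, `block_b`, `run`) are all hypothetical. It assumes that each block has its own pairing between a primary register and a duplicate register, and that the pairing is updated on the legal edge between blocks, so a check in the target block fails both for a corrupted value and for a jump that skipped the update.

```python
class CheckFailed(Exception):
    """Raised when the primary and duplicate registers disagree."""


def block_a(regs, x):
    # Hypothetical pairing in block A: primary r0 is paired with duplicate r4.
    regs[0] = x * 2 + 1  # primary computation
    regs[4] = x * 2 + 1  # duplicated computation


def edge_a_to_b(regs):
    # Pairing update on the legal edge A -> B: block B expects the
    # duplicate of r0 in r5, so the value moves and r4 is cleared.
    regs[5] = regs[4]
    regs[4] = 0


def block_b(regs):
    # Data flow check under block B's pairing (r0 paired with r5).
    if regs[0] != regs[5]:
        raise CheckFailed("r0 and r5 differ")
    return regs[0]


def run(x, illegal_jump=False, flip_bit=None):
    regs = [0] * 8  # toy register file
    block_a(regs, x)
    if flip_bit is not None:
        regs[0] ^= 1 << flip_bit  # emulated soft error in the data flow
    if not illegal_jump:
        edge_a_to_b(regs)  # only executed on the legal control flow path
    return block_b(regs)
```

In this toy, `run(3)` passes the check, while `run(3, flip_bit=2)` and `run(3, illegal_jump=True)` both raise `CheckFailed`. The illegal jump is only caught because the stale content of `r5` differs from `r0`, which is consistent with the abstract's wording of high rather than complete coverage.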

Revisiting Symptom-Based Fault Tolerant Techniques against Soft Errors

Electronics ◽

10.3390/electronics10233028 ◽

2021 ◽

Vol 10 (23) ◽

pp. 3028

Author(s):

Hwisoo So ◽

Moslem Didehban ◽

Yohan Ko ◽

Reiley Jeyapaul ◽

Jongho Kim ◽

...

Keyword(s):

Fault Tolerant ◽

Fault Injection ◽

Low Cost ◽

Soft Errors ◽

Main Idea ◽

Soft Error ◽

Error Protection ◽

Technology Scaling ◽

Protection Scheme ◽

High Level

Aggressive technology scaling and near-threshold computing have made soft error reliability one of the leading design considerations in modern embedded microprocessors. Although traditional hardware/software redundancy-based schemes can provide a high level of protection, they incur significant overheads in terms of performance and hardware resources. The considerable overheads of such full redundancy-based techniques have motivated researchers to propose low-cost soft error protection schemes, such as symptom-based error protection schemes. The main idea behind a symptom-based error protection scheme is that soft errors in the system will quickly generate some symptoms, such as exceptions, branch mispredictions, cache or TLB misses, or unpredictable variable values. Therefore, monitoring such infrequent symptoms makes it possible to cover the manifestation of failures caused by soft errors. Symptom-based protection schemes have been suggested as shortcuts to achieve acceptable reliability with comparable overheads. Since symptom-based protection schemes seem attractive due to their generality and simplicity, even state-of-the-art protection schemes use them as baseline protections. However, our detailed analysis of the fault coverage and performance overheads of such schemes reveals that the user-visible failure coverage, particularly of ReStore, is limited (29% on average). By contrast, the runtime overheads are significant (40% on average) because the majority of the fault injection experiments that were counted as failures detected or recovered by low-level symptoms are actually benign faults owing to program-level masking effects.
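The distinction between symptoms, benign faults, and silent data corruption can be made concrete with a small exhaustive bit-flip experiment. This is a hypothetical toy, not the evaluation setup of the paper: the program, the table contents, and the names `program` and `campaign` are assumptions chosen so that all three outcome classes appear. An out-of-range index stands in for a symptom (an exception), and the masking of the upper bits of `value` stands in for program-level masking.

```python
TABLE = [10, 20, 30, 40]


def program(index, value):
    # Only the low four bits of `value` reach the output, so flips in
    # the upper bits are masked at program level.
    return TABLE[index] + (value & 0x0F)


def campaign(index=1, value=0x35, bits=8):
    """Flip every bit of both inputs once and classify each outcome."""
    golden = program(index, value)
    outcomes = {"benign": 0, "sdc": 0, "symptom": 0}
    for target in ("index", "value"):
        for bit in range(bits):
            i = index ^ (1 << bit) if target == "index" else index
            v = value ^ (1 << bit) if target == "value" else value
            try:
                result = program(i, v)
            except IndexError:
                outcomes["symptom"] += 1  # fault surfaced as an exception
                continue
            if result == golden:
                outcomes["benign"] += 1  # fault masked, output unchanged
            else:
                outcomes["sdc"] += 1  # wrong output and no symptom
    return outcomes
```

Counting only the `symptom` bucket as coverage would overstate protection in a case like this, because it says nothing about the faults that silently change the output or about the ones that never mattered.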

Combining Compile-Time and Run-Time Parallelization

Scientific Programming ◽

10.1155/1999/490628 ◽

1999 ◽

Vol 7 (3-4) ◽

pp. 247-260

Author(s):

Sungdo Moon ◽

Byoungro So ◽

Mary W. Hall

Keyword(s):

Data Flow ◽

Low Cost ◽

Flow Analysis ◽

Automatic Parallelization ◽

Control Flow ◽

Array Data ◽

Data Flow Analysis ◽

Run Time ◽

Time Parallelization ◽

Array Data Flow Analysis

This paper demonstrates that significant improvements to automatic parallelization technology require that existing systems be extended in two ways: (1) they must combine high‐quality compile‐time analysis with low‐cost run‐time testing; and (2) they must take control flow into account during analysis. We support this claim with the results of an experiment that measures the safety of parallelization at run time for loops left unparallelized by the Stanford SUIF compiler’s automatic parallelization system. We present results of measurements on programs from two benchmark suites – SPECFP95 and NAS sample benchmarks – which identify inherently parallel loops in these programs that are missed by the compiler. We characterize remaining parallelization opportunities, and find that most of the loops require run‐time testing, analysis of control flow, or some combination of the two. We present a new compile‐time analysis technique that can be used to parallelize most of these remaining loops. This technique is designed to not only improve the results of compile‐time parallelization, but also to produce low‐cost, directed run‐time tests that allow the system to defer binding of parallelization until run‐time when safety cannot be proven statically. We call this approach predicated array data‐flow analysis. We augment array data‐flow analysis, which the compiler uses to identify independent and privatizable arrays, by associating predicates with array data‐flow values. Predicated array data‐flow analysis allows the compiler to derive “optimistic” data‐flow values guarded by predicates; these predicates can be used to derive a run‐time test guaranteeing the safety of parallelization.
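The role of a predicate-derived run-time test can be sketched on a single loop. The example below is an illustration only and does not reproduce the SUIF analysis: the loop, the predicate, and the function names are assumptions. For the loop `a[i + k] = a[i] + 1` with `0 <= i < n` and `k >= 0`, a cross-iteration dependence exists exactly when `0 < k < n`, so the guard `k == 0 or k >= n` is a cheap test that decides at run time whether the iterations may execute independently. The "parallel" variant is emulated by performing all reads before any write.

```python
def sequential(a, n, k):
    # Reference semantics: iterations run strictly in order.
    a = list(a)
    for i in range(n):
        a[i + k] = a[i] + 1
    return a


def reads_first(a, n, k):
    # Emulates independent iterations: no iteration sees another's write.
    a = list(a)
    reads = [a[i] for i in range(n)]
    for i in range(n):
        a[i + k] = reads[i] + 1
    return a


def safe_to_parallelize(n, k):
    # Run-time test (assumes k >= 0 and len(a) >= n + k): no written
    # index i + k is read by a different iteration.
    return k == 0 or k >= n


def guarded(a, n, k):
    # Binding of the parallel version is deferred until n and k are known.
    if safe_to_parallelize(n, k):
        return reads_first(a, n, k)
    return sequential(a, n, k)
```

With `a = [0] * 10` and `n = 4`, the two variants agree for `k = 4` and `k = 0` but differ for `k = 1`, which is the case the guard sends down the sequential path.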

Low-cost soft error resilience with unified data verification and fine-grained recovery for acoustic sensor based detection

2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) ◽

10.1109/micro.2016.7783728 ◽

2016 ◽

Cited By ~ 2

Author(s):

Qingrui Liu ◽

Changhee Jung ◽

Dongyoon Lee ◽

Devesh Tiwari

Keyword(s):

Low Cost ◽

Error Resilience ◽

Soft Error ◽

Acoustic Sensor ◽

Data Verification ◽

Fine Grained

Low cost fault tolerance against k_c-cycle and k_m-unit transient for loop based control data flow graphs during physically aware high level synthesis

Microelectronics Reliability ◽

10.1016/j.microrel.2017.05.023 ◽

2017 ◽

Vol 74 ◽

pp. 88-99 ◽

Cited By ~ 6

Author(s):

Anirban Sengupta ◽

Deepak Kachave

Keyword(s):

Fault Tolerance ◽

Data Flow ◽

Low Cost ◽

High Level Synthesis ◽

Control Data ◽

C Cycle ◽

Data Flow Graphs ◽

High Level ◽

Flow Graphs

Two control-flow error recovery methods for multithreaded programs running on multi-core processors

Facta universitatis - series Electronics and Energetics ◽

10.2298/fuee1503309k ◽

2015 ◽

Vol 28 (3) ◽

pp. 309-323 ◽

Cited By ~ 1

Author(s):

Navid Khoshavi ◽

Hamid Zarandi ◽

Mohammad Maghsoudloo

Keyword(s):

Error Detection ◽

Data Flow ◽

Fault Injection ◽

Error Recovery ◽

Control Flow ◽

Transient Faults ◽

Multithreaded Programs ◽

Recovery Techniques ◽

And Performance ◽

Using Data

This paper presents two control-flow error (CFE) recovery techniques: CFE Recovery using Data-flow graph Consideration and CFE Recovery using Macro block-level Checkpointing. Both techniques take thread interactions in the programs into account, and both aim to moderate the high memory and performance overheads of conventional control-flow checking techniques. Each technique consists of two phases, control-flow error detection and recovery. These phases are implemented by inserting additional instructions into the program at compile time, guided by a dependency graph extracted from the control-flow and data-flow dependencies among basic blocks and from the thread interactions in the programs. To evaluate the proposed techniques, five multithreaded benchmarks are run on a multi-core processor, and a total of 10000 transient faults are injected into several executable points of each program. The fault injection experiments show that the proposed techniques recover the detected errors in at least 91% of the cases.
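The two phases, detection followed by recovery from a checkpoint, can be sketched in a single-threaded toy. This is not either of the techniques named in the abstract: the straight-line block order, the entry check against an expected block index, the checkpoint taken at every block boundary, and the name `run_with_recovery` are all assumptions made for illustration.

```python
import copy


def run_with_recovery(blocks, state, faulty_jump=None):
    """Run `blocks` in order 0..len-1 with a CFE check at each block entry.

    faulty_jump: optional (from_idx, to_idx) pair applied once to emulate
    a transient control-flow error after block `from_idx`.
    """
    checkpoint = (0, copy.deepcopy(state))  # (resume index, saved state)
    expected = 0  # index of the block that is allowed to run next
    pc = 0
    recoveries = 0
    while True:
        if pc != expected:
            # Detection phase: control arrived somewhere unexpected.
            # Recovery phase: roll back to the last checkpoint.
            pc, state = checkpoint[0], copy.deepcopy(checkpoint[1])
            recoveries += 1
            continue
        if pc == len(blocks):
            break  # legal program exit
        blocks[pc](state)
        expected = pc + 1
        checkpoint = (expected, copy.deepcopy(state))
        next_pc = pc + 1
        if faulty_jump is not None and faulty_jump[0] == pc:
            next_pc = faulty_jump[1]  # emulated illegal jump
            faulty_jump = None  # transient: happens only once
        pc = next_pc
    return state, recoveries
```

As a usage example, with three blocks that add 1, double, and subtract 3 from `state["x"]`, a run starting at `x = 2` with `faulty_jump=(0, 2)` detects the jump at the entry of block 2, resumes at block 1, and finishes with the same result as the fault-free run after one recovery.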

Combining architectural fault-injection and neutron beam testing approaches toward better understanding of GPU soft-error resilience

2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS) ◽

10.1109/mwscas.2017.8053069 ◽

2017 ◽

Cited By ~ 1

Author(s):

Fritz G. Previlon ◽

Babatunde Egbantan ◽

Devesh Tiwari ◽

Paolo Rech ◽

David R. Kaeli

Keyword(s):

Neutron Beam ◽

Fault Injection ◽

Error Resilience ◽

Soft Error

A Flexible Fault Injection Platform for the Analysis of the Symptoms of Soft Errors in FPGA Soft Processors

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126617400096 ◽

2017 ◽

Vol 26 (08) ◽

pp. 1740009

Author(s):

Aitzan Sari ◽

Mihalis Psarakis

Keyword(s):

Error Detection ◽

Fault Tolerant ◽

Fault Injection ◽

Low Cost ◽

Soft Errors ◽

Soft Error ◽

Detection Scheme ◽

Detection Techniques ◽

Error Sensitivity ◽

Depth Analysis

Due to the high vulnerability of SRAM-based FPGAs to single-event upsets (SEUs), effective fault-tolerant soft processor architectures must be considered when FPGAs are used to build embedded systems for critical applications. In the past, the detection of symptoms of soft errors in the behavior of microprocessors has been used to implement low-budget error detection techniques instead of costly hardware redundancy techniques. To enable the development of such low-cost error detection techniques for FPGA soft processors, we propose an in-depth analysis of the symptoms of SEUs in the FPGA configuration memory. To this end, we present a flexible fault injection platform based on an open-source CAD framework (RapidSmith) for the soft error sensitivity analysis of soft processors in Xilinx SRAM-based FPGAs. Our platform supports the estimation of soft error sensitivity per configuration bit/frame, processor component and benchmark. The fault injection is performed on-chip by a dedicated microcontroller, which also monitors processor behavior to identify specific symptoms as consequences of soft errors. The performed analysis showed that these symptoms can be used to build an efficient, low-cost error detection scheme. The proposed platform is demonstrated through an extensive fault injection campaign on the Leon3 soft processor.

Low cost control flow protection using abstract control signatures

Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems ◽

10.1145/2491899.2465568 ◽

2013 ◽

Cited By ~ 11

Author(s):

Daya Shanker Khudia ◽

Scott Mahlke

Keyword(s):

Cost Control ◽

Low Cost ◽

Control Flow

ARINC653 Channel Robustness Verification Using LeonViP-MC, a LEON4 Multicore Virtual Platform

Electronics ◽

10.3390/electronics10101179 ◽

2021 ◽

Vol 10 (10) ◽

pp. 1179

Author(s):

Jonatan Sánchez ◽

Antonio da Silva ◽

Pablo Parra ◽

Óscar R. Polo ◽

Agustín Martínez Hellín ◽

...

Keyword(s):

Fault Tolerance ◽

Architectural Design ◽

Fault Injection ◽

Control Unit ◽

Tolerance Mechanisms ◽

Efficient Data ◽

Instrument Control ◽

Virtual Platform ◽

Hardware Platforms ◽

The University

Multicore hardware platforms are being incorporated into spacecraft on-board systems to achieve faster and more efficient data processing. However, such systems lead to increased complexity in software development and represent a considerable challenge, especially concerning the runtime verification of fault-tolerance requirements. To address the ever-challenging verification of this kind of requirement, we introduce a LEON4 multicore virtual platform called LeonViP-MC. LeonViP-MC is an evolution of a previous development called Leon2ViP, carried out by the Space Research Group of the University of Alcalá (SRG-UAH), which has been successfully used in the development and testing of the flight software of the instrument control unit (ICU) of the energetic particle detector (EPD) on board the Solar Orbiter. This paper describes the LeonViP-MC architectural design decisions oriented towards fault-injection campaigns to verify software fault-tolerance mechanisms. To validate the simulator, we developed an ARINC653 communications channel that incorporates fault-tolerance mechanisms and is currently being used to develop a hypervisor level for the GR740 platform.

F-SEFI: A Fine-Grained Soft Error Fault Injection Tool for Profiling Application Vulnerability

2014 IEEE 28th International Parallel and Distributed Processing Symposium ◽

10.1109/ipdps.2014.128 ◽

2014 ◽

Cited By ~ 30

Author(s):

Qiang Guan ◽

Nathan Debardeleben ◽

Sean Blanchard ◽

Song Fu

Keyword(s):

Fault Injection ◽

Soft Error ◽

Fine Grained
