A new architecture for online error detection and isolation in network on chip

2020 ◽  
Vol 26 (4) ◽  
pp. 307-323
Author(s):  
Chakib Nehnouh

The Network-on-Chip (NoC) has become a promising communication infrastructure for Multiprocessors-System-on-Chip (MPSoC). Reliability is a main concern in NoC and performance is degraded when NoC is susceptible to faults. A fault can be determined as a cause of deviation from the desired operation of the system (error). To deal with these reliability challenges, this work propose OFDIM (Online Fault Detection and Isolation Mechanism),a novel combined methodology to tolerate multiple permanent and transient faults. The new router architecture uses two modules to assure highly reliable and low-cost fault-tolerant strategy. In contrast to existing works, our architecture presents less area, more fault tolerance, and high reliability. The reliability comparison using Silicon Protection Factor (SPF), shows 22-time improvement and that additional circuitry incurs an area overhead of 27%, which is better than state-of-the-art reliable router architectures. Also, the results show that the throughput decreases only by 5.19% and minor increase in average latency 2.40% while providing high reliability.

Electronics ◽  
2020 ◽  
Vol 9 (2) ◽  
pp. 342 ◽  
Author(s):  
Muhammad Akmal Shafique ◽  
Naveed Khan Baloch ◽  
Muhammad Iram Baig ◽  
Fawad Hussain ◽  
Yousaf Bin Zikria ◽  
...  

Aggressive scaling in deep nanometer technology enables chip multiprocessor design facilitated by the communication-centric architecture provided by Network-on-Chip (NoC). At the same time, it brings considerable challenges in reliability because a fault in the network architecture severely impacts the performance of a system. To deal with these reliability challenges, this research proposed NoCGuard, a reconfigurable architecture designed to tolerate multiple permanent faults in each pipeline stage of the generic router. NoCGuard router architecture uses four highly reliable and low-cost fault-tolerant strategies. We exploited resource borrowing and double routing strategy for the routing computation stage, default winner strategy for the virtual channel allocation stage, runtime arbiter selection and default winner strategy for the switch allocation stage and multiple secondary bypass paths strategy for the crossbar stage. Unlike existing reliable router architectures, our architecture features less redundancy, more fault tolerance, and high reliability. Reliability comparison using Mean Time to Failure (MTTF) metric shows 5.53-time improvement in a lifetime and using Silicon Protection Factor (SPF), 22-time improvement, which is better than state-of-the-art reliable router architectures. Synthesis results using 15 nm and 45 nm technology library show that additional circuitry incurs an area overhead of 28.7% and 28% respectively. Latency analysis using synthetic, PARSEC and SPLASH-2 traffic shows minor increase in performance by 3.41%, 12% and 15% respectively while providing high reliability.


2017 ◽  
Vol E100.D (4) ◽  
pp. 910-913
Author(s):  
Ruilian XIE ◽  
Jueping CAI ◽  
Xin XIN ◽  
Bo YANG

2017 ◽  
Vol 26 (12) ◽  
pp. 1750200 ◽  
Author(s):  
Ruilian Xie ◽  
Jueping Cai ◽  
Peng Wang ◽  
Xin Zhang ◽  
Juan Wang

High reliability against undesirable effects is one of the key objectives in the design for Network-on-Chip (NoC). As a result, designing reliable and efficient routing method is highly desirable. This paper presents a novel turn model called NMad-y using one and two virtual channels along the [Formula: see text]- and [Formula: see text]-dimensions, respectively, and Adaptive and Fault-tolerant Routing Method (AFRM) which is designed based on the NMad-y turn model. AFRM can effectively tolerate multiple faulty routers and links in more complicated faulty situations by the link status of neighbor routers within two hops. AFRM is able to impose the reliability of network without losing the performance of network. Simulation results show that AFRM achieves better saturation throughput (0.83% on average) than a state-of-the-art fault-tolerant routing method and maintains high reliability of more than 97.43% on average.


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5355
Author(s):  
Muhammad Rashid ◽  
Naveed Khan Baloch ◽  
Muhammad Akmal Shafique ◽  
Fawad Hussain ◽  
Shahroon Saleem ◽  
...  

Network-on-chip (NoC) architectures have become a popular communication platform for heterogeneous computing systems owing to their scalability and high performance. Aggressive technology scaling makes these architectures prone to both permanent and transient faults. This study focuses on the tolerance of a NoC router to permanent faults. A permanent fault in a NoC router severely impacts the performance of the entire network. Thus, it is necessary to incorporate component-level protection techniques in a router. In the proposed scheme, the input port utilizes a bypass path, virtual channel (VC) queuing, and VC closing strategies. Moreover, the routing computation stage utilizes spatial redundancy and double routing strategies, and the VC allocation stage utilizes spatial redundancy. The switch allocation stage utilizes run-time arbiter selection. The crossbar stage utilizes a triple bypass bus. The proposed router is highly fault-tolerant compared with the existing state-of-the-art fault-tolerant routers. The reliability of the proposed router is 7.98 times higher than that of the unprotected baseline router in terms of the mean-time-to-failure metric. The silicon protection factor metric is used to calculate the protection ability of the proposed router. Consequently, it is confirmed that the proposed router has a greater protection ability than the conventional fault-tolerant routers.


Author(s):  
Björn Osterloh ◽  
Harald Michalik ◽  
Björn Fiethe

Today FPGAs with large gate counts provide a highly flexible platform to implement a complete System-on-Chip (SoC) in a single device. Specifically radiation tolerant space suitable SRAM-based FPGAs have significantly improved the flexibility of high reliable systems for space applications. Currently the reconfigurability of these devices is only used during development phase. A further enhancement would be using the reconfigurability of SRAM-FPGAs in space, either to statically update or dynamically reconfigure processing modules. This is a major improvement in terms of maintenance and performance, which is essential for scientific instruments in space. The requirement for this enhanced system is to guarantee the system qualification and retain the achieved high reliability. Therefore effects during the reconfiguration process and interference of updated modules on the system have to be prevented. Updated modules need to be isolated physically and logically by qualified communication architecture. In this chapter the advantage of a specialized Network-on-Chip architecture to achieve a high reliable SoC with dynamic reconfiguration capability is presented. The requirements for SoC based on SRAM-FPGA in high reliable applications are outlined. Additionally the influences of radiation induced particles are described and effects during the dynamic reconfiguration are discussed. A specialized Network-on-Chip architecture is then proposed and its advantages are presented.


Sign in / Sign up

Export Citation Format

Share Document