Analysis and Evaluation of a New Algorithm Based Fault Tolerance for Computing Systems

Hodjat Hamidi; Abbas Vafaei; Seyed Amir Hassan Monadjemi

doi:10.4018/jghpc.2012010103

Analysis and Evaluation of a New Algorithm Based Fault Tolerance for Computing Systems

International Journal of Grid and High Performance Computing ◽

10.4018/jghpc.2012010103 ◽

2012 ◽

Vol 4 (1) ◽

pp. 37-51 ◽

Cited By ~ 7

Author(s):

Hodjat Hamidi ◽

Abbas Vafaei ◽

Seyed Amir Hassan Monadjemi

Keyword(s):

Fault Tolerance ◽

High Performance ◽

Fault Tolerant ◽

Numerical Algorithms ◽

Computing System ◽

Computing Systems ◽

Specific Level ◽

New Approach ◽

Computing Paradigm ◽

High Performance Computing System

In this paper, the authors present a new approach to algorithm-based fault tolerance (ABFT) for high-performance computing systems. The ABFT approach transforms a system that does not tolerate a specific type of fault, called the fault-intolerant system, into a system that provides a specific level of fault tolerance, namely recovery. ABFT techniques that detect errors rely on comparing parity values computed in two ways: parallel processing of the input parity values produces output parity values that are comparable with parity values regenerated from the original processed outputs, and convolutional codes can be applied for the redundancy. This method is a new approach to concurrent error correction in fault-tolerant computing systems. The paper proposes a novel computing paradigm that provides fault tolerance for numerical algorithms. The authors also present, implement, and evaluate early detection in ABFT.
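
The "parity computed in two ways" comparison can be illustrated with a minimal sketch. It assumes the protected computation is a matrix-vector product and uses a plain column checksum as the parity, not the convolutional code the abstract mentions; the function names and the `fault` parameter are hypothetical and exist only for illustration.

```python
def matvec(matrix, vector):
    """Plain matrix-vector product y = A x."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def column_checksum(matrix):
    """Input parity: the sum of each column of A (one extra checksum row)."""
    return [sum(col) for col in zip(*matrix)]


def checked_matvec(matrix, vector, fault=None, tol=1e-9):
    """Return (y, ok), where ok is False if the two parities disagree.

    `fault` is a hypothetical (index, delta) pair added to y to simulate
    a transient error in the processed outputs.
    """
    y = matvec(matrix, vector)
    if fault is not None:
        index, delta = fault
        y[index] += delta
    # Way 1: process the input parity through the same linear operation.
    parity_from_inputs = sum(
        c * x for c, x in zip(column_checksum(matrix), vector)
    )
    # Way 2: regenerate the parity from the processed outputs.
    parity_from_outputs = sum(y)
    ok = abs(parity_from_inputs - parity_from_outputs) <= tol
    return y, ok
```

For a fault-free run the two parities agree; an injected error in any single output element changes only the regenerated parity, so the comparison flags it.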

A General Framework of Algorithm-Based Fault Tolerance Technique for Computing Systems

Analyzing Security, Trust, and Crime in the Digital World - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-4666-4856-2.ch001 ◽

2014 ◽

pp. 1-21 ◽

Cited By ~ 1

Author(s):

Hodjatollah Hamidi

Keyword(s):

Fault Tolerance ◽

Error Correction ◽

General Framework ◽

Fault Tolerant ◽

Convolutional Code ◽

Numerical Algorithms ◽

Convolutional Codes ◽

Computing Systems ◽

Specific Level ◽

Computing Paradigm

The Algorithm-Based Fault Tolerance (ABFT) approach transforms a system that does not tolerate a specific type of fault, called the fault-intolerant system, into a system that provides a specific level of fault tolerance, namely recovery. The ABFT philosophy leads directly to a model from which error correction can be developed. By employing an ABFT scheme with an effective convolutional code, the design allows high throughput as well as high fault coverage. ABFT techniques that detect errors rely on comparing parity values computed in two ways: parallel processing of the input parity values produces output parity values that are comparable with parity values regenerated from the original processed outputs, and convolutional codes can be applied for the redundancy. This method is a new approach to concurrent error correction in fault-tolerant computing systems. This chapter proposes a novel computing paradigm that provides fault tolerance for numerical algorithms. The authors also present, implement, and evaluate early detection in ABFT.
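
How a convolutional-style parity can support correction as well as detection is sketched below under strong simplifying assumptions: the processing step is a scalar gain (a linear operation), the parity is the sliding-window sum p[i] = s[i-1] + s[i] (a real-number code with generator 1 + D), the parity channel itself is fault-free, and at most one output sample is corrupted. None of these choices comes from the chapter; all names are hypothetical.

```python
def conv_parity(samples):
    """Sliding-window parity p[i] = s[i-1] + s[i], with zeros at both ends.

    Returns len(samples) + 1 parity symbols (the last one is a tail symbol).
    """
    padded = [0] + list(samples) + [0]
    return [padded[i] + padded[i + 1] for i in range(len(samples) + 1)]


def process(samples, gain):
    """The protected computation: an assumed linear scaling by `gain`."""
    return [gain * s for s in samples]


def correct_single_error(outputs, processed_parity, tol=1e-9):
    """Locate and correct one corrupted output sample.

    The syndrome is the difference between the parity regenerated from the
    outputs and the parity obtained by processing the input parity. An
    error e in outputs[k] makes syndrome[k] the first nonzero entry, equal
    to e. Returns (corrected_outputs, error_index or None).
    """
    regenerated = conv_parity(outputs)
    syndrome = [r - p for r, p in zip(regenerated, processed_parity)]
    for k, s in enumerate(syndrome[:-1]):
        if abs(s) > tol:
            fixed = list(outputs)
            fixed[k] -= s
            return fixed, k
    return list(outputs), None
```

A usage pattern would be to compute `process(conv_parity(data), gain)` alongside `process(data, gain)` and pass both to `correct_single_error`; because the gain is linear, the two parity streams match whenever the outputs are intact.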

Creating of an individual modeling environment in a hybrid high-performance computing system

Izvestiya Vysshikh Uchebnykh Zavedenii Materialy Elektronnoi Tekhniki = Materials of Electronics Engineering ◽

10.17073/1609-3577-2019-3-197-201 ◽

2020 ◽

Vol 22 (3) ◽

pp. 197-201

Author(s):

K. I. Volovich ◽

S. A. Denisov ◽

S. I. Malkovsky

Keyword(s):

High Performance Computing ◽

High Performance ◽

Optimal Solution ◽

Computing System ◽

Computing Environment ◽

Computing Systems ◽

Modeling Environment ◽

Modeling Systems ◽

High Performance Computing System ◽

Performance Computing

The article addresses the problem of solving scientific problems on high-performance computing systems. One approach to a certain class of problems in materials science is mathematical modeling implemented by specialized modeling systems. A modeling system is most efficient when deployed in hybrid high-performance computing (HHPC) systems, which deliver high performance and allow problems to be solved in an acceptable time with sufficient accuracy. However, several limitations affect how a research team works with modeling systems in the HHPC computing environment: the need to access graphics accelerators while developing and debugging algorithms in the modeling system, the need to use several modeling systems in order to obtain an optimal solution, and the need to dynamically change the settings of modeling systems for solving problems. These limitations are addressed by an individual modeling environment that runs in the HHPC computing environment. The optimal way to create an individual modeling environment is virtual containerization. An algorithm is proposed for forming an individual modeling environment in a hybrid high-performance computing complex based on the Docker virtual containerization system. An individual modeling environment is created by installing the necessary software in the base container, setting environment variables, and installing custom software and licenses. A feature of the algorithm is the ability to form a library image from a base container with a customized individual modeling environment. In conclusion, the direction of further research is indicated. The algorithm presented in the article is independent of the implementation of the job management system and can be used with any high-performance computing system.
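
The environment-building steps listed in the abstract (base container, required software, environment variables, custom software and licenses) can be sketched as a small Python helper that renders a Dockerfile. This is an illustration only and not the authors' algorithm: the function name, the example package names, and the use of `apt-get` (which assumes a Debian-based base image) are assumptions.

```python
def render_dockerfile(base_image, packages, env, custom_paths):
    """Render Dockerfile text for an individual modeling environment.

    base_image   -- name of the base container image (placeholder value)
    packages     -- software to install; apt-get assumes a Debian-based image
    env          -- dict of environment variables to set
    custom_paths -- (source, destination) pairs for custom software/licenses
    """
    lines = [f"FROM {base_image}"]
    if packages:
        lines.append(
            "RUN apt-get update && apt-get install -y " + " ".join(packages)
        )
    # Sorted so the rendered file is deterministic across runs.
    for name, value in sorted(env.items()):
        lines.append(f'ENV {name}="{value}"')
    for src, dst in custom_paths:
        lines.append(f"COPY {src} {dst}")
    return "\n".join(lines) + "\n"
```

Building the rendered file with `docker build -t <name> .` would then produce a reusable image, which loosely corresponds to the "library image" the abstract describes.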

MAGNETIC LOGIC DEVICES FOR FUTURE COMPUTING SYSTEMS

SPIN ◽

10.1142/s2010324713400122 ◽

2013 ◽

Vol 03 (04) ◽

pp. 1340012 ◽

Cited By ~ 1

Author(s):

HAO MENG ◽

GUCHANG HAN

Keyword(s):

High Performance ◽

Magnetic Domain ◽

Complementary Metal Oxide Semiconductor ◽

Computing System ◽

Oxide Semiconductor ◽

Logic Device ◽

Computing Systems ◽

Logic Devices ◽

High Performance Computing System ◽

Recent Developments

High-performance computing system design based on complementary metal oxide semiconductor (CMOS) technology faces growing challenges due to volatility, increased leakage current and interconnection delay. Computation using magnetic logic devices has attracted considerable interest as a potential alternative because of its nonvolatility, reconfigurability, unlimited endurance and low power consumption. Instead of using electron charge, a magnetic logic device stores and processes data by controlling spins, i.e., the magnetization states in the device. The emerging magnetic logic technologies fall mainly into three design schemes: magnetoresistive logic, magnetic quantum cellular automata and magnetic domain wall logic. This paper illustrates the principles of these magnetic logic devices and reviews their recent developments. Challenges and prospects for future development are also discussed.

Implement of a high-performance computing system for parallel processing of scientific applications and the teaching of multicore and parallel programming

Proceedings INNODOCT/18. International Conference on Innovation, Documentation and Education ◽

10.4995/inn2018.2018.8908 ◽

2018 ◽

Author(s):

Apolinar Velarde Martinez

Keyword(s):

High Performance Computing ◽

High Speed ◽

High Performance ◽

Education Institution ◽

Computing System ◽

Memory Systems ◽

Distributed Computing Systems ◽

Computing Systems ◽

High Performance Computing System ◽

Performance Computing

Increasingly complex algorithms for modeling and solving the problems humanity currently faces have created new data processing requirements and, consequently, a need for high-performance computing systems. Because this type of equipment is expensive and an educational institution cannot acquire it, it is necessary to develop and implement computing architectures that are economical and scalable to build, such as heterogeneous distributed computing systems made up of several clusters of multicore processing elements with shared and distributed memory systems. This paper presents the analysis, design and implementation of a high-performance computing system called Liebres InTELigentes, whose purpose is the design and execution of intrinsically parallel algorithms that require large amounts of storage and long processing times. The proposed system consists of conventional computing equipment (desktop computers, laptops and servers) linked by a high-speed network. The main objective of this research is to build technology for scientific and educational research.

Construction of high performance computing system for fusion research using cluster technology

Journal of Computer Applications ◽

10.3724/sp.j.1087.2009.02132 ◽

2009 ◽

Vol 29 (8) ◽

pp. 2132-2135

Author(s):

Wei PAN ◽

Liao-yuan CHEN ◽

Yong-ge LI ◽

Jin-hua ZHANG ◽

Li PAN ◽

...

Keyword(s):

High Performance Computing ◽

High Performance ◽

Computing System ◽

Fusion Research ◽

High Performance Computing System ◽

Performance Computing ◽

Cluster Technology

Investigating the usefulness of a micro high performance computing system as an educational tool

Proceedings of the 2nd International Conference on Intelligent and Innovative Computing Applications ◽

10.1145/3415088.3415105 ◽

2020 ◽

Author(s):

Nkundwe Moses Mwasaga ◽

Mike Joy

Keyword(s):

High Performance Computing ◽

High Performance ◽

Computing System ◽

Educational Tool ◽

High Performance Computing System ◽

Performance Computing

High-performance computing system and artificial language recognition in the visual application of intangible cultural heritage art design

Personal and Ubiquitous Computing ◽

10.1007/s00779-021-01619-z ◽

2021 ◽

Author(s):

Xiaodan Peng

Keyword(s):

Cultural Heritage ◽

High Performance Computing ◽

High Performance ◽

Computing System ◽

Intangible Cultural Heritage ◽

Artificial Language ◽

Language Recognition ◽

High Performance Computing System ◽

Art Design ◽

Performance Computing

A High-Performance Computing System for Probabilistic Weather Forecasts

10.1002/essoar.10500383.1 ◽

2019 ◽

Author(s):

Weiming Hu ◽

Guido Cervone ◽

Vivek Balasubramanian ◽

Matteo Turilli ◽

Shantenu Jha

Keyword(s):

High Performance Computing ◽

High Performance ◽

Computing System ◽

Weather Forecasts ◽

High Performance Computing System ◽

Performance Computing

Application-based fault tolerance techniques for sparse matrix solvers

The International Journal of High Performance Computing Applications ◽

10.1177/1094342017694946 ◽

2017 ◽

Vol 32 (5) ◽

pp. 627-640

Author(s):

Simon McIntosh-Smith ◽

Rob Hunt ◽

James Price ◽

Alex Warwick Vesztrocy

Keyword(s):

Fault Tolerance ◽

High Performance Computing ◽

High Performance ◽

Sparse Matrix ◽

Sparse Matrices ◽

Error Correcting Codes ◽

Computing Systems ◽

Hardware Costs ◽

Extreme Scale ◽

Performance Computing

High-performance computing systems continue to increase in size in the quest for ever higher performance. The resulting increase in electronic component count, coupled with the decrease in feature sizes of the silicon manufacturing processes used to build these components, may make future exascale systems more susceptible to soft errors caused by cosmic radiation than current high-performance computing systems. Through techniques such as hardware-based error-correcting codes and checkpoint-restart, many of these faults can be mitigated, at a cost in hardware overhead, run-time, and energy consumption that can be as much as 10–20%. Some predictions expect these overheads to continue to grow over time. For extreme-scale systems, these overheads will represent megawatts of power consumption and millions of dollars of additional hardware costs, which could potentially be avoided with more sophisticated fault-tolerance techniques. In this paper we present new software-based fault tolerance techniques that can be applied to one of the most important classes of software in high-performance computing: iterative sparse matrix solvers. Our new techniques enable us to exploit knowledge of the structure of sparse matrices in such a way as to improve the performance, energy efficiency, and fault tolerance of the overall solution.
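
One way structural knowledge of a sparse matrix might be exploited for software fault tolerance is sketched below. It assumes column indices are stored in 32-bit words and that the matrix has fewer than 2^31 columns, so the top bit of each index is spare and can hold an even-parity bit. The abstract does not describe this scheme; it is an assumption for illustration, and all names are hypothetical.

```python
INDEX_BITS = 31                  # assumed number of bits needed for an index
PARITY_MASK = 1 << INDEX_BITS    # the spare top bit of a 32-bit word
INDEX_MASK = PARITY_MASK - 1


def parity(word):
    """Return 1 if `word` has an odd number of set bits, else 0."""
    return bin(word).count("1") & 1


def encode_index(col):
    """Store an even-parity bit in the spare top bit of a column index."""
    if not 0 <= col <= INDEX_MASK:
        raise ValueError("column index does not fit in 31 bits")
    return col | (parity(col) << INDEX_BITS)


def decode_index(word):
    """Recover the column index, raising if a single bit flip is detected."""
    if parity(word) != 0:
        raise ValueError("parity error in column index")
    return word & INDEX_MASK
```

A single parity bit only detects an odd number of flipped bits and cannot correct them; the point of the sketch is that the check needs no extra storage because it reuses bits the index does not need.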

Taxonomic assignment for large-scale metagenomic data on high-performance systems

Journal of Computer Science and Cybernetics ◽

10.15625/1813-9663/33/2/10753 ◽

2017 ◽

Vol 33 (2) ◽

pp. 119-130

Author(s):

Vinh Van Le ◽

Hoai Van Tran ◽

Hieu Ngoc Duong ◽

Giang Xuan Bui ◽

Lang Van Tran

Keyword(s):

High Performance Computing ◽

Assignment Problem ◽

High Performance ◽

Large Scale ◽

Computing System ◽

Metagenomic Data ◽

Taxonomic Assignment ◽

High Performance Computing System ◽

Powerful Approach ◽

Performance Computing

Metagenomics is a powerful approach to studying environmental samples that does not require the isolation and cultivation of individual organisms. One of the essential tasks in a metagenomic project is to identify the origin of reads, referred to as taxonomic assignment. Because each metagenomic project has to analyze large-scale datasets, taxonomic assignment is highly computation-intensive. This study proposes a parallel algorithm for the taxonomic assignment problem, called SeMetaPL, which aims to address this computational challenge. The proposed algorithm is evaluated with both simulated and real datasets on a high-performance computing system. Experimental results demonstrate that the algorithm achieves good performance and uses the resources of the system efficiently. The software implementing the algorithm and all test datasets can be downloaded at http://it.hcmute.edu.vn/bioinfo/metapro/SeMetaPL.html.
