Design and Packaging of Fault Tolerant Optoelectronic Multiprocessor Computing System

1992
Author(s):  
Sing H. Lee
Author(s):  
Yongning Zhai
Weiwei Li

In a distributed computing system, both excessive and deficient checkpointing result in severe performance degradation. To minimize the expected execution time of a long-running application under a general failure distribution, this paper analyzes and derives the optimal equidistant checkpoint interval for fault tolerant performance optimization. More precisely, an optimal checkpointing period is proposed to determine the proper checkpoint sequence, and the expected effective rate of a defined computation cycle is derived. Maximizing the expected effective rate yields the constraint on the optimal checkpoint sequence. From this optimality constraint, the optimal equidistant checkpoint interval is obtained by minimizing the fault tolerant overhead ratio. Numerical results indicate that the proposal is practical for determining a proper equidistant checkpoint interval for fault tolerant performance optimization.
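
The trade-off between checkpointing too often and too rarely can be illustrated with a minimal sketch. It does not reproduce the derivation of this paper: it swaps in Young's classical first-order approximation, which assumes exponentially distributed failures (the paper treats a general failure distribution), a fixed checkpoint cost, and an interval much shorter than the mean time between failures. The names `overhead_ratio`, `young_interval`, `grid_search_interval`, `checkpoint_cost` and `mtbf` are hypothetical.

```python
import math


def overhead_ratio(interval, checkpoint_cost, mtbf):
    """First-order fault tolerant overhead per unit of useful work.

    checkpoint_cost / interval : time spent writing checkpoints
    interval / (2 * mtbf)      : expected rework after a failure, assuming
                                 a failure lands mid-interval on average
    Assumption: exponential failures and interval << mtbf.
    """
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


def young_interval(checkpoint_cost, mtbf):
    """Young's approximation: the interval that minimizes overhead_ratio."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def grid_search_interval(checkpoint_cost, mtbf, step=1.0):
    """Brute-force check: scan candidate intervals and keep the cheapest."""
    n = int(mtbf / step)
    candidates = [step * k for k in range(1, n + 1)]
    return min(candidates, key=lambda t: overhead_ratio(t, checkpoint_cost, mtbf))


if __name__ == "__main__":
    cost, mtbf = 60.0, 86400.0  # illustrative values, in seconds
    print(young_interval(cost, mtbf))
    print(grid_search_interval(cost, mtbf))
```

In this simplified model a shorter interval inflates the first term and a longer interval inflates the second, so the closed form and the grid search should agree to within one grid step.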


Author(s):  
L. Rubanov
G. Shilovsky
A. Seliverstov
O. Zverkov

We have developed an efficient algorithm implemented in a program for a multiprocessor computing system, which makes it possible to discover genes lost or acquired during the evolution of most species from a given set. The new approach takes into account the mutual arrangement of genes on a chromosome and allows us to simultaneously consider hundreds of species. The research was carried out using supercomputers at the Joint Supercomputer Center of the Russian Academy of Sciences (JSCC RAS).


1974
Vol 3 (32)
Author(s):  
L. Phillip Caillouet ◽  
Bruce D. Shriver

This paper offers an introduction to a research effort in fault tolerant computer architecture which has been organized at the University of Southwestern Louisiana (USL). It is intended as an overview of several topics which have been isolated for study, and as an indication of preliminary undertakings with regard to one particular topic. This first area of concentration involves the systematic design of fault tolerant computing systems via a multi-level approach. Efforts are also being initiated in the areas of diagnosis of microprogrammable processors via firmware, fault data management across levels of virtual machines, development of a methodology for realizing a firmware hardcore on a variety of hosts, and delineation of a minimal set of resources for the design of a practical host for a multi-level fault tolerant computing system. The research is being conducted under the auspices of Project Beta at USL.


2020
Vol 37 (6/7)
pp. 983-1005
Author(s):  
Chandra Shekhar
Amit Gupta
Madhu Jain
Neeraj Kumar

Purpose: The purpose of this paper is to present a sensitivity analysis of fault-tolerant redundant repairable computing systems with imperfect coverage, reboot and recovery processes.
Design/methodology/approach: In this investigation, the authors consider a computing system with a finite number of identical working units functioning simultaneously, with the provision of standby units. Working and standby units are prone to random failure and are administered by unreliable software, which is itself subject to unpredictable failure. The redundant repairable computing system is modeled as a Markovian machine interference problem with exponentially distributed failure and service times. To remove a failed unit from the computing system, the system either opts for a randomized reboot process or incurs a recovery delay.
Findings: Transient-state probabilities are determined, from which the authors develop various reliability measures (reliability/availability, mean time to failure, failure frequency and so on) and queueing characteristics (expected number of failed units, throughput of the system and so on) for predictive purposes. To demonstrate the practicability of the developed model, numerical simulation and sensitivity analysis for different parameters have also been carried out, and the results are summarized in tables and graphs. The transient results are helpful for analyzing the developing model of the system before the system reaches stability. The derived measures give direct insights into parametric decision-making.
Social implications: The conclusion has been drawn, and future scope is remarked. The present research study would help system analysts and system designers make better decisions in order to obtain an economical design and strategy based on the desired mean time to failure, reliability/availability of the systems and other queueing characteristics.
Originality/value: Different from previous investigations, the studied model provides a more accurate assessment of the computing system in uncertain environments, based on sensitivity analysis.
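
To make the mean time to failure measure concrete, here is a minimal sketch of a much simpler Markov model than the one in this abstract. It assumes perfect coverage, no reboot or recovery delay, reliable software and a single repair facility, so it only illustrates how an MTTF follows from exponential failure and repair rates. The functions `failure_rates` and `mean_time_to_failure` and the parameters `lam`, `nu` and `mu` are hypothetical names.

```python
def failure_rates(working, standbys, lam, nu):
    """Rate of moving from i to i + 1 failed units, for i = 0..standbys.

    Assumption: all `working` slots stay filled while spares remain, so the
    rate is working * lam plus nu for each spare still waiting.
    """
    return [working * lam + (standbys - i) * nu for i in range(standbys + 1)]


def mean_time_to_failure(fail, repair):
    """MTTF of a birth-death chain started with zero failed units.

    fail[i]   : rate from i to i + 1 failed units
    repair[i] : rate from i to i - 1 failed units (repair[0] is ignored)
    The system counts as failed once len(fail) units are down.
    """
    total = 0.0
    step = 0.0  # expected time to first reach state i + 1 from state i
    for i, rate in enumerate(fail):
        back = repair[i] if i > 0 else 0.0
        step = 1.0 / rate + (back / rate) * step
        total += step
    return total


if __name__ == "__main__":
    lam, nu, mu = 1.0, 0.0, 2.0  # illustrative rates per unit time
    fail = failure_rates(working=1, standbys=1, lam=lam, nu=nu)
    repair = [0.0] + [mu] * (len(fail) - 1)
    print(mean_time_to_failure(fail, repair))
```

Rerunning the sketch with slightly perturbed `lam` or `mu` gives a crude numerical sensitivity of the MTTF to each parameter.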


Author(s):  
Denis Zolotariov

The article is devoted to the research and development of a highly available distributed automated computing system for iterative algorithms, based on a microservice architecture in a cloud infrastructure. The subject of the research is the practical foundations of building high-availability automated computing systems based on microservice architecture in a cloud-based distributed infrastructure. The purpose of the article is to develop and substantiate practical recommendations for forming the infrastructure of a high-availability automated computing system based on the microservice architecture, and for choosing its constituent elements and their components. The tasks of the work are to identify the necessary structural elements of a microservice automated computing system, to analyze the constituent components and functional load of each of them, to set specific tasks for building each of them, and to justify the choice of tools for their creation. In the course of the research, methods of system analysis were used to decompose a complex system into elements and each element into functional components. The tools used were Apache Kafka, Kafkacat, Wolfram Mathematica, nginx, Lumen, Telegram, Dropbox, and MySQL. As a result of the study, it was found that the system infrastructure should consist of fault-tolerant interservice transport, a high-availability computing microservice, and microservices for communication with end customers, which save or process the results. For each of them, recommendations are provided regarding the formation and selection of implementation tools. Following the recommendations, one variant of such a system has been implemented, the principles of its operation are shown and the results are presented. It has been shown that, when using a Kafka queue, it is more efficient to publish results in batches than one at a time, since publishing them individually causes significant overhead on queue servers and data latency for its clients. Recommendations are given on the implementation of a CI/CD system to build a continuous cycle of adding and improving microservices. Conclusions. Practical foundations have been developed for the implementation of high-availability distributed automated computing systems based on microservice architecture in a cloud infrastructure. The flexibility of such a system in processing results is shown, owing to the possibility of adding microservices and using third-party analytical applications that support connection to the Kafka queue. The economic benefit of using the described system is shown. Future ways of improving it are given.
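
The batching recommendation can be sketched as follows. This is not the article's implementation and it does not use a real Kafka client: `BatchPublisher` is a hypothetical helper, and `send` is a hypothetical callback standing in for whatever call hands one batch of results to the queue.

```python
from typing import Any, Callable, List


class BatchPublisher:
    """Buffers results and hands them to `send` in batches.

    Assumption: `send` is a stand-in for a real queue client and receives
    one list of results per call.
    """

    def __init__(self, send: Callable[[List[Any]], None], batch_size: int = 100):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._send = send
        self._batch_size = batch_size
        self._buffer: List[Any] = []

    def publish(self, result: Any) -> None:
        """Queue one result; send the buffer once it reaches batch_size."""
        self._buffer.append(result)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered, e.g. at the end of an iteration."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._send(batch)


if __name__ == "__main__":
    sent: List[List[Any]] = []
    publisher = BatchPublisher(sent.append, batch_size=3)
    for value in range(7):
        publisher.publish(value)
    publisher.flush()
    print([len(batch) for batch in sent])  # [3, 3, 1]
```

Seven results reach the queue in three calls instead of seven. A larger `batch_size` reduces the number of calls further, at the cost of results waiting longer in the buffer.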

