WTMR--A New Fault Tolerance Technique for Wireless and Mobile Computing Systems

Author(s):  
Sarmistha Neogy

CGS-accumulation (Consistent Global State Accumulation) is one of the commonly used methods for providing fault tolerance in distributed systems, so that the system can keep operating even if one or more components have failed. However, mobile computing systems are constrained by low bandwidth, mobility, lack of stable storage, frequent disconnections and limited battery life. Hence CGS-accumulation protocols that take fewer reinstatement-points are favored in mobile environments. In this paper, we propose a minimum-method coordinated CGS-accumulation protocol for deterministic distributed applications on mobile computing systems. We eliminate useless reinstatement-points, as well as the blocking of methods during reinstatement-points, at the cost of logging anti-messages for a very small number of messages during CGS-accumulation. We also try to minimize the loss of CGS-accumulation effort when any method fails to capture its reinstatement-point in an initiation. In this way, we take care of frequent failures during CGS-accumulation.


Author(s):  
Simon McIntosh–Smith ◽  
Rob Hunt ◽  
James Price ◽  
Alex Warwick Vesztrocy

High-performance computing systems continue to increase in size in the quest for ever higher performance. The resulting increase in electronic component count, coupled with the decrease in feature sizes of the silicon manufacturing processes used to build these components, may make future exascale systems more susceptible to soft errors caused by cosmic radiation than current high-performance computing systems. Techniques such as hardware-based error-correcting codes and checkpoint-restart can mitigate many of these faults, at the cost of increases in hardware overhead, run time, and energy consumption of as much as 10–20%. Some predictions expect these overheads to continue to grow over time. For extreme-scale systems, these overheads will represent megawatts of power consumption and millions of dollars of additional hardware costs, which could potentially be avoided with more sophisticated fault-tolerance techniques. In this paper we present new software-based fault tolerance techniques that can be applied to one of the most important classes of software in high-performance computing: iterative sparse matrix solvers. Our new techniques enable us to exploit knowledge of the structure of sparse matrices in such a way as to improve the performance, energy efficiency, and fault tolerance of the overall solution.


2018 ◽  
Vol 12 (1) ◽  
pp. 12-15 ◽  
Author(s):  
George Mastorakis ◽  
Evangelos Pallis ◽  
Constandinos X. Mavromoustakis ◽  
Lei Shu ◽  
Joel J. P. C. Rodrigues

Author(s):  
Hodjatollah Hamidi

The Algorithm-Based Fault Tolerance (ABFT) approach transforms a system that does not tolerate a specific type of fault, called the fault-intolerant system, into a system that provides a specific level of fault tolerance, namely recovery. The ABFT philosophy leads directly to a model from which error correction can be developed. By employing an ABFT scheme with an effective convolutional code, the design allows high throughput as well as high fault coverage. ABFT techniques that detect errors rely on comparing parity values computed in two ways. Processing the input parity values in parallel produces output parity values that can be compared with parity values regenerated from the original processed outputs, and convolutional codes can be applied to supply the redundancy. This method is a new approach to concurrent error correction in fault-tolerant computing systems. This chapter proposes a novel computing paradigm to provide fault tolerance for numerical algorithms. The authors also present, implement, and evaluate early detection in ABFT.
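
The idea of comparing parity values computed in two ways can be illustrated with a minimal sketch. The example below is not the convolutional-code scheme of the chapter; it assumes the simpler classical checksum form of ABFT applied to a matrix-vector product. The function names and the `fault_row` / `fault_delta` parameters are hypothetical and exist only to simulate a soft error. Integers are used so that the two parity values can be compared exactly; with floating-point data a tolerance would be needed.

```python
from typing import List, Optional, Tuple

Matrix = List[List[int]]
Vector = List[int]


def column_checksum(a: Matrix) -> Vector:
    """Parity row of `a`: the sum of each column."""
    return [sum(col) for col in zip(*a)]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product y = A x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def checked_matvec(
    a: Matrix,
    x: Vector,
    fault_row: Optional[int] = None,  # hypothetical: row of y to corrupt
    fault_delta: int = 0,             # hypothetical: size of the corruption
) -> Tuple[Vector, bool]:
    """Return (y, ok), where ok is False if the two parity values disagree."""
    # Parity path 1: push the input parity (column checksum) through
    # the same computation as the data.
    parity_from_inputs = sum(c * xj for c, xj in zip(column_checksum(a), x))

    y = matvec(a, x)
    if fault_row is not None:
        y[fault_row] += fault_delta  # simulated soft error in one output

    # Parity path 2: regenerate the parity from the computed outputs.
    parity_from_outputs = sum(y)

    return y, parity_from_inputs == parity_from_outputs


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    x = [5, 6]
    print(checked_matvec(a, x))                              # fault-free run
    print(checked_matvec(a, x, fault_row=0, fault_delta=1))  # corrupted run
```

A single checksum of this kind only detects an error; locating and correcting it requires additional redundancy, which is the role the chapter assigns to convolutional codes.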


Author(s):  
Domenico Cotroneo ◽  
Antonio Pecchia ◽  
Roberto Pietrantuono ◽  
Stefano Russo

Service Oriented Computing (SOC) relies on the integration of heterogeneous software technologies and infrastructures that provide developers with a common ground for composing services and producing applications flexibly. Although this approach eases software development, it makes dependability a big challenge. Integrating such diverse software items raises issues that traditional testing cannot cope with exhaustively. In this context, tolerating faults, rather than attempting to detect them solely by testing, is a more suitable solution. This paper proposes a method to support a tailored design of fault tolerance actions for the system being developed. It characterizes system failure behavior through an extensive fault injection campaign, in order to identify the system's criticalities and adopt the most appropriate countermeasures to tolerate operational faults. The proposed method is applied to two distinct SOC-enabling technologies. Results show how the findings allow designers to understand the system failure behavior and plan fault tolerance.


Author(s):  
Pierre Kirisci ◽  
Ernesto Morales Kluge ◽  
Emanuel Angelescu ◽  
Klaus-Dieter Thoben

During the last two decades a lot of methodology research has been conducted on the design of software user interfaces (Kirisci, Thoben 2009). Despite the numerous contributions in this area, comparatively few efforts have been dedicated to advancing methods for the design of context-aware mobile platforms, such as wearable computing systems. This chapter investigates the role of context, particularly in future industrial environments, and elaborates how context can be incorporated in a design method in order to support the design process of wearable computing systems. The chapter opens with an overview of basic research in the area of context-aware mobile computing. The aim is to identify the main context elements which have an impact upon the technical properties of a wearable computing system. To this end, we describe a systematic and quantitative study of the advantages of context recognition, specifically task tracking, for a wearable maintenance assistance system. Based upon the experiences from this study, a context reference model is proposed which can support the design of wearable computing systems in industrial settings and thus goes beyond existing context models, e.g. those for context-aware mobile computing. The final part of this chapter discusses the benefits of applying model-based approaches during the early design stages of wearable computing systems. Existing design methods in the area of wearable computing are critically examined and their shortcomings highlighted. Based upon the context reference model, a design approach is proposed through the realization of a model-driven software tool which supports the design process of a wearable computing system while taking advantage of the experience captured in a well-defined context model.

