WTMR--A New Fault Tolerance Technique for Wireless and Mobile Computing Systems

CGS-accumulation (Consistent Global State Accumulation) is one of the commonly used methods for providing fault tolerance in distributed systems, so that the system can continue to operate even if one or more components have failed. However, mobile computing systems are constrained by low bandwidth, mobility, lack of stable storage, frequent disconnections, and limited battery life. Hence, CGS-accumulation etiquettes that take fewer reinstatement-points are favored in mobile environments. In this paper, we propose a minimum-method coordinated CGS-accumulation etiquette for deterministic distributed applications on mobile computing systems. We eliminate useless reinstatement-points, as well as the blocking of methods during reinstatement-points, at the cost of logging anti-messages for very few messages during CGS-accumulation. We also try to minimize the loss of CGS-accumulation effort when any method fails to capture its reinstatement-point in an instigation. In this way, we take care of excessive failures during CGS-accumulation.

Mobile Agent Based Fault-Tolerance Support for the Reliable Mobile Computing Systems

Lecture Notes in Computer Science - Coordination Models and Languages ◽ 10.1007/11417019_12 ◽ 2005 ◽ pp. 173-187 ◽ Cited By ~ 3

Author(s): Taesoon Park

Keyword(s): Fault Tolerance ◽ Mobile Computing ◽ Mobile Agent ◽ Computing Systems ◽ Agent Based

Application-based fault tolerance techniques for sparse matrix solvers

The International Journal of High Performance Computing Applications ◽ 10.1177/1094342017694946 ◽ 2017 ◽ Vol 32 (5) ◽ pp. 627-640

Author(s): Simon McIntosh-Smith ◽ Rob Hunt ◽ James Price ◽ Alex Warwick Vesztrocy

Keyword(s): Fault Tolerance ◽ High Performance Computing ◽ High Performance ◽ Sparse Matrix ◽ Sparse Matrices ◽ Error Correcting Codes ◽ Computing Systems ◽ Hardware Costs ◽ Extreme Scale ◽ Performance Computing

High-performance computing systems continue to increase in size in the quest for ever higher performance. The resulting increased electronic component count, coupled with the decrease in feature sizes of the silicon manufacturing processes used to build these components, may result in future exascale systems being more susceptible to soft errors caused by cosmic radiation than current high-performance computing systems. Through the use of techniques such as hardware-based error-correcting codes and checkpoint-restart, many of these faults can be mitigated at the cost of increased hardware overhead, run-time, and energy consumption that can be as much as 10–20%. Some predictions expect these overheads to continue to grow over time. For extreme scale systems, these overheads will represent megawatts of power consumption and millions of dollars of additional hardware costs, which could potentially be avoided with more sophisticated fault-tolerance techniques. In this paper we present new software-based fault tolerance techniques that can be applied to one of the most important classes of software in high-performance computing: iterative sparse matrix solvers. Our new techniques enable us to exploit knowledge of the structure of sparse matrices in such a way as to improve the performance, energy efficiency, and fault tolerance of the overall solution.

Distributed snapshots for mobile computing systems

Second IEEE Annual Conference on Pervasive Computing and Communications, 2004. Proceedings of the ◽ 10.1109/percom.2004.1276856 ◽ 2004 ◽ Cited By ~ 8

Author(s): A. Agbaria ◽ W.H. Sanders

Keyword(s): Mobile Computing ◽ Computing Systems

Log Based Recovery with Low Overhead for Mobile Computing Systems

Computer Networks and Information Technologies - Communications in Computer and Information Science ◽ 10.1007/978-3-642-19542-6_125 ◽ 2011 ◽ pp. 637-642

Author(s): Awadhesh Kumar Singh ◽ Parmeet Kaur

Keyword(s): Mobile Computing ◽ Computing Systems

Guest Editorial Special Issue on Multimedia Services Provision Over Future Mobile Computing Systems

IEEE Systems Journal ◽ 10.1109/jsyst.2017.2783463 ◽ 2018 ◽ Vol 12 (1) ◽ pp. 12-15 ◽ Cited By ~ 1

Author(s): George Mastorakis ◽ Evangelos Pallis ◽ Constandinos X. Mavromoustakis ◽ Lei Shu ◽ Joel J. P. C. Rodrigues

Keyword(s): Mobile Computing ◽ Guest Editorial ◽ Special Issue ◽ Multimedia Services ◽ Computing Systems ◽ Editorial Special Issue

A General Framework of Algorithm-Based Fault Tolerance Technique for Computing Systems

Analyzing Security, Trust, and Crime in the Digital World - Advances in Information Security, Privacy, and Ethics ◽ 10.4018/978-1-4666-4856-2.ch001 ◽ 2014 ◽ pp. 1-21 ◽ Cited By ~ 1

Author(s): Hodjatollah Hamidi

Keyword(s): Fault Tolerance ◽ Error Correction ◽ General Framework ◽ Fault Tolerant ◽ Convolutional Code ◽ Numerical Algorithms ◽ Convolutional Codes ◽ Computing Systems ◽ Specific Level ◽ Computing Paradigm

The Algorithm-Based Fault Tolerance (ABFT) approach transforms a system that does not tolerate a specific type of fault, called the fault-intolerant system, into a system that provides a specific level of fault tolerance, namely recovery. The ABFT philosophy leads directly to a model from which error correction can be developed. By employing an ABFT scheme with an effective convolutional code, the design allows high throughput as well as high fault coverage. The ABFT techniques that detect errors rely on the comparison of parity values computed in two ways. The parallel processing of input parity values produces output parity values comparable with parity values regenerated from the original processed outputs, and convolutional codes can be applied for the redundancy. This method is a new approach to concurrent error correction in fault-tolerant computing systems. This chapter proposes a novel computing paradigm to provide fault tolerance for numerical algorithms. The authors also present, implement, and evaluate early detection in ABFT.

A Method to Support Fault Tolerance Design in Service Oriented Computing Systems

Theoretical and Analytical Service-Focused Systems Design and Development ◽ 10.4018/978-1-4666-1767-4.ch019 ◽ 2012 ◽ pp. 362-376

Author(s): Domenico Cotroneo ◽ Antonio Pecchia ◽ Roberto Pietrantuono ◽ Stefano Russo

Keyword(s): Fault Tolerance ◽ Common Ground ◽ Fault Injection ◽ Failure Behavior ◽ Tolerance Design ◽ System Failure ◽ Computing Systems ◽ Service Oriented Computing ◽ Service Oriented ◽ Tailored Design

Service Oriented Computing relies on the integration of heterogeneous software technologies and infrastructures that provide developers with a common ground for composing services and producing applications flexibly. However, while this approach eases software development, it makes dependability a big challenge. Integrating such diverse software items raises issues that traditional testing cannot exhaustively cope with. In this context, tolerating faults, rather than attempting to detect them solely by testing, is a more suitable solution. This paper proposes a method to support a tailored design of fault tolerance actions for the system being developed. It describes system failure behavior through an extensive fault injection campaign in order to identify its criticalities and adopt the most appropriate countermeasures to tolerate operational faults. The proposed method is applied to two distinct SOC-enabling technologies. Results show how the achieved findings allow designers to understand the system failure behavior and plan fault tolerance.

Design of Wearable Computing Systems for Future Industrial Environments

Handbook of Research on Mobility and Computing ◽ 10.4018/978-1-60960-042-6.ch075 ◽ 2011 ◽ pp. 1226-1245

Author(s): Pierre Kirisci ◽ Ernesto Morales Kluge ◽ Emanuel Angelescu ◽ Klaus-Dieter Thoben

Keyword(s): Mobile Computing ◽ Design Process ◽ Reference Model ◽ Wearable Computing ◽ Computing System ◽ Context Aware ◽ Context Model ◽ Computing Systems ◽ Model Driven ◽ Industrial Environments

During the last two decades, a lot of methodology research has been conducted for the design of software user interfaces (Kirisci, Thoben 2009). Despite the numerous contributions in this area, comparatively few efforts have been dedicated to the advancement of methods for the design of context-aware mobile platforms, such as wearable computing systems. This chapter investigates the role of context, particularly in future industrial environments, and elaborates how context can be incorporated in a design method in order to support the design process of wearable computing systems. The chapter begins with an overview of basic research in the area of context-aware mobile computing. The aim is to identify the main context elements which have an impact upon the technical properties of a wearable computing system. To this end, we describe a systematic and quantitative study of the advantages of context recognition, specifically task tracking, for a wearable maintenance assistance system. Based upon the experiences from this study, a context reference model is proposed, which can be considered supportive for the design of wearable computing systems in industrial settings and thus goes beyond existing context models, e.g. for context-aware mobile computing. The final part of this chapter discusses the benefits of applying model-based approaches during the early design stages of wearable computing systems. Existing design methods in the area of wearable computing are critically examined and their shortcomings highlighted. Based upon the context reference model, a design approach is proposed through the realization of a model-driven software tool which supports the design process of a wearable computing system while taking advantage of concise experience manifested in a well-defined context model.
