A Similar Resource Auto-Discovery Based Adaptive Fault-tolerance Method for Embedded Distributed System

Author(s):  
Kailong Zhang ◽  
Ke Liang ◽  
Xingshe Zhou ◽  
Kaibo Wang ◽  
Xiao Wu ◽  
...  
2021 ◽  
pp. 102217
Author(s):  
Yu Wu ◽  
Duo Liu ◽  
Xianzhang Chen ◽  
Jinting Ren ◽  
Renping Liu ◽  
...  

2020 ◽  
Vol 8 (5) ◽  
pp. 2040-2044

The cloud technologies are gaining boom in the field of information technology. But on the same side cloud computing sometimes results in failures. These failures demand more reliable frameworks with high availability of computers acting as nodes. The request made by the user is replicated and sent to various VMs. If one of the VMs fail, the other can respond to increase the reliability. A lot of research has been done and being carried out to suggest various schemes for fault tolerance thus increasing the reliability. Earlier schemes focus on only one way of dealing with faults but the scheme proposed by the the author in this paper presents an adaptive scheme that deals with the issues related to fault tolerance in various cloud infrastructure. The projected scheme uses adaptive behavior during the selection of replication and fine-grained checkpointing methods for attaining a reliable cloud infrastructure that can handle different client requirements. In addition to it the algorithm also determines the best suited fault tolerance method for every designated virtual node. Zheng, Zhou,. Lyu and I. King (2012).


2015 ◽  
Vol 2015 ◽  
pp. 1-7 ◽  
Author(s):  
Yang Liu ◽  
Wei Wei

MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster. In cloud environment, node and task failure are no longer accidental but a common feature of large-scale systems. Current rescheduling-based fault tolerance method in MapReduce framework failed to fully consider the location of distributed data and the computation and storage overhead of rescheduling failure tasks. Thus, a single node failure will increase the completion time dramatically. In this paper, a replication-based mechanism is proposed, which takes both task and node failure into consideration. Experimental results show that, compared with default mechanism in Hadoop, our mechanism can significantly improve the performance at failure time, with more than 30% decreasing in execution time.


2014 ◽  
pp. 92-99
Author(s):  
N. P. Gopalan ◽  
K. Nagarajan

Checkpointing mechanism is the one of the best attractive approach for providing software fault tolerance in distributed message passing systems. This paper aims to implement a distributed checkpointing technique, which eliminates the drawbacks of the centralized approach like “domino effect”, “useless checkpoint” (checkpoints that do not contribute to global consistency), and “hidden and zigzag” dependencies. The proposed checkpointing protocol has a checkpoint initiator, but, coordination among the local checkpoints is done in a distributed fashion. This guaranty that no message would be lost in case of failure occurs, has been maintained in this work by exchange of information among the processes. However, there is no central checkpoint initiator, but each of the processes takes turn to act as an initiator. Processes take local checkpoints only after being notified by the initiator. The processes synchronize their activities of the current checkpointing interval before finally committing their checkpoints. Thus, the checkpointing pattern described in this paper takes only those checkpoints that will contribute to the consistent global snapshot thereby eliminating the number of useless checkpoints.


Author(s):  
Ahmad Shukri Mohd Noor ◽  
Nur Farhah Mat Zian ◽  
Noor Hafhizah Abd Rahim ◽  
Rabiei Mamat ◽  
Wan Nur Amira Wan Azman

The availability of the data in a distributed system can be increase by implementing fault tolerance mechanism in the system. Reactive method in fault tolerance mechanism deals with restarting the failed services, placing redundant copies of data in multiple nodes across network, in other words data replication and migrating the data for recovery. Even if the idea of data replication is solid, the challenge is to choose the right replication technique that able to provide better data availability as well as consistency that involves read and write operations on the redundant copies. Circular Neighboring Replication (CNR) technique exploits neighboring policy in replicating the data items in the system performs well with regards to lower copies needed to maintain the system availability at the highest. In a performance analysis with existing techniques, results show that CNR improves system availability by average 37% by offering only two replicas needed to maintain data availability and consistency. The study demonstrates the possibility of the proposed technique and the potential of deploying in larger and complex environment.


Sign in / Sign up

Export Citation Format

Share Document