Impact of a failure detection mechanism on the performance of consensus

Author(s):  
N. Sergent ◽  
X. Defago ◽  
A. Schiper
2006 ◽  
Vol 5 (5) ◽  
pp. 1180-1186 ◽  
Author(s):  
H.-N. Hung ◽  
Y.-B. Lin ◽  
N.-F. Peng ◽  
S.-I. Sou

Author(s):  
Faisal Shahzad ◽  
Moritz Kreutzer ◽  
Thomas Zeiser ◽  
Rui Machado ◽  
Andreas Pieper ◽  
...  

Today’s high performance computing systems are made possible by successive increases in hardware parallelism. As a result, the mean time to failure of these systems decreases with each new generation, which is an alarming trend. It is therefore not surprising that a lot of research is under way in the area of fault tolerance and fault mitigation. Applications should survive a failure and/or be able to recover at minimal cost. We have used the Global Address Space Programming Interface (GASPI), a relatively new communication library based on the PGAS model. It fulfills the basic requirement of a fault tolerant communication library, i.e. the failure of a process does not cause the remaining processes to fail. This work focuses on extending the fault tolerance features of GASPI in the form of a supporting health-check library that applications can benefit from. These features include failure detection, propagation of failure information, recovery management, communication recovery, etc. To reinforce its utility, we have also developed a fault tolerant neighbor node-level checkpoint/restart library. Instead of introducing algorithm-based fault tolerance in its true sense, we demonstrate how, using these supplementary fault tolerance functions, one can build applications that integrate a low cost fault detection/recovery mechanism and, if necessary, recover the application on the fly. We showcase the usage of these tools by implementing them in three different applications. Two of the applications fall in the category of linear sparse solvers, whereas the third is based on a fluid flow solver. We also analyze the overheads involved in failure-free cases as well as in various failure cases. Our fault detection mechanism causes no overhead in failure-free cases, whereas in case of failure(s), the failure detection and recovery cost is acceptable and shows good scalability.
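The idea of neighbor node-level checkpoint/restart mentioned in this abstract can be illustrated with a minimal in-memory sketch. It is not the authors' library and does not use the GASPI API: the names `Node`, `neighbor_of`, `checkpoint`, `detect_failures`, and `recover` are hypothetical, the ring placement of checkpoint copies is an assumption, and the health check is reduced to a flag.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One simulated process: its own state plus checkpoint copies it holds for others."""
    rank: int
    state: Optional[dict] = None
    alive: bool = True
    # Checkpoint copies held on behalf of other ranks: owner rank -> saved state.
    held: Dict[int, dict] = field(default_factory=dict)


def neighbor_of(rank: int, size: int) -> int:
    """Assumed placement rule: a rank's checkpoint lives on the next rank in a ring."""
    return (rank + 1) % size


def checkpoint(nodes: List[Node]) -> None:
    """Each live node copies its current state into its ring neighbor's memory."""
    size = len(nodes)
    for node in nodes:
        if node.alive:
            nodes[neighbor_of(node.rank, size)].held[node.rank] = dict(node.state)


def detect_failures(nodes: List[Node]) -> List[int]:
    """Stand-in for a health check: report the ranks that no longer respond."""
    return [n.rank for n in nodes if not n.alive]


def recover(nodes: List[Node], failed_rank: int) -> Node:
    """Restore a failed rank from the checkpoint copy held by its neighbor."""
    holder = nodes[neighbor_of(failed_rank, len(nodes))]
    if not holder.alive or failed_rank not in holder.held:
        raise RuntimeError(f"no surviving checkpoint for rank {failed_rank}")
    # Assumption: a replacement process takes over the failed rank's identity.
    replacement = Node(rank=failed_rank, state=dict(holder.held[failed_rank]))
    nodes[failed_rank] = replacement
    return replacement


# Usage: four ranks checkpoint, rank 2 fails, and its state is restored.
nodes = [Node(rank=r, state={"iteration": 10, "x": float(r)}) for r in range(4)]
checkpoint(nodes)
nodes[2].alive = False
nodes[2].state = None
failed = detect_failures(nodes)
for r in failed:
    recover(nodes, r)
```

In this sketch the replacement for rank 2 starts without the copy that rank 2 was holding for rank 1; the next call to `checkpoint` re-creates it. Recovery fails if a rank and its neighbor are lost in the same interval, which is a limitation of this simplified placement rule.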


2014 ◽  
Vol 11 (4) ◽  
pp. 1-12 ◽  
Author(s):  
Lin Rongheng ◽  
Wu Budan ◽  
Yang Fangchun ◽  
Zhao Yao ◽  
Hou Jinxuan

Author(s):  
D. S. Jayalakshmi ◽  
D. Hemanand ◽  
G. Muthu Kumar ◽  
M. Madhu Rani

A mobile ad hoc network (MANET) is a network of two or more nodes that operate under tight energy constraints. Because a MANET is highly dynamic, energy efficiency has to be considered in its design. The routing protocol is an important criterion for evaluating the performance of a MANET, and energy consumption plays a vital role. An energy-efficient scheme is therefore much needed for highly dynamic MANET environments. This paper proposes the Energy Efficient Routing (EER) protocol, which is based on efficient route failure detection. The scope of this paper is to suggest a new routing procedure for mobile ad hoc networks that minimizes unsuccessful communication. The proposed procedure uses three criteria to locate a path that ensures reliable communication. Channel quality, connection quality, and the node’s residual energy are important causes of node failure in a MANET. Hence, the suggested routing mechanism considers these three parameters to choose the best node in the route. Reliable transmission and reception are attained by transferring information through the route selected by the proposed system, as verified with the NS-2 simulator.
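The abstract names three selection parameters but gives no formula for combining them. The following is therefore only an illustrative sketch, not the EER protocol itself: the `Candidate` type, the equal weights, the normalization of each metric to the range 0 to 1, and the `min_energy` cutoff are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    """A possible next hop; every metric is assumed to be normalized to 0..1."""
    node_id: str
    channel_quality: float
    connection_quality: float
    residual_energy: float


def score(c: Candidate, weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted sum of the three parameters (equal weights are an assumption)."""
    w_channel, w_connection, w_energy = weights
    return (w_channel * c.channel_quality
            + w_connection * c.connection_quality
            + w_energy * c.residual_energy)


def pick_next_hop(candidates: List[Candidate], min_energy: float = 0.2) -> Optional[Candidate]:
    """Drop nodes that are close to exhaustion, then take the highest score."""
    eligible = [c for c in candidates if c.residual_energy >= min_energy]
    if not eligible:
        return None
    return max(eligible, key=score)


# Usage: node "a" has the best links but almost no energy left, so it is skipped.
candidates = [
    Candidate("a", channel_quality=0.9, connection_quality=0.9, residual_energy=0.1),
    Candidate("b", channel_quality=0.6, connection_quality=0.7, residual_energy=0.8),
    Candidate("c", channel_quality=0.5, connection_quality=0.5, residual_energy=0.9),
]
best = pick_next_hop(candidates)
```

Filtering on residual energy before scoring reflects the abstract's point that a depleted node is a likely cause of route failure, even when its channel and connection metrics look good.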

