process crash
Recently Published Documents



2019 ◽  
Author(s):  
Rodrigo Barbieri ◽  
Enrique dos Santos ◽  
Gustavo Maciel Dias Vieira

Fault-tolerant distributed systems offer high reliability because, even when faults occur in their components, they do not exhibit erroneous behavior. Depending on the fault model adopted, hardware and software errors that do not result in a process crash are usually not tolerated. To tolerate these rather common failures, the usual solution is to adopt a stronger fault model, such as the arbitrary or Byzantine fault model. Algorithms created for this fault model, however, are considerably more complex and require more system resources than those developed for less strict fault models. One approach to reach a middle ground is the non-malicious arbitrary fault model. In this paper we describe how we extended an implementation of active replication in the non-malicious fault model with a basic type of distributed validation, in which a deviation from the expected algorithm behavior makes a process crash. We experimentally evaluate this implementation using a fault injection framework, showing that it is feasible to extend the concept of non-malicious failures beyond hardware failures.
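The idea of turning a deviation into a crash can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each replica publishes a digest of its state after applying a command, and that a replica whose digest disagrees with a strict majority stops itself. The names `state_digest`, `validate_or_crash`, and `ReplicaCrash` are hypothetical.

```python
import hashlib
from collections import Counter


class ReplicaCrash(Exception):
    """Stands in for process termination (a real replica might call os._exit)."""


def state_digest(state):
    # Sort the items so that equal states always produce the same digest.
    canonical = repr(sorted(state.items())).encode()
    return hashlib.sha256(canonical).hexdigest()


def validate_or_crash(local, peers):
    # Count digests from all replicas, including this one.
    votes = Counter(list(peers) + [local])
    winner, count = votes.most_common(1)[0]
    total = len(peers) + 1
    # Assumption: only a strict majority is trusted as the expected behavior.
    if count * 2 > total and local != winner:
        raise ReplicaCrash("local state deviates from the majority")


healthy = state_digest({"balance": 100})
corrupted = state_digest({"balance": 999})  # e.g. a bit flip in memory

validate_or_crash(healthy, [healthy, corrupted])  # agrees with majority
try:
    validate_or_crash(corrupted, [healthy, healthy])
except ReplicaCrash as exc:
    print("replica stopped:", exc)
```

Converting a detected deviation into a crash lets the rest of the system keep relying on a crash-tolerant protocol rather than a full Byzantine one, which is the middle ground the abstract describes.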


10.29007/5vfl ◽  
2018 ◽  
Author(s):  
Ronghua Liu ◽  
Liang Guo ◽  
Yali Wang ◽  
Xiaolei Zhang ◽  
Qi Liu ◽  
...  

Because distributed hydrological models can forecast floods effectively, their study and application have become a key focus of flood forecasting and early warning. Built on high-performance computing clusters, a parallel flood forecasting and warning platform, characterized by partitioning, classification, and the coupling of complex processes, was established to forecast floods and issue warnings across China, especially for flash floods. The platform is based on the China Flash Flood Hydrological Model (CNFF-HM). Instead of MPI, it passes messages through files on a shared hierarchical storage system to control the start and stop of simulation processes, which enables rapid communication among them; pre-allocation and dynamic allocation are applied together to manage the resources of the high-performance computing clusters; automatic switching among models of different time scales is achieved by a simulation-driven strategy based on rainfall events; and a reboot framework was designed to handle process crashes and delayed rainfall data. The effectiveness and stability of the platform were tested on the flood events of 2017. Finally, a case study of the Weishui catchment in Hunan Province is presented.
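File-based control messages of the kind mentioned above can be sketched as follows. This is an illustration under assumptions, not the platform's code: the `.cmd` file layout and the names `send_command` and `read_command` are hypothetical, and the write-then-rename step is atomic on a local POSIX file system, while guarantees on shared network storage vary.

```python
import os
import tempfile
from pathlib import Path


def send_command(control_dir, worker_id, command):
    # Write to a temporary file first, then rename it into place, so a
    # reader never observes a half-written command.
    target = Path(control_dir) / f"{worker_id}.cmd"
    fd, tmp = tempfile.mkstemp(dir=control_dir)
    with os.fdopen(fd, "w") as f:
        f.write(command)
    os.replace(tmp, target)


def read_command(control_dir, worker_id):
    # Return the latest command for this worker, or None if none was sent.
    target = Path(control_dir) / f"{worker_id}.cmd"
    try:
        return target.read_text()
    except FileNotFoundError:
        return None


# Usage: a temporary directory stands in for the shared storage system.
with tempfile.TemporaryDirectory() as shared:
    send_command(shared, "sim-1", "start")
    print(read_command(shared, "sim-1"))
    send_command(shared, "sim-1", "stop")
    print(read_command(shared, "sim-1"))
```

A simulation process would poll `read_command` between time steps. Because the command file outlives the process, a restarted process can read it again after a crash.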

