process crash Latest Research Papers

Decentralized Validation for Non-malicious Arbitrary Fault Tolerance in Paxos

Fault-tolerant distributed systems offer high reliability because they do not exhibit erroneous behavior even when faults occur in their components. Depending on the fault model adopted, however, hardware and software errors that do not cause a process to crash are usually not tolerated. The usual way to tolerate these rather common failures is to adopt a stronger fault model, such as the arbitrary or Byzantine fault model. Algorithms designed for this fault model, however, are considerably more complex and require more system resources than those developed for less strict fault models. The non-malicious arbitrary fault model is one approach to reaching a middle ground. In this paper we describe how we extended an implementation of active replication in the non-malicious fault model with a basic form of distributed validation, in which a deviation from the expected algorithm behavior makes a process crash. We evaluate this implementation experimentally using a fault injection framework, showing that it is feasible to extend the concept of non-malicious failures beyond hardware failures.
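The idea of turning a detected deviation into a crash can be illustrated with a minimal sketch. It is not the paper's implementation: the `Replica` class, the `on_decision` method, and the check itself (a value already decided for an instance must never change) are assumptions chosen only to show the pattern of converting an arbitrary fault into a crash fault.

```python
import os
from typing import Callable, Dict, Optional


class Replica:
    """Toy replica that crashes itself when a safety check fails (hypothetical)."""

    def __init__(self, crash: Optional[Callable[[], None]] = None) -> None:
        # Decided value per consensus instance.
        self.decided: Dict[int, str] = {}
        # By default the process terminates immediately; the hook is
        # injectable so the behavior can be exercised without exiting.
        self._crash = crash if crash is not None else (lambda: os._exit(1))

    def on_decision(self, instance: int, value: str) -> None:
        # Assumed validation rule: a decided value must never change.
        # A conflicting decision signals corrupted state or messages,
        # so the replica stops instead of continuing in an unknown state.
        if instance in self.decided and self.decided[instance] != value:
            self._crash()
            return
        self.decided[instance] = value
```

Stopping on the first failed check means the remaining replicas only have to handle a crash, which the underlying crash-tolerant protocol already covers.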

A Parallel Flood Forecasting and Warning Platform Based on HPC Clusters

10.29007/5vfl ◽ 2018 ◽ Cited By ~ 1

Author(s): Ronghua Liu ◽ Liang Guo ◽ Yali Wang ◽ Xiaolei Zhang ◽ Qi Liu ◽ ...

Keyword(s): High Performance Computing ◽ High Performance ◽ Hydrological Model ◽ Flash Flood ◽ Flood Forecasting ◽ Hunan Province ◽ Dynamic Allocation ◽ Process Crash ◽ Forecasting And Warning ◽ Performance Computing

Because distributed hydrological models can forecast floods effectively, their study and application have become central to flood forecasting and early warning. A parallel flood forecasting and warning platform, built on high performance computing clusters and characterized by partitioning, classification, and the coupling of complicated processes, was established to forecast and warn of floods across China, especially flash floods. The platform is based on the China Flash Flood Hydrological Model (CNFF-HM). Instead of MPI, it passes messages through files on a shared hierarchical storage system to control the start and stop of simulation processes, which provides rapid communication among them. Pre-allocation and dynamic allocation are applied together to manage the resources of the high performance computing clusters; a simulation-driven strategy based on rainfall events switches automatically among models of different time scales; and a reboot framework handles process crashes and delayed rainfall data. The effectiveness and stability of the platform were tested on the flood events of 2017. Finally, a case study of the Weishui catchment in Hunan Province is presented.
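File-based control messages on shared storage, as described in the abstract, could look roughly like the sketch below. It is an illustration under stated assumptions, not the platform's code: the function names, the `.cmd` file naming, and the `start`/`stop` command strings are all hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def send_command(control_dir: Path, worker_id: str, command: str) -> None:
    """Publish a command for one worker as a file in a shared directory."""
    target = control_dir / f"{worker_id}.cmd"  # hypothetical naming scheme
    # Write to a temporary file in the same directory, then rename it into
    # place, so a polling reader never observes a partially written command.
    fd, tmp_path = tempfile.mkstemp(dir=str(control_dir))
    with os.fdopen(fd, "w") as handle:
        handle.write(command)
    os.replace(tmp_path, target)


def read_command(control_dir: Path, worker_id: str) -> Optional[str]:
    """Return the latest command for a worker, or None if none was sent."""
    target = control_dir / f"{worker_id}.cmd"
    try:
        return target.read_text().strip()
    except FileNotFoundError:
        return None
```

A simulation process would poll `read_command` between time steps and stop when it sees a stop command. Because the command lives in a file rather than in process memory, a restarted process can read it again after a crash.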

Reliable Broadcast in the Presence of Process Crash Failures

Fault-Tolerant Message-Passing Distributed Systems ◽ 10.1007/978-3-319-94141-7_2 ◽ 2018 ◽ pp. 23-40

Author(s): Michel Raynal

Keyword(s): Process Crash ◽ Crash Failures ◽ Reliable Broadcast

Consensus and Interactive Consistency in Synchronous Systems Prone to Process Crash Failures

Fault-Tolerant Message-Passing Distributed Systems ◽ 10.1007/978-3-319-94141-7_10 ◽ 2018 ◽ pp. 173-187

Author(s): Michel Raynal

Keyword(s): Synchronous Systems ◽ Process Crash ◽ Crash Failures

Non-blocking Atomic Commitment in the Presence of Process Crash Failures

Fault-Tolerant Message-Passing Distributed Systems ◽ 10.1007/978-3-319-94141-7_13 ◽ 2018 ◽ pp. 231-244

Author(s): Michel Raynal

Keyword(s): Process Crash ◽ Crash Failures ◽ Atomic Commitment

Implementing Oracles in Asynchronous Systems Prone to Process Crash Failures

Fault-Tolerant Message-Passing Distributed Systems ◽ 10.1007/978-3-319-94141-7_18 ◽ 2018 ◽ pp. 353-383

Author(s): Michel Raynal

Keyword(s): Asynchronous Systems ◽ Process Crash ◽ Crash Failures

Expediting Decision in Synchronous Systems Prone to Process Crash Failures

Fault-Tolerant Message-Passing Distributed Systems ◽ 10.1007/978-3-319-94141-7_11 ◽ 2018 ◽ pp. 189-213

Author(s): Michel Raynal

Keyword(s): Synchronous Systems ◽ Process Crash ◽ Crash Failures

Transaction-Based Process Crash Recovery of File System Namespace Modules

2013 IEEE 19th Pacific Rim International Symposium on Dependable Computing ◽ 10.1109/prdc.2013.56 ◽ 2013 ◽ Cited By ~ 1

Author(s): David C. van Moolenbroek ◽ Raja Appuswamy ◽ Andrew S. Tanenbaum

Keyword(s): File System ◽ Process Crash ◽ Crash Recovery

Integrated System and Process Crash Recovery in the Loris Storage Stack

2012 IEEE Seventh International Conference on Networking, Architecture, and Storage ◽ 10.1109/nas.2012.5 ◽ 2012 ◽ Cited By ~ 2

Author(s): David C. van Moolenbroek ◽ Raja Appuswamy ◽ Andrew S. Tanenbaum

Keyword(s): Integrated System ◽ Process Crash ◽ Crash Recovery

process crash
Recently Published Documents

Decentralized Validation for Non-malicious Arbitrary Fault Tolerance in Paxos

A Parallel Flood Forecasting and Warning Platform Based on HPC Clusters

Reliable Broadcast in the Presence of Process Crash Failures

Consensus and Interactive Consistency in Synchronous Systems Prone to Process Crash Failures

Non-blocking Atomic Commitment in the Presence of Process Crash Failures

Implementing Oracles in Asynchronous Systems Prone to Process Crash Failures

Expediting Decision in Synchronous Systems Prone to Process Crash Failures

Transaction-Based Process Crash Recovery of File System Namespace Modules

Integrated System and Process Crash Recovery in the Loris Storage Stack
