A multiparty session typing discipline for fault-tolerant event-driven distributed programming

2021 ◽  
Vol 5 (OOPSLA) ◽  
pp. 1-30
Author(s):  
Malte Viering ◽  
Raymond Hu ◽  
Patrick Eugster ◽  
Lukasz Ziarek

This paper presents a formulation of multiparty session types (MPSTs) for practical fault-tolerant distributed programming. We tackle the challenges faced by session types in the context of distributed systems involving asynchronous and concurrent partial failures – such as supporting dynamic replacement of failed parties and retrying failed protocol segments in an ongoing multiparty session – in the presence of unreliable failure detection. Key to our approach is that we develop a novel model of event-driven concurrency for multiparty sessions. Inspired by real-world practices, it enables us to unify the session-typed handling of regular I/O events with failure handling and the combination of features needed to express practical fault-tolerant protocols. Moreover, the characteristics of our model allow us to prove a global progress property for well-typed processes engaged in multiple concurrent sessions, which does not hold in traditional MPST systems. To demonstrate its practicality, we implement our framework as a toolchain and runtime for Scala, and use it to specify and implement a session-typed version of the cluster management system of the industrial-strength Apache Spark data analytics framework. Our session-typed cluster manager composes with other vanilla Spark components to give a functioning Spark runtime; e.g., it can execute existing third-party Spark applications without code modification. A performance evaluation using the TPC-H benchmark shows our prototype implementation incurs an average overhead below 10%.
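As a rough illustration of the idea of handling regular I/O events and failure events in one event-driven style, the following Python sketch dispatches both kinds of event through the same handler table. It is not the paper's Scala toolchain and performs no session type checking; the names `Message`, `Suspected`, `Worker`, and the log strings are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """A regular I/O event: a message arrived from a peer."""
    sender: str
    payload: str


@dataclass(frozen=True)
class Suspected:
    """A failure event: the (possibly unreliable) detector suspects a peer."""
    peer: str


class Worker:
    """Hypothetical participant that treats both event kinds uniformly."""

    def __init__(self):
        self.log = []
        # One handler table for both I/O events and failure events.
        self.handlers = {Message: self.on_message, Suspected: self.on_suspected}

    def on_message(self, ev):
        self.log.append(f"recv {ev.payload} from {ev.sender}")

    def on_suspected(self, ev):
        # Assumed reaction: retry the current protocol segment without the peer.
        self.log.append(f"retry segment without {ev.peer}")

    def run(self, events):
        queue = deque(events)
        while queue:
            ev = queue.popleft()
            self.handlers[type(ev)](ev)  # dispatch on the event's type
        return self.log


log = Worker().run([Message("master", "task"), Suspected("master")])
```

In this sketch a failure suspicion is simply another event in the queue, so the failure-handling code sits alongside the ordinary message handlers rather than in a separate exception path.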

2021 ◽  
Author(s):  
James Gaston

The work area of a team of small robots is limited by their inability to traverse a very common obstacle: stairs. We present a complete integrated control architecture and communication strategy for a system of reconfigurable robots that can climb stairs. A modular robot design is presented which allows the robots to dynamically reconfigure to traverse certain obstacles. This thesis investigates the implementation of a system of autonomous robots which can cooperatively reconfigure themselves to collectively traverse obstacles such as stairs. We present a complete behavior and communication system which facilitates this autonomous reconfiguration. The layered behavior-based control system is fault-tolerant and extends the capabilities of a control architecture known as ALLIANCE. Behavior classes are introduced as a mechanism for managing ordering dependencies and monitoring a robot's progress through a particular task. The communication system complements the behavioral control and implements inherent robot failure detection without the need for a base station or external monitor. The behavior and communication systems are validated by implementing them on a mobile robot platform synthesized specifically for this research. Experimental trials showed that the implementation of the behavior control systems was successful. The control system provided robust, fault-tolerant performance even when robots failed to perform docking tasks while reconfiguring. Once the robots reconfigure to form a chain, a different control scheme based on gait control tables coordinates the individual movements of the robots. Several successful stair climbing trials were accomplished. Improvements to the mechanical design are proposed.


2021 ◽  
pp. 1-30
Author(s):  
İ. Gümüşboğa ◽  
A. İftar

Elevator failure may have fatal consequences for fighter aircraft that are unstable due to their high manoeuvrability requirements. Many studies have been conducted in the literature using active and passive fault-tolerant control structures. However, these studies mostly include sophisticated controllers with high computational load that cannot work in real systems. Considering the multi-functionality and broad operational prospects of fighter aircraft, computational load is very important in terms of applicability. In this study, an integrated fault-tolerant control strategy with low computational load is proposed without sacrificing the ability to cope with failures. This control strategy switches between predetermined controllers in the case of failure. One of these controllers is designed to operate in a non-failure condition. This controller is a basic controller that requires very little computational effort. The other controller operates when an asymmetric elevator failure occurs. This controller is a robust fault-tolerant controller that can fly the aircraft safely in case of elevator failure. The switching is decided by a failure detection system. The proposed integrated fault-tolerant control system is verified by non-linear F-16 flight simulations. These simulations show that the proposed method can cope with failures but requires less computational load because it uses a conventional controller in the case of no failure.
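The switching strategy described in this abstract can be sketched in a few lines of Python. This is a minimal illustration only: the abstract does not give the control laws, so the two controller functions, their gains, and the saturation limit below are invented placeholders, and `SwitchingController` is a hypothetical name.

```python
def nominal_controller(error: float) -> float:
    """Cheap controller for the no-failure case (hypothetical gain)."""
    return 2.0 * error


def fault_tolerant_controller(error: float) -> float:
    """Stand-in for the robust controller: lower gain, saturated output.

    The gain and the +/-0.5 limit are assumptions for illustration.
    """
    cmd = 1.0 * error
    return max(-0.5, min(0.5, cmd))


class SwitchingController:
    """Selects one of two predetermined controllers from a failure flag."""

    def __init__(self):
        self.failure_detected = False

    def report_failure(self):
        # Called by the failure detection system (assumed interface).
        self.failure_detected = True

    def command(self, error: float) -> float:
        active = (fault_tolerant_controller if self.failure_detected
                  else nominal_controller)
        return active(error)


ctrl = SwitchingController()
before = ctrl.command(1.0)   # nominal law is active
ctrl.report_failure()
after = ctrl.command(1.0)    # robust law is active after the switch
```

The point of the design is that only the cheap controller runs while no failure has been reported, so the more expensive robust law costs nothing in the common case.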


2016 ◽  
Vol 7 (3) ◽  
pp. 86-98 ◽  
Author(s):  
Mohammed A. AlZain ◽  
Alice S. Li ◽  
Ben Soh ◽  
Mehedi Masud

One of the main challenges in cloud computing is to build healthy and efficient storage for securely managing and preserving data. This means a cloud service provider needs to make sure that its clients' outsourced data are stored securely and that data queries and retrievals are executed correctly and privately. On the other hand, it may also mean businesses are willing to outsource their data to a third party only if they trust that their data are not accessible or visible to the service provider and other non-authorized parties. However, one of the major obstacles to ensuring data reliability and security is Byzantine faults. While Byzantine fault tolerance (BFT) has received growing attention from the academic research community, the research done is generally from the distributed computing point of view, and hence finds little practical use in cloud computing. To that end, the focus of this paper is to discuss how these faults can be tolerated with the authors' proposed conceptualization of Byzantine data faults and fault-tolerant architecture in cloud data management.


Author(s):  
Vincenzo De Florio

The programming language itself is the focus of this chapter: Fault-tolerance is not embedded in the program (as is the case, e.g., for single-version fault-tolerance), nor around the language (through compilers or translators); on the contrary, fault-tolerance is provided through the syntactical structures and the run-time executives of fault-tolerance programming languages. Also in this case a significant part of the complexity of dependability enforcement is moved from each single code to the architecture, in this case the programming language. Many cases exist of fault-tolerance programming languages; this chapter proposes a few of them, considering three cases: Object-oriented languages, functional languages, and hybrid languages. In particular, the case of Oz is discussed, a multi-paradigm programming language that achieves both transparent distribution and translucent failure handling.


2020 ◽  
Vol 77 ◽  
pp. 04003
Author(s):  
Mark Ogbodo ◽  
Khanh Dang ◽  
Fukuchi Tomohide ◽  
Abderazek Abdallah

Neuromorphic computing tries to model in hardware the biological brain, which is adept at operating in a rapid, real-time, parallel, low-power, adaptive and fault-tolerant manner within a volume of 2 liters. Leveraging the event-driven nature of Spiking Neural Networks (SNNs), neuromorphic systems have been able to demonstrate low power consumption by power gating sections of the network not driven by an event at any point in time. However, further exploration in this field towards the building of edge-application-friendly agents and efficient scalable neuromorphic systems with a large number of synapses necessitates the building of a small-sized, low-power spiking neuron processor core with an efficient neuro-coding scheme and fault tolerance. This paper presents a spiking neuron processor core suitable for event-driven Three-Dimensional Network on Chip (3D-NoC) SNN-based neuromorphic systems. The spiking neuron processor core houses an array of leaky integrate and fire (LIF) neurons and utilizes a crossbar memory in modelling the synapses, all within a chip area of 0.12 mm², and achieves an accuracy of 95.15% on MNIST dataset inference.
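For readers unfamiliar with leaky integrate and fire (LIF) neurons, the following Python sketch shows a common discrete-time software form of the model. It is not the hardware design from the paper: the threshold, leak factor, and reset value are assumed values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class LIFNeuron:
    threshold: float = 1.0  # assumed firing threshold
    leak: float = 0.9       # assumed per-step decay of the membrane potential
    v_reset: float = 0.0    # potential after a spike
    v: float = 0.0          # current membrane potential

    def step(self, input_current: float) -> bool:
        """Advance one time step; return True if the neuron spikes."""
        # Leak the stored potential, then integrate the new input.
        self.v = self.leak * self.v + input_current
        if self.v >= self.threshold:
            self.v = self.v_reset  # fire and reset
            return True
        return False


def run(neuron, inputs):
    """Feed a sequence of input currents and collect the spike train."""
    return [neuron.step(x) for x in inputs]


spikes = run(LIFNeuron(), [0.5, 0.5, 0.5, 0.0, 0.5])
# → [False, False, True, False, False]
```

With these assumed parameters the potential reaches 0.5, then 0.95, then crosses the threshold on the third input, so the neuron emits a single spike and resets.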


2008 ◽  
Vol 18 (03) ◽  
pp. 411-432 ◽  
Author(s):  
BORIS MEJÍAS ◽  
PETER VAN ROY

Fault-tolerance and lookup consistency are considered crucial properties for building applications on top of structured overlay networks. Many of these networks use the ring topology for the organization of their peers. The network must handle multiple joins, leaves and failures of peers while keeping the connection between every successor-predecessor pair correct. This property makes the maintenance of the ring very costly and temporarily impossible to achieve, requiring periodic stabilization for fixing the ring. We introduce the relaxed-ring topology, which does not rely on a perfect successor-predecessor relationship and does not need any periodic maintenance. Leaves and failures are considered as the same type of event, providing a fault-tolerant and self-organizing maintenance of the ring. Relaxed-ring's limitations with respect to failure handling are formally identified, providing strong guarantees to develop applications on top of the architecture. Besides permanent failures, the paper analyses temporary failures and false suspicions caused by broken links, which are often ignored.

