A multiparty session typing discipline for fault-tolerant event-driven distributed programming

Malte Viering; Raymond Hu; Patrick Eugster; Lukasz Ziarek

doi:10.1145/3485501

Proceedings of the ACM on Programming Languages ◽

10.1145/3485501 ◽

2021 ◽

Vol 5 (OOPSLA) ◽

pp. 1-30

Author(s):

Malte Viering ◽

Raymond Hu ◽

Patrick Eugster ◽

Lukasz Ziarek

Keyword(s):

Fault Tolerant ◽

Failure Detection ◽

Third Party ◽

Distributed Programming ◽

Failure Handling ◽

Session Types ◽

Dynamic Replacement ◽

Event Driven ◽

Industrial Strength ◽

Novel Model

This paper presents a formulation of multiparty session types (MPSTs) for practical fault-tolerant distributed programming. We tackle the challenges faced by session types in the context of distributed systems involving asynchronous and concurrent partial failures – such as supporting dynamic replacement of failed parties and retrying failed protocol segments in an ongoing multiparty session – in the presence of unreliable failure detection. Key to our approach is that we develop a novel model of event-driven concurrency for multiparty sessions. Inspired by real-world practices, it enables us to unify the session-typed handling of regular I/O events with failure handling and the combination of features needed to express practical fault-tolerant protocols. Moreover, the characteristics of our model allow us to prove a global progress property for well-typed processes engaged in multiple concurrent sessions, which does not hold in traditional MPST systems. To demonstrate its practicality, we implement our framework as a toolchain and runtime for Scala, and use it to specify and implement a session-typed version of the cluster management system of the industrial-strength Apache Spark data analytics framework. Our session-typed cluster manager composes with other vanilla Spark components to give a functioning Spark runtime; e.g., it can execute existing third-party Spark applications without code modification. A performance evaluation using the TPC-H benchmark shows our prototype implementation incurs an average overhead below 10%.

Gremlins: an Architectural Framework for Reconfigurable Autonomous Robots

10.32920/ryerson.14656818 ◽

2021 ◽

Author(s):

James Gaston

Keyword(s):

Control System ◽

Communication System ◽

Communication Systems ◽

Autonomous Robots ◽

Fault Tolerant ◽

Failure Detection ◽

Base Station ◽

Communication Strategy ◽

Stair Climbing ◽

Control Architecture

The work area of a team of small robots is limited by their inability to traverse a very common obstacle: stairs. We present a complete integrated control architecture and communication strategy for a system of reconfigurable robots that can climb stairs. A modular robot design is presented which allows the robots to dynamically reconfigure to traverse certain obstacles. This thesis investigates the implementation of a system of autonomous robots which can cooperatively reconfigure themselves to collectively traverse obstacles such as stairs. We present a complete behavior and communication system which facilitates this autonomous reconfiguration. The layered behavior-based control system is fault-tolerant and extends the capabilities of a control architecture known as ALLIANCE. Behavior classes are introduced as a mechanism for managing ordering dependencies and monitoring a robot's progress through a particular task. The communication system complements the behavioral control and implements inherent robot failure detection without the need for a base station or external monitor. The behavior and communication systems are validated by implementing them on a mobile robot platform synthesized specifically for this research. Experimental trials showed that the implementation of the behavior control systems was successful. The control system provided robust, fault-tolerant performance even when robots failed to perform docking tasks while reconfiguring. Once the robots reconfigure to form a chain, a different control scheme based on gait control tables coordinates the individual movements of the robots. Several successful stair climbing trials were accomplished. Improvements to the mechanical design are proposed.

An integrated fault-tolerant control strategy for control surface failure in a fighter aircraft

The Aeronautical Journal ◽

10.1017/aer.2021.66 ◽

2021 ◽

pp. 1-30

Author(s):

İ. Gümüşboğa ◽

A. İftar

Keyword(s):

Control Strategy ◽

Fault Tolerant ◽

Detection System ◽

Fault Tolerant Control ◽

Failure Detection ◽

Computational Effort ◽

Control Surface ◽

Fighter Aircraft ◽

Control Structures ◽

Computational Load

Elevator failure may have fatal consequences for fighter aircraft that are unstable due to their high manoeuvrability requirements. Many studies have been conducted in the literature using active and passive fault-tolerant control structures. However, these studies mostly include sophisticated controllers with high computational load that cannot work in real systems. Considering the multi-functionality and broad operational prospects of fighter aircraft, computational load is very important in terms of applicability. In this study, an integrated fault-tolerant control strategy with low computational load is proposed without sacrificing the ability to cope with failures. This control strategy switches between predetermined controllers in the case of failure. One of these controllers is designed to operate in a non-failure condition. This controller is a basic controller that requires very little computational effort. The other controller operates when an asymmetric elevator failure occurs. This controller is a robust fault-tolerant controller that can fly the aircraft safely in case of elevator failure. The switching is decided by a failure detection system. The proposed integrated fault-tolerant control system is verified by non-linear F-16 flight simulations. These simulations show that the proposed method can cope with failures but requires less computational load because it uses a conventional controller in the case of no failure.
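
The switching idea in this abstract can be illustrated with a minimal sketch. It is not the authors' controller design: the proportional gains, the residual-threshold detector, and all names (`SwitchingController`, `update_detection`, `nominal`, `robust`) are hypothetical placeholders chosen only to show a cheap controller being replaced by a fault-tolerant one once a failure is flagged.

```python
from dataclasses import dataclass
from typing import Callable

# A controller maps a tracking error to an actuator command (assumed interface).
Controller = Callable[[float], float]


@dataclass
class SwitchingController:
    nominal: Controller          # low-cost controller for the no-failure case
    fault_tolerant: Controller   # robust controller used after a failure
    failure_detected: bool = False  # latched once the detector fires

    def update_detection(self, residual: float, threshold: float) -> None:
        # Hypothetical detector: flag a failure when the residual between
        # expected and measured response exceeds a threshold.
        if abs(residual) > threshold:
            self.failure_detected = True

    def command(self, error: float) -> float:
        # Route the error through whichever controller is currently active.
        active = self.fault_tolerant if self.failure_detected else self.nominal
        return active(error)


def nominal(error: float) -> float:
    return 2.0 * error   # illustrative gain, not from the paper


def robust(error: float) -> float:
    return 0.5 * error   # illustrative gain, not from the paper


ctrl = SwitchingController(nominal, robust)
before = ctrl.command(1.0)                           # nominal controller active
ctrl.update_detection(residual=0.8, threshold=0.5)   # detector fires
after = ctrl.command(1.0)                            # fault-tolerant controller active
```

Latching the detection flag keeps the sketch from chattering between controllers; whether the published system latches or can switch back is not stated in the abstract.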

Byzantine Fault-Tolerant Architecture in Cloud Data Management

International Journal of Knowledge Society Research ◽

10.4018/ijksr.2016070106 ◽

2016 ◽

Vol 7 (3) ◽

pp. 86-98 ◽

Cited By ~ 3

Author(s):

Mohammed A. AlZain ◽

Alice S. Li ◽

Ben Soh ◽

Mehedi Masud

Keyword(s):

Cloud Computing ◽

Data Management ◽

Service Provider ◽

Fault Tolerant ◽

Academic Research ◽

Third Party ◽

Cloud Data ◽

Byzantine Faults ◽

Byzantine Fault ◽

Cloud Data Management

One of the main challenges in cloud computing is to build a healthy and efficient storage for securely managing and preserving data. This means a cloud service provider needs to make sure that its clients' outsourced data are stored securely and that data queries and retrievals are executed correctly and privately. On the other hand, it may also mean businesses are willing to outsource their data to a third party only if they trust that their data are not accessible and visible to the service provider and other non-authorized parties. However, one of the major obstacles to ensuring data reliability and security is Byzantine faults. While Byzantine fault tolerance (BFT) has received growing attention from the academic research community, the research done is generally from the distributed computing point of view, and hence finds little practical use in cloud computing. To that end, the focus of this paper is to discuss how these faults can be tolerated with the authors' proposed conceptualization of Byzantine data faults and fault-tolerant architecture in cloud data management.

Fault-Tolerant Protocols Using Fault-Tolerance Programming Languages

Application-Layer Fault-Tolerance Protocols ◽

10.4018/978-1-60566-182-7.ch005 ◽

2009 ◽

pp. 161-174

Author(s):

Vincenzo De Florio

Keyword(s):

Fault Tolerance ◽

Programming Languages ◽

Programming Language ◽

Fault Tolerant ◽

Object Oriented ◽

Significant Part ◽

Functional Languages ◽

Failure Handling ◽

Run Time ◽

Object Oriented Languages

The programming language itself is the focus of this chapter: fault-tolerance is not embedded in the program (as is the case, e.g., for single-version fault-tolerance), nor around the language (through compilers or translators); on the contrary, fault-tolerance is provided through the syntactical structures and the run-time executives of fault-tolerance programming languages. Also in this case a significant part of the complexity of dependability enforcement is moved from each single code to the architecture, in this case the programming language. Many cases exist of fault-tolerance programming languages; this chapter proposes a few of them, considering three cases: object-oriented languages, functional languages, and hybrid languages. In particular, the case of Oz is discussed, a multi-paradigm programming language that achieves both transparent distribution and translucent failure handling.

True Event-Driven and Fault-Tolerant Routing in Wireless Sensor Network

Wireless Personal Communications ◽

10.1007/s11277-020-07037-3 ◽

2020 ◽

Vol 112 (1) ◽

pp. 439-461

Author(s):

Priyajit Biswas ◽

Tuhina Samanta

Keyword(s):

Wireless Sensor Network ◽

Sensor Network ◽

Fault Tolerant ◽

Wireless Sensor ◽

Event Driven

Architecture and Design of a Spiking Neuron Processor Core Towards the Design of a Large-scale Event-Driven 3D-NoC-based Neuromorphic Processor

SHS Web of Conferences ◽

10.1051/shsconf/20207704003 ◽

2020 ◽

Vol 77 ◽

pp. 04003

Author(s):

Mark Ogbodo ◽

Khanh Dang ◽

Fukuchi Tomohide ◽

Abderazek Abdallah

Keyword(s):

Low Power ◽

Large Scale ◽

Fault Tolerant ◽

Three Dimensional ◽

Spiking Neuron ◽

Chip Area ◽

Processor Core ◽

Neuromorphic Systems ◽

Event Driven ◽

On Chip

Neuromorphic computing tries to model in hardware the biological brain, which is adept at operating in a rapid, real-time, parallel, low-power, adaptive and fault-tolerant manner within a volume of 2 liters. Leveraging the event-driven nature of Spiking Neural Networks (SNNs), neuromorphic systems have been able to demonstrate low power consumption by power gating sections of the network not driven by an event at any point in time. However, further exploration in this field towards the building of edge-application-friendly agents and efficient scalable neuromorphic systems with a large number of synapses necessitates the building of a small-sized, low-power spiking neuron processor core with an efficient neuro-coding scheme and fault tolerance. This paper presents a spiking neuron processor core suitable for event-driven Three-Dimensional Network on Chip (3D-NoC) SNN-based neuromorphic systems. The spiking neuron processor core houses an array of leaky integrate-and-fire (LIF) neurons and utilizes a crossbar memory in modelling the synapses, all within a chip area of 0.12 mm², and achieves an accuracy of 95.15% on MNIST dataset inference.
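
For readers unfamiliar with the leaky integrate-and-fire (LIF) neuron mentioned in this abstract, the following is a minimal discrete-time sketch of the textbook model. It does not reflect the paper's hardware design: the leak factor, threshold, reset value, and the function names `lif_step` and `run` are assumptions made for illustration.

```python
def lif_step(v, input_current, leak=0.9, threshold=1.0, v_reset=0.0):
    """Advance one LIF neuron by one time step.

    The membrane potential decays by `leak`, integrates the input, and
    emits a spike (then resets) once it reaches `threshold`.
    All parameter values are illustrative assumptions.
    """
    v = leak * v + input_current
    if v >= threshold:
        return v_reset, True
    return v, False


def run(inputs):
    """Feed a sequence of input currents to one neuron; return spike flags."""
    v, spikes = 0.0, []
    for current in inputs:
        v, fired = lif_step(v, current)
        spikes.append(fired)
    return spikes


# Potential goes 0.5 -> 0.95 -> 1.355 (spike, reset to 0) -> 0.0
spikes = run([0.5, 0.5, 0.5, 0.0])
```

The event-driven aspect shows up in the output: downstream work is only needed at the steps where a spike flag is `True`.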

THE RELAXED-RING: A FAULT-TOLERANT TOPOLOGY FOR STRUCTURED OVERLAY NETWORKS

Parallel Processing Letters ◽

10.1142/s0129626408003478 ◽

2008 ◽

Vol 18 (03) ◽

pp. 411-432 ◽

Cited By ~ 5

Author(s):

BORIS MEJÍAS ◽

PETER VAN ROY

Keyword(s):

Fault Tolerance ◽

Overlay Networks ◽

Fault Tolerant ◽

Ring Topology ◽

Failure Handling ◽

Periodic Maintenance ◽

Structured Overlay ◽

Structured Overlay Networks ◽

Self Organizing

Fault-tolerance and lookup consistency are considered crucial properties for building applications on top of structured overlay networks. Many of these networks use the ring topology for the organization of their peers. The network must handle multiple joins, leaves and failures of peers while keeping the connection between every successor-predecessor pair correct. This property makes the maintenance of the ring very costly and temporarily impossible to achieve, requiring periodic stabilization for fixing the ring. We introduce the relaxed-ring topology, which does not rely on a perfect successor-predecessor relationship and does not need any periodic maintenance. Leaves and failures are considered as the same type of event, providing a fault-tolerant and self-organizing maintenance of the ring. Relaxed-ring's limitations with respect to failure handling are formally identified, providing strong guarantees to develop applications on top of the architecture. Besides permanent failures, the paper analyses temporary failures and false suspicions caused by broken links, which are often ignored.

Scalable and Fault Tolerant Failure Detection and Consensus

Proceedings of the 22nd European MPI Users' Group Meeting - EuroMPI '15 ◽

10.1145/2802658.2802660 ◽

2015 ◽

Cited By ~ 12

Author(s):

Amogh Katti ◽

Giuseppe Di Fatta ◽

Thomas Naughton ◽

Christian Engelmann

Keyword(s):

Fault Tolerant ◽

Failure Detection

A fault tolerant three-leg shunt active filter using FPGA for fast switch failure detection

2008 IEEE Power Electronics Specialists Conference ◽

10.1109/pesc.2008.4592471 ◽

2008 ◽

Cited By ~ 6

Author(s):

Shahram Karimi ◽

Philippe Poure ◽

Shahrokh Saadate

Keyword(s):

Fault Tolerant ◽

Failure Detection ◽

Active Filter ◽

Shunt Active Filter

Failure detection and identification and fault tolerant control using the IMM-KF with applications to the Eagle-Eye UAV

Proceedings of the 37th IEEE Conference on Decision and Control (Cat. No.98CH36171) ◽

10.1109/cdc.1998.761963 ◽

2002 ◽

Cited By ~ 36

Author(s):

C. Rago ◽

R. Prasanth ◽

R.K. Mehra ◽

R. Fortenbaugh

Keyword(s):

Fault Tolerant ◽

Fault Tolerant Control ◽

Failure Detection ◽

Detection And Identification