Formal verification for fault-tolerant architectures: Some lessons learned

Formal Verification of Fault-Tolerant Startup Algorithms for Time-Triggered Architectures: A Survey

Proceedings of the IEEE ◽

10.1109/jproc.2016.2519247 ◽

2016 ◽

Vol 104 (5) ◽

pp. 904-922 ◽

Cited By ~ 7

Author(s):

Indranil Saha ◽

Suman Roy ◽

S. Ramesh

Keyword(s):

Formal Verification ◽

Fault Tolerant

Systematic formal verification for fault-tolerant time-triggered algorithms

IEEE Transactions on Software Engineering ◽

10.1109/32.815324 ◽

1999 ◽

Vol 25 (5) ◽

pp. 651-660 ◽

Cited By ~ 49

Author(s):

J. Rushby

Keyword(s):

Formal Verification ◽

Fault Tolerant

Formal Verification of Fault-Tolerant and Recovery Mechanisms for Safe Node Sequence Protocol

2014 IEEE 28th International Conference on Advanced Information Networking and Applications ◽

10.1109/aina.2014.99 ◽

2014 ◽

Cited By ~ 3

Author(s):

Rui Zhou ◽

Rong Min ◽

Qi Yu ◽

Chanjuan Li ◽

Yong Sheng ◽

...

Keyword(s):

Formal Verification ◽

Fault Tolerant ◽

Recovery Mechanisms ◽

Sequence Protocol

Verification of HotStuff BFT Consensus Protocol With TLA+/TLC in an Industrial Setting

SHS Web of Conferences ◽

10.1051/shsconf/20219301006 ◽

2021 ◽

Vol 93 ◽

pp. 01006

Author(s):

Vladimir Kukharenko ◽

Kirill Ziborov ◽

Rafael Sadykov ◽

Ruslan Rezin

Keyword(s):

Formal Verification ◽

Formal Specification ◽

Fault Tolerant ◽

Software Implementation ◽

Actual Behavior ◽

Smart Contracts ◽

Consensus Protocol ◽

Verification Methods ◽

Formal Specification And Verification ◽

Specification And Verification

The use of formal verification methods in industrial projects has always been limited. The proliferation of distributed ledger systems (DLSs), also known as blockchains, is rapidly changing the situation. Since the main application area of DLSs is the automation of financial transactions, predictability and reliability are critical properties of such systems. The actual behavior of a DLS is largely determined by the chosen consensus protocol, whose properties require strict specification and formal verification. Formal specification and verification of the consensus protocol is necessary but not sufficient. It is also required to ensure that the software implementation of the DLS nodes complies with this protocol. Finally, the verified software implementation of the protocol must run on a sufficiently reliable operating system. The financial focus of DLS applications has also led to the emergence of so-called smart contracts, which are an important part of applied implementations of specific business processes based on DLSs. Therefore, the verifiability of smart contracts is also a critical requirement for industrial DLSs. In this paper, we describe an ongoing industrial project between a large Russian airline and three universities: Innopolis University (IU), Moscow Institute of Physics and Technology (MIPT) and Lomonosov Moscow State University (MSU). The main expected project result is a DLS for more flexible refueling of aircraft, verified at least at the four technological levels described above. After a brief project overview, we focus on our experience with the formal specification and verification of HotStuff, a leader-based fault-tolerant protocol that ensures reaching distributed consensus in the presence of Byzantine processes. The formal specification of the protocol is written in the TLA+ language and then checked with TLC, a tool specialized for verifying models based on TLA+ specifications.
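
As a rough illustration of what "consensus in the presence of Byzantine processes" demands, the sketch below computes the quorum arithmetic usually stated for HotStuff-style BFT protocols. The abstract gives no replica counts, so the bound of n >= 3f + 1 replicas and the quorum of n - f votes are assumptions taken from the standard BFT setting, and the function names are hypothetical. This is not the project's TLA+ specification.

```python
def max_byzantine_faults(n: int) -> int:
    """Largest f satisfying n >= 3f + 1 (assumed standard BFT bound)."""
    if n < 1:
        raise ValueError("need at least one replica")
    return (n - 1) // 3


def quorum_size(n: int) -> int:
    """Votes needed for a quorum: n - f (assumed, not from the abstract)."""
    return n - max_byzantine_faults(n)


def quorums_share_honest_replica(n: int) -> bool:
    """Check that any two quorums overlap in more than f replicas.

    Two quorums of size q drawn from n replicas share at least 2q - n
    members; if that overlap exceeds f, at least one shared member is honest.
    """
    f = max_byzantine_faults(n)
    q = quorum_size(n)
    return 2 * q - n > f


if __name__ == "__main__":
    for n in (4, 7, 10):
        print(n, max_byzantine_faults(n), quorum_size(n),
              quorums_share_honest_replica(n))
```

The overlap check is the kind of invariant a model checker such as TLC would explore over all reachable states rather than over a few sample sizes.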

Fault-Tolerant Control Strategy for Steering Failures in Wheeled Planetary Rovers

Journal of Robotics ◽

10.1155/2012/694673 ◽

2012 ◽

Vol 2012 ◽

pp. 1-15

Author(s):

Alexandre Carvalho Leite ◽

Bernd Schäfer ◽

Marcelo Lopes de Oliveira e Souza

Keyword(s):

Design Process ◽

Impact Analysis ◽

Fault Tolerant ◽

Fault Tolerant Control ◽

Lessons Learned ◽

The Real ◽

Driving Mode ◽

Planetary Rovers ◽

Step Procedure ◽

Nominal Mode

Fault-tolerant control design of wheeled planetary rovers is described. This paper covers all steps of the design process, from modeling/simulation to experimentation. A simplified contact model is used with a multibody simulation model and tuned to fit the experimental data. The nominal mode controller is designed to be stable and has its parameters optimized to improve tracking performance and cope with physical boundaries and actuator saturations. This controller was implemented in the real rover and validated experimentally. An impact analysis defines the repertory of faults to be handled. Failures in steering joints are chosen as fault modes; the six fault modes combine into a total of 63 possible fault configurations. The fault-tolerant controller is designed as a two-step procedure to provide alternative steering and reuse the nominal controller in a way that resembles a crab-like driving mode. Three fault modes are injected (one, two, and three failed steering joints) in the real rover to evaluate the response of the nonreconfigured and reconfigured control systems in the face of these faults. The experimental results satisfactorily validate the proposed fault-tolerant controller. Additional concluding comments and an outlook summarize the lessons learned during the whole design process and foresee the next steps of the research.
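
The figure of 63 configurations is consistent with counting every non-empty subset of six independently failing steering joints (2^6 - 1 = 63). The sketch below enumerates them under that assumption; the abstract does not say how the authors counted, and the joint labels are hypothetical.

```python
from itertools import combinations

# Hypothetical labels for the six steering joints.
JOINTS = ("joint_1", "joint_2", "joint_3", "joint_4", "joint_5", "joint_6")


def fault_configurations(joints):
    """Return every non-empty subset of joints that could have failed."""
    return [
        subset
        for k in range(1, len(joints) + 1)
        for subset in combinations(joints, k)
    ]


def count_by_failed_joints(joints):
    """Map number of failed joints -> number of configurations."""
    counts = {}
    for subset in fault_configurations(joints):
        counts[len(subset)] = counts.get(len(subset), 0) + 1
    return counts


if __name__ == "__main__":
    print(len(fault_configurations(JOINTS)))
    print(count_by_failed_joints(JOINTS))
```

Under this reading, the one-, two-, and three-joint failures injected in the experiments correspond to 6, 15, and 20 of the possible configurations respectively.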

Formal verification of a fault tolerant computer

[1992] Proceedings IEEE/AIAA 11th Digital Avionics Systems Conference ◽

10.1109/dasc.1992.282170 ◽

2003 ◽

Cited By ~ 5

Author(s):

N.A. Brock ◽

D.M. Jackson

Keyword(s):

Formal Verification ◽

Fault Tolerant

MAKING ICS WORK FOR YOU WHEN IT REALLY COUNTS! – HARD LESSONS LEARNED ON THE SHORES OF CALIFORNIA

International Oil Spill Conference Proceedings ◽

10.7901/2169-3358-2005-1-741 ◽

2005 ◽

Vol 2005 (1) ◽

pp. 741-745

Author(s):

Carl Jochums ◽

William Robberson

Keyword(s):

Fault Tolerant ◽

Response Speed ◽

Lessons Learned ◽

Incident Command System ◽

Incident Command ◽

Set Up ◽

The Moment ◽

National Significance ◽

The Right ◽

Message Mapping

The moment an oil spill occurs, response speed is of the essence. Yet how often have you participated in the Incident Command System (ICS) at a spill and been frustrated with the speed or the coordination of the response? How often has a response been declared a success but getting there was so frustrating and exhausting that you've sworn you won't work that way anymore? ICS is here to stay; yet how you can consistently make it work optimally for you and the response remains a challenge. This paper is based upon the premise that ICS enables the right information to be communicated to the right people, in the right format, at the right time. However, during most response debriefs, at the top of the “needs improvement” list you will find numerous references to the failure of information flow and communications. In this paper we share some of the hard lessons learned in spill response along the California coast, and ways in which some of the agencies involved today are proactively “preparing to communicate” within the Incident Command System. We use case histories of past and recent spills and the California Spill of National Significance 2004 exercise to illustrate the communications and coordination problems inherent in most response Incident Command structures. A variety of issues are considered, from the evolutionary paths of most responses, to the numerous personalities and egos involved, to the wide array of expectations amongst participants and stakeholders, and the often unique and varied authorities and agendas that multiple agencies bring to a response. We also suggest innovative ways in which the process of communications within the ICS is being augmented, enhanced, and set up for success. We introduce concepts such as “data mining,” “embedded information specialists,” “fault-tolerant” communications mechanisms, “message mapping,” and “NEBA front-end loading.” A number of communications tools and concepts are described that, if implemented, will greatly improve multi-agency coordination and communications during a response, leading to a less stressful and more successful response outcome.

Integrated Flight/Propulsion Control for Flight Critical Applications: A Propulsion System Perspective

Journal of Engineering for Gas Turbines and Power ◽

10.1115/1.2906653 ◽

1992 ◽

Vol 114 (4) ◽

pp. 755-762 ◽

Cited By ~ 2

Author(s):

K. D. Tillman ◽

T. J. Ikeler

Keyword(s):

Control System ◽

Real Time ◽

Air Force ◽

Flight Control ◽

Closed Loop ◽

Fault Tolerant ◽

Lessons Learned ◽

System Perspective ◽

Propulsion Control ◽

Development Center

Pratt & Whitney and Northrop together, under the Integrated Reliable Fault-Tolerant Control for Large Engines (INTERFACE II) Program [1, 2] sponsored by the Air Force Wright Research and Development Center (WRDC), designed and demonstrated an advanced real-time Integrated Flight and Propulsion Control (IFPC) system. This IFPC system was based upon the development of physically distinctive, functionally integrated flight and propulsion controls that managed the Northrop twin-engine, statically unstable P700 airplane. Digital flight control and digital engine control hardware were combined with cockpit control hardware and computer simulations of the airplane and engines to provide a real-time, closed-loop, piloted IFPC system. As part of a follow-on effort, lessons learned during the INTERFACE II program are being applied to the design of a flight critical propulsion control system. This paper presents both the results of the INTERFACE II IFPC program and approaches toward the definition and development of an integrated propulsion control system for flight critical applications.
