Much ADO about failures: a fault-aware model for compositional verification of strongly consistent distributed systems

2021 ◽  
Vol 5 (OOPSLA) ◽  
pp. 1-31
Author(s):  
Wolf Honoré ◽  
Jieung Kim ◽  
Ji-Yong Shin ◽  
Zhong Shao

Despite recent advances, guaranteeing the correctness of large-scale distributed applications without compromising performance remains a challenging problem. Network and node failures are inevitable and, for some applications, careful control over how they are handled is essential. Unfortunately, existing approaches either completely hide these failures behind an atomic state machine replication (SMR) interface, or expose all of the network-level details, sacrificing atomicity. We propose a novel, compositional, atomic distributed object (ADO) model for strongly consistent distributed systems that combines the best of both options. The object-oriented API abstracts over protocol-specific details and decouples high-level correctness reasoning from implementation choices. At the same time, it intentionally exposes an abstract view of certain key distributed failure cases, thus allowing for more fine-grained control over them than SMR-like models. We demonstrate that proving properties even of composite distributed systems can be straightforward with our Coq verification framework, Advert, thanks to the ADO model. We also show that a variety of common protocols including multi-Paxos and Chain Replication refine the ADO semantics, which allows one to freely choose among them for an application's implementation without modifying ADO-level correctness proofs.
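The failure-aware interface the abstract describes can be pictured with a toy sketch. The Python below is purely illustrative (the class and method names are hypothetical, and a real ADO is backed by a consensus protocol such as multi-Paxos rather than a single in-memory object): the point is that an operation either commits or fails visibly, instead of being hidden behind an always-succeeding SMR call.

```python
from enum import Enum, auto

class Outcome(Enum):
    COMMITTED = auto()   # update durably applied
    FAILED = auto()      # definitely not applied (e.g., lost leadership)

class AtomicDistributedObject:
    """Toy in-memory stand-in for a replicated object; names and
    semantics are an illustrative assumption, not the paper's API."""
    def __init__(self, state):
        self._state = state
        self._leader = None

    def pull(self, client):
        # Acquire the right to propose updates; may fail if contended.
        if self._leader is not None and self._leader != client:
            return Outcome.FAILED
        self._leader = client
        return Outcome.COMMITTED

    def invoke(self, client, method):
        # Apply an update; failure is exposed rather than silently retried.
        if self._leader != client:
            return Outcome.FAILED, None
        self._state = method(self._state)
        return Outcome.COMMITTED, self._state
```

A client that loses a contended `pull` sees `FAILED` and can decide how to react, which is the kind of fine-grained failure control the abstract contrasts with a fully opaque SMR interface.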

2012 ◽  
pp. 201-222
Author(s):  
Yujian Fu ◽  
Zhijiang Dong ◽  
Xudong He

The approach aims to solve the above problems by including analysis and verification at two different levels of the software development process (the design level and the implementation level) and by bridging the gap between software architecture analysis and verification and the software product. At the architecture design level, to ensure design correctness and to cope with the large scale of complex systems, compositional verification is used: each component is verified individually and the results are then synthesized. For those properties that cannot be verified at the design level, the design model is translated to an implementation and runtime verification techniques are applied to the program. This approach can greatly reduce the work of design verification and avoid the state-explosion problem of model checking. Moreover, it can ensure both design and implementation correctness, and can thus deliver a final software product with high confidence. The approach is based on the Software Architecture Model (SAM), proposed by Florida International University in 1999. SAM is a formal specification framework built on component-connector pairs with two formalisms: Petri nets and temporal logic. The ACV approach places strong demands on an organization to articulate the quality attributes of primary importance. It also requires selecting benchmark combination points with which to verify integrated properties. The purpose of ACV is not to commend particular architectures, but to provide a method for verification and analysis of large-scale software systems at the architecture level. Future research falls into two directions. First, in the compositional verification of a SAM model, there may be circular waiting for certain data among different components and connectors; this problem was not discussed in the current work.
Second, the translation of SAM to an implementation is based on restricted Petri nets, owing to the undecidability issues of high-level Petri nets. In the runtime analysis of the implementation, extracting the program's execution trace is still needed to obtain a white-box view, and further analysis of the execution can provide more information about the product's correctness.
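As a flavor of the runtime-verification step, here is a minimal Python monitor. It is not from the chapter (SAM properties are expressed in temporal logic, which this toy safety check only approximates): it scans an execution trace for a violation of a hypothetical property, "a `release` event must be preceded by a matching `acquire`".

```python
def check_safety(trace):
    """Monitor the safety property G(release -> previously acquire).
    Returns (True, None) if the trace satisfies it, otherwise
    (False, i) where i is the position of the first violation."""
    held = False
    for i, event in enumerate(trace):
        if event == "acquire":
            held = True
        elif event == "release":
            if not held:
                return False, i   # violation: release without acquire
            held = False
    return True, None
```

In a real setting the trace would be extracted from the running program, as the paragraph above notes, rather than supplied as a list.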


Author(s):  
Valentin Cristea ◽  
Ciprian Dobre ◽  
Corina Stratan ◽  
Florin Pop

Security in distributed systems is a combination of the confidentiality, integrity, and availability of their components. It mainly targets the communication channels between users and/or processes located on different computers, the access control of users/processes to resources and services, and the management of keys, users, and user groups. Distributed systems are more vulnerable to security threats due to several characteristics, such as their large scale, the distributed nature of control, and the remote nature of access. In addition, an increasing number of distributed applications (such as Internet banking) manipulate sensitive information and have special security requirements. After discussing important security concepts in the Background section, this chapter addresses several important problems that are the focus of current research in the security of large-scale distributed systems: security models (which represent the theoretical foundation for solving security problems), access control (more specifically, access control in distributed multi-organizational platforms), secure communication (with emphasis on secure group communication, a hot topic in security research today), security management (especially key management for collaborative environments), secure distributed architectures (the blueprints for designing and building secure systems), and security environments/frameworks.


2010 ◽  
Vol 638-642 ◽  
pp. 3123-3127
Author(s):  
V.A. Malyshevsky ◽  
E.I. Khlusova ◽  
V.V. Orlov

The metallurgical industry can be considered the field best suited to the adoption of nanotechnologies, which in the near future will be able to provide large-scale production and a high return on investment. Of special note are the physical and mechanical properties of nano-structured steels and alloys (strength, plasticity, toughness, and so on), which will radically surpass the characteristics of corresponding materials developed using conventional technologies. Investigations have shown that basic principles for tailoring structure down to the nano-level in low-carbon low-alloy steels can be put forward, namely: 1) morphological similarity of structural components, with predominance of globular-type structures achieved through reduced carbon content and rational alloying; 2) formation of a finely dispersed carbide phase of globular morphology; 3) elimination of extended interphase boundaries; 4) formation of a fragmented structure with boundaries close to high-angle ones, inheriting the structure of fine-grained deformed austenite.


Author(s):  
Valentin Cristea ◽  
Ciprian Dobre ◽  
Corina Stratan ◽  
Florin Pop

The domains of usage of large-scale distributed systems have been expanding in recent years from scientific to commercial applications. Together with the extension of the application domains, new requirements have emerged for large-scale distributed systems. Among these requirements, fault tolerance is needed by more and more modern distributed applications, not only by the critical ones. In this chapter we analyze existing work on enabling fault tolerance in large-scale distributed systems, presenting specific problems and existing solutions, as well as several future trends. The characteristics of these systems pose problems for ensuring fault tolerance, especially because of their complexity, involving many geographically distributed resources and users; because of the volatility of resources, which are available only for limited amounts of time; and because of the constraints imposed by applications and resource owners. A general fault-tolerant architecture should comprise, at a minimum, a mechanism to detect failures and a component capable of recovering from and handling the detected failures, usually using some form of replication. We analyze existing fault-tolerance implementations, as well as solutions adopted in real-world large-scale distributed systems, and the fault-tolerance architectures proposed for particular distributed architectures, such as Grid and P2P systems.
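The minimal architecture described above (a failure detector plus a recovery component built on replication) can be sketched in a few lines of Python. This is a hypothetical illustration, not a design from the chapter; the heartbeat threshold and failover rule are illustrative assumptions.

```python
class HeartbeatDetector:
    """Minimal failure-detector sketch: a node is suspected once it
    misses `threshold` consecutive heartbeat intervals."""
    def __init__(self, nodes, threshold=3):
        self.missed = {n: 0 for n in nodes}
        self.threshold = threshold

    def heartbeat(self, node):
        self.missed[node] = 0          # fresh heartbeat received

    def tick(self):
        # One monitoring interval elapses with no heartbeat recorded.
        for n in self.missed:
            self.missed[n] += 1

    def suspected(self):
        return {n for n, m in self.missed.items() if m >= self.threshold}

def failover(primary, replicas, detector):
    """Recovery component: promote the first healthy replica if the
    primary is suspected, otherwise keep the current primary."""
    if primary in detector.suspected():
        for r in replicas:
            if r not in detector.suspected():
                return r
    return primary
```

Real detectors must cope with the fact that a slow node and a crashed node are indistinguishable over an asynchronous network, which is why the threshold only yields a *suspicion*, not certainty.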


2007 ◽  
Vol 08 (02) ◽  
pp. 163-178 ◽  
Author(s):  
FATOS XHAFA ◽  
JAVIER CARRETERO ◽  
LEONARD BAROLLI ◽  
ARJAN DURRESI

In this paper we present a study of the requirements for the design and implementation of simulation packages for Grid systems. Grids are emerging as new distributed computing systems whose main objective is to manage and allocate geographically distributed computing resources to applications and users in an efficient and transparent manner. Grid systems are at present very difficult and complex to use for experimental studies of large-scale distributed applications. Although the field of simulation of distributed computing systems is mature, recent developments in large-scale distributed systems are raising needs not present in the simulation of traditional distributed systems. Motivated by this, we present in this work a set of basic requirements that any simulation package for Grid computing should offer. This set of functionalities was obtained after a careful review of the most important existing Grid simulation packages and includes new requirements not considered in those packages. Based on the identified set of requirements, a Grid simulator is developed and exemplified on the Grid scheduling problem.
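As a flavor of what such a simulator computes for the Grid scheduling problem, here is a deliberately tiny discrete-event sketch in Python. The greedy earliest-free-machine policy and all names are illustrative assumptions, far simpler than any real Grid simulation package.

```python
import heapq

def simulate(jobs, machines):
    """Tiny discrete-event sketch of Grid scheduling: each job
    (name, length) is dispatched to the machine that becomes free
    earliest. Returns per-job (machine, completion time) and the
    overall makespan."""
    free = [(0.0, m) for m in machines]   # event queue: (time_free, machine)
    heapq.heapify(free)
    completion = {}
    for name, length in jobs:
        t, m = heapq.heappop(free)        # next machine to become idle
        done = t + length
        completion[name] = (m, done)
        heapq.heappush(free, (done, m))   # machine busy until `done`
    makespan = max(d for _, d in completion.values())
    return completion, makespan
```

A full simulator would also model network transfers, resource volatility, and background load, which is exactly where the new requirements identified in the paper go beyond traditional distributed-system simulation.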


Author(s):  
Florin Pop

This chapter presents a fault-tolerant framework for application scheduling in large-scale distributed systems (LSDS). Due to the specific characteristics and requirements of distributed systems, a good scheduling model should be dynamic. More specifically, it should adapt scheduling decisions to resource state changes, which are commonly captured through monitoring. The scheduler and the monitor are two important middleware pieces that correlate their actions to ensure the high-performance execution of distributed applications. The chapter presents and analyses an agent-based architecture for scheduling in large-scale distributed systems. User and resource management are then presented. Optimization schemes for scheduling consider near-optimal algorithms for distributed scheduling, and the chapter presents a solution for scheduling optimization. Finally, the chapter covers and explains fault-tolerance cases for Grid environments and describes two possible scenarios for the scheduling system.
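The scheduler-monitor correlation described above can be caricatured in a short Python sketch. It is a hypothetical illustration: the least-loaded placement policy and the re-dispatch-on-failure rule stand in for the near-optimal algorithms and fault-tolerance scenarios the chapter actually discusses.

```python
class Monitor:
    """Toy resource monitor: tracks which resources are currently up."""
    def __init__(self, resources):
        self.up = set(resources)
    def report_down(self, r):
        self.up.discard(r)
    def report_up(self, r):
        self.up.add(r)

class Scheduler:
    """Dynamic scheduler that consults the monitor before each
    dispatch and re-dispatches tasks whose resource has failed."""
    def __init__(self, monitor):
        self.monitor = monitor
        self.placement = {}               # task -> resource
    def dispatch(self, task):
        live = sorted(self.monitor.up)
        if not live:
            raise RuntimeError("no live resources")
        load = {r: 0 for r in live}
        for r in self.placement.values():
            if r in load:
                load[r] += 1
        target = min(live, key=lambda r: (load[r], r))  # least loaded
        self.placement[task] = target
        return target
    def handle_failure(self, r):
        # Fault tolerance: move tasks off the failed resource.
        self.monitor.report_down(r)
        for t, res in list(self.placement.items()):
            if res == r:
                self.dispatch(t)
```

The key point the sketch preserves is that scheduling decisions are a function of monitored state, so a state change (here, a failure report) immediately changes placement.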


2010 ◽  
Vol 20 (5-6) ◽  
pp. 537-576 ◽  
Author(s):  
MATTHEW FLUET ◽  
MIKE RAINEY ◽  
JOHN REPPY ◽  
ADAM SHAW

The increasing availability of commodity multicore processors is making parallel computing ever more widespread. In order to exploit its potential, programmers need languages that make the benefits of parallelism accessible and understandable. Previous parallel languages have traditionally been intended for large-scale scientific computing, and they tend not to be well suited to programming the applications one typically finds on a desktop system. Thus, we need new parallel-language designs that address a broader spectrum of applications. The Manticore project is our effort to address this need. At its core is Parallel ML, a high-level functional language for programming parallel applications on commodity multicore hardware. Parallel ML provides a diverse collection of parallel constructs for different granularities of work. In this paper, we focus on the implicitly threaded parallel constructs of the language, which support fine-grained parallelism. We concentrate on those elements that distinguish our design from related ones, namely, a novel parallel binding form, a nondeterministic parallel case form, and the treatment of exceptions in the presence of data parallelism. These features differentiate the present work from related work on functional data-parallel language designs, which have focused largely on parallel problems with regular structure and the compiler transformations—most notably, flattening—that make such designs feasible. We present detailed examples utilizing various mechanisms of the language and give a formal description of our implementation.
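For readers unfamiliar with parallel binding forms, the effect can be approximated with explicit futures. This Python analogy is only a rough sketch under stated assumptions: Parallel ML's binding is implicit, integrated with exception handling, and scheduled by the Manticore runtime, none of which appears here.

```python
from concurrent.futures import ThreadPoolExecutor

def fib(n):
    # Deliberately naive recursive Fibonacci as a unit of work.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

with ThreadPoolExecutor() as pool:
    # Roughly "pval x = fib 20": the bound computation may run in
    # parallel with the rest of the body...
    x = pool.submit(fib, 20)
    y = fib(10)                # ...which proceeds without waiting on x
    total = x.result() + y     # the first *use* of x forces its value
```

In Parallel ML the demand-on-use behavior is implicit in the language semantics rather than spelled out with `submit`/`result`, and an exception raised by the bound computation is handled as the paper's design prescribes rather than at the explicit `result()` call.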


Author(s):  
Ciprian Dobre

The field of modeling and simulation has long been seen as a viable way to develop new algorithms and technologies and to enable the development of large-scale distributed systems, where analytical validation is precluded by the nature of the problems encountered. The use of discrete-event simulators in the design and development of large-scale distributed systems is appealing due to their efficiency and scalability. In this chapter we focus on the challenge of enabling scalable, high-level, online simulation of applications, middleware, resources, and networks to support the scientific and systematic study of Grid and P2P applications and environments. We describe alternatives for designing and implementing simulators to be used in the validation of distributed systems, particularly Grids and P2P systems.

