Architectural Resiliency in Distributed Computing

2012, Vol 4 (4), pp. 37-51
Author(s): Rao Mikkilineni

Cellular organisms have evolved to manage themselves and their interactions with their surroundings with a high degree of resiliency, efficiency, and scalability. Signaling and collaboration among autonomous distributed computing elements that accomplish a common goal with optimal resource utilization are the differentiating characteristics of the computing model of cellular organisms. By introducing signaling and self-management abstractions in an autonomic computing element called the Distributed Intelligent Managed Element (DIME), the authors improve the architectural resiliency, efficiency, and scaling of distributed computing systems. Two implementations of the DIME network architecture are described that demonstrate auto-scaling, self-repair, dynamic performance optimization, and end-to-end distributed transaction management: one virtualizes a process in the Linux operating system by converting it into a DIME, and the other is a new native operating system called Parallax OS, optimized for Intel multi-core processors, which converts each core into a DIME. The authors also discuss the implications of the DIME computing model for future cloud computing services and datacenter infrastructure management practices, and its relationship to current discussions on Turing machines, Gödel’s theorems, and a call by some computer scientists for no less than a Kuhnian paradigm shift.
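The separation the abstract describes, a computing task plus an independent signaling and self-management interface, can be illustrated with a minimal sketch. The class names `Dime` and `Supervisor`, the message strings, and the heartbeat/restart protocol are hypothetical and chosen only for illustration; they are not taken from the DIME or Parallax OS implementations.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Dime:
    """Hypothetical DIME: a worker ("computing") function plus a separate
    management/signaling interface. Names are illustrative only."""
    name: str
    task: Callable[[int], int]
    healthy: bool = True
    log: List[str] = field(default_factory=list)

    def compute(self, x: int) -> int:
        """Computing channel: run the payload task if the element is healthy."""
        if not self.healthy:
            raise RuntimeError(f"{self.name} is down")
        return self.task(x)

    def signal(self, message: str) -> str:
        """Signaling channel: handle management messages independently of compute()."""
        self.log.append(message)
        if message == "heartbeat":
            return "alive" if self.healthy else "failed"
        if message == "restart":
            self.healthy = True
            return "restarted"
        return "unknown"


class Supervisor:
    """Hypothetical manager that polls DIMEs over the signaling channel and
    restarts any element that reports failure (self-repair)."""

    def __init__(self, dimes: Dict[str, Dime]):
        self.dimes = dimes
        self.repairs: List[str] = []

    def check_and_repair(self) -> None:
        for name, dime in self.dimes.items():
            if dime.signal("heartbeat") == "failed":
                dime.signal("restart")
                self.repairs.append(name)


if __name__ == "__main__":
    d = Dime("square", lambda x: x * x)
    sup = Supervisor({"square": d})
    d.healthy = False              # simulate a fault in the computing element
    sup.check_and_repair()         # the signaling channel detects and restarts it
    print(sup.repairs, d.compute(7))  # ['square'] 49
```

Keeping `signal()` separate from `compute()` mirrors the idea that management traffic does not depend on the payload task being able to run.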

2012, Vol 2012, pp. 1-10
Author(s): Rao Mikkilineni, Giovanni Morana, Daniele Zito, Marco Di Sano

This paper describes a prototype implementing a high degree of transaction resilience in distributed software systems using a non-von Neumann computing model that exploits parallelism in computing nodes. The prototype incorporates fault, configuration, accounting, performance, and security (FCAPS) management using a signaling network overlay and allows dynamic control of a set of distributed computing elements in a network. Each node is a computing entity endowed with self-management and signaling capabilities to collaborate with similar nodes in a network. The separation of the parallel computing and management channels allows end-to-end transaction management of computing tasks (provided by the autonomous distributed computing elements) to be implemented as network-level FCAPS management. While the new computing model is operating-system agnostic, a prototype services architecture based on Linux, Apache, MySQL, and PHP/Perl/Python (LAMP) is implemented to demonstrate end-to-end transaction management with auto-scaling, self-repair, dynamic performance management, and distributed transaction security assurance. The implementation is made possible by a non-von Neumann middleware library that provides Linux process management through multi-threaded parallel execution of the self-management and signaling abstractions. We did not use hypervisors, virtual machines, or layers of complex virtualization management systems to implement this prototype.
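As a rough illustration of FCAPS management carried on a signaling channel that is separate from the computation, the sketch below shows how performance reports could drive an auto-scaling decision and how missing reports could be treated as faults. `PerfReport`, `desired_replicas`, `fault_check`, and the tasks-per-node scaling rule are assumptions made for illustration, not the prototype's middleware API.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class PerfReport:
    """Hypothetical 'P' (performance) message sent on the signaling overlay."""
    node: str
    pending_tasks: int


def desired_replicas(reports: List[PerfReport],
                     target_per_node: int = 10,
                     min_nodes: int = 1,
                     max_nodes: int = 16) -> int:
    """Assumed auto-scaling rule: enough nodes that each carries at most
    `target_per_node` pending tasks, clamped to [min_nodes, max_nodes]."""
    total = sum(r.pending_tasks for r in reports)
    needed = math.ceil(total / target_per_node) if total else min_nodes
    return max(min_nodes, min(max_nodes, needed))


def fault_check(reports: List[PerfReport], expected: List[str]) -> List[str]:
    """'F' (fault) management sketch: nodes that sent no report are presumed failed."""
    seen = {r.node for r in reports}
    return [n for n in expected if n not in seen]


if __name__ == "__main__":
    reports = [PerfReport("n1", 18), PerfReport("n2", 7)]
    print(desired_replicas(reports))                 # 3
    print(fault_check(reports, ["n1", "n2", "n3"]))  # ['n3']
```

Because the decision uses only the management messages, it can be made without touching the worker processes themselves; a real system would also need to act on the result (start or stop nodes), which is omitted here.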


2019, Vol 214, pp. 03012
Author(s): Federico Stagni, Andrei Tsaregorodtsev, Christophe Haen, Philippe Charpentier, Zoltan Mathe, ...

The DIRAC project is developing interware to build and operate distributed computing systems. It provides a development framework and a rich set of services for both the Workload and Data Management tasks of large scientific communities. DIRAC has been adopted by a growing number of collaborations, including LHCb, Belle2, CLIC, and CTA. The LHCb experiment will be upgraded during the second long LHC shutdown (2019-2020). When data taking restarts in Run 3, the instantaneous luminosity will increase by a factor of five, so the LHCb computing model also needs to be upgraded. To oversimplify, this translates into a need for significantly more computing power and resources, and more storage, than LHCb uses today. The DIRAC interware will remain the tool that handles all of LHCb's distributed computing resources. In this contribution, we highlight the ongoing and planned efforts to ensure that DIRAC can make optimal use of its distributed computing resources. The contribution focuses on DIRAC's plans for increasing the scalability of the overall system, taking into consideration that the main requirement is keeping a running system working. This requirement translates into the need for studies and developments within the current DIRAC architecture. We believe that scalability is about traffic growth, dataset growth, and maintainability; in this contribution we address all three, showing the technical solutions we are adopting.


DOI 10.29007/44jw, 2018
Author(s): Rao Mikkilineni, Albert Comparini, Giovanni Morana

Turing’s o-machine, discussed in his PhD thesis, can perform all of the usual operations of a Turing machine and, in addition, when it is in a certain internal state, can query an oracle for the answer to a specific question that dictates its further evolution. In his thesis, Turing said, ‘We shall not go any further into the nature of this oracle apart from saying that it cannot be a machine.’ There is a large body of literature discussing the role of the oracle in AI, brain modeling, computing, and hyper-computing machines. In this paper, we take a broader view of the oracle machine, inspired by the genetic computing model of cellular organisms and the self-organizing fractal theory. We describe a specific software architecture implementation that circumvents the halting and undecidability problems in a process workflow computation, introducing the architectural resiliency found in cellular organisms into distributed computing machines. A DIME (Distributed Intelligent Managed Element), recently introduced as the building block of the DIME computing model, exploits concepts from Turing’s oracle machine and extends them to implement a recursive managed distributed computing network, which can be viewed as an interconnected group of such specialized oracle machines, referred to as a DIME network. The DIME network architecture provides architectural resiliency through auto-failover, auto-scaling, live migration, and end-to-end transaction security assurance in a distributed system. We demonstrate these characteristics using prototypes, without the complexity introduced by hypervisors, virtual machines, and other layers of ad hoc management software in today’s distributed computing environments.
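To make the o-machine idea concrete, here is a toy simulator: an ordinary Turing machine that, on entering a designated query state, hands the tape contents to an oracle and continues in a `yes` or `no` state. The function `run_o_machine`, the rule table `RULES`, and the parity oracle are hypothetical. Since Turing's oracle "cannot be a machine", the Python callable here is only a stand-in, and the question it answers is deliberately a decidable one; the sketch does not model a DIME network.

```python
from typing import Callable, Dict, Tuple

BLANK = "_"
# (state, symbol) -> (symbol_to_write, head_move, next_state)
Rules = Dict[Tuple[str, str], Tuple[str, int, str]]


def run_o_machine(rules: Rules, tape_input: str,
                  oracle: Callable[[str], bool],
                  query_state: str = "ask",
                  max_steps: int = 1000) -> str:
    """Toy oracle machine: runs `rules` from state 'start'; on entering
    `query_state` it asks `oracle` about the tape word and continues in
    'yes' or 'no'. Returns the tape contents once state 'halt' is reached.
    A missing rule raises KeyError (the toy machine simply rejects)."""
    tape = dict(enumerate(tape_input))
    head, state = 0, "start"
    for _ in range(max_steps):
        if state == "halt":
            if not tape:
                return ""
            cells = range(min(tape), max(tape) + 1)
            return "".join(tape.get(i, BLANK) for i in cells).strip(BLANK)
        if state == query_state:
            # Oracle query: the answer, not a computation, picks the next state.
            word = "".join(tape[i] for i in sorted(tape)).strip(BLANK)
            state = "yes" if oracle(word) else "no"
            continue
        symbol = tape.get(head, BLANK)
        write, move, state = rules[(state, symbol)]
        tape[head] = write
        head += move
    raise RuntimeError("step budget exhausted")


# Scan right over a binary word, query the oracle, then append Y or N.
RULES: Rules = {
    ("start", "0"): ("0", +1, "start"),
    ("start", "1"): ("1", +1, "start"),
    ("start", BLANK): (BLANK, 0, "ask"),
    ("yes", BLANK): ("Y", 0, "halt"),
    ("no", BLANK): ("N", 0, "halt"),
}


def is_even(word: str) -> bool:
    """Hypothetical stand-in oracle: is the binary number even?"""
    return int(word, 2) % 2 == 0


if __name__ == "__main__":
    print(run_o_machine(RULES, "101", is_even))  # 101N
    print(run_o_machine(RULES, "110", is_even))  # 110Y
```

Swapping `is_even` for any other callable changes the machine's behavior without changing its rule table, which is the sense in which the oracle sits outside the machine.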


Author(s): Tengjiao Lin, Daokun Xie, Ziran Tan, Bo Liu

The aim of this paper is to investigate the influence of structural parameters on the vibration characteristics of a marine gearbox and to improve its dynamic performance. A finite element model was established, and the dynamic response was solved using the modal superposition method. Based on multi-objective optimization design theory, a structural sensitivity analysis model of the marine gearbox was established, taking the structural parameters of the housing as design variables. The modal and response sensitivities were obtained using the optimal gradient method. From the results of the sensitivity analysis, a modal and response optimization model of the marine gearbox was established. The objectives were to keep the natural frequencies away from the excitation frequencies and to minimize the root mean square of the vibration acceleration at the evaluation points on the housing surface. Modal optimization and response optimization of the gearbox were then carried out using zero-order and first-order optimization methods. The results indicate that dynamic optimization of the gearbox can be achieved: after optimization, the amplitude of vibration acceleration at the evaluation points on the housing surface has been reduced, and resonance of the marine gearbox can be avoided.
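The first optimization objective, keeping natural frequencies away from excitation frequencies, can be illustrated on a far simpler model than the paper's finite element model. The sketch below assumes a two-degree-of-freedom lumped spring-mass chain, computes its natural frequencies from the eigenvalues of `M^-1 K`, and checks a relative separation margin. The model, the 10% margin, and the function names are illustrative assumptions, not the authors' method.

```python
import math
from typing import Iterable, Tuple


def natural_frequencies_hz(m1: float, m2: float,
                           k1: float, k2: float) -> Tuple[float, float]:
    """Natural frequencies (Hz) of a hypothetical 2-DOF chain
    ground --k1-- m1 --k2-- m2 (masses in kg, stiffnesses in N/m).
    The eigenvalues of A = M^-1 K are the squared angular frequencies."""
    # Entries of A for M = diag(m1, m2), K = [[k1 + k2, -k2], [-k2, k2]]
    a11, a12 = (k1 + k2) / m1, -k2 / m1
    a21, a22 = -k2 / m2, k2 / m2
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    disc = math.sqrt(trace * trace - 4.0 * det)  # eigenvalues are real here
    lam_low, lam_high = (trace - disc) / 2.0, (trace + disc) / 2.0
    return (math.sqrt(lam_low) / (2 * math.pi),
            math.sqrt(lam_high) / (2 * math.pi))


def avoids_resonance(natural_hz: Iterable[float],
                     excitation_hz: Iterable[float],
                     margin: float = 0.10) -> bool:
    """True if every natural frequency differs from every excitation
    frequency by at least `margin`, relative to the excitation frequency.
    The 10% default is an assumed design rule, not from the paper."""
    excitations = list(excitation_hz)
    return all(abs(fn - fe) / fe >= margin
               for fn in natural_hz for fe in excitations)


if __name__ == "__main__":
    freqs = natural_frequencies_hz(1.0, 1.0, 1.0, 1.0)
    print(freqs)
    print(avoids_resonance(freqs, [0.1]), avoids_resonance(freqs, [0.5]))
```

For m1 = m2 = 1 kg and k1 = k2 = 1 N/m, the angular frequencies are (√5 - 1)/2 and (√5 + 1)/2 rad/s, roughly 0.618 and 1.618 rad/s. An excitation at 0.1 Hz then falls within 10% of the lower mode (about 0.098 Hz), while 0.5 Hz clears both modes.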

