Architectural Resiliency in Distributed Computing

Rao Mikkilineni

doi:10.4018/jghpc.2012100103


International Journal of Grid and High Performance Computing ◽


2012 ◽

Vol 4 (4) ◽

pp. 37-51

Author(s):

Rao Mikkilineni

Keyword(s):

Operating System ◽

Distributed Computing ◽

Performance Optimization ◽

Network Architecture ◽

Autonomic Computing ◽

Management Practices ◽

Dynamic Performance ◽

Distributed Computing Systems ◽

Computing Model ◽

Distributed Transaction

Cellular organisms have evolved to manage themselves and their interactions with their surroundings with a high degree of resiliency, efficiency, and scalability. Signaling and collaboration among autonomous distributed computing elements that accomplish a common goal with optimal resource utilization are the differentiating characteristics of the computing model of cellular organisms. By introducing signaling and self-management abstractions in an autonomic computing element called the Distributed Intelligent Managed Element (DIME), the authors improve architectural resiliency, efficiency, and scaling in distributed computing systems. Two implementations of the DIME network architecture are described to demonstrate auto-scaling, self-repair, dynamic performance optimization, and end-to-end distributed transaction management: one virtualizes a process in the Linux operating system by converting it into a DIME, and the other is a new native operating system called Parallax OS, optimized for Intel multi-core processors, which converts each core into a DIME. The authors discuss the implications of the DIME computing model for future cloud computing services and datacenter infrastructure management practices, as well as its relationship to current discussions on Turing machines, Gödel’s theorems, and the call by some computer scientists for no less than a Kuhnian paradigm shift.


Service Virtualization Using a Non-von Neumann Parallel, Distributed, and Scalable Computing Model

Journal of Computer Networks and Communications ◽

10.1155/2012/604018 ◽

2012 ◽

Vol 2012 ◽

pp. 1-10 ◽

Cited By ~ 12

Author(s):

Rao Mikkilineni ◽

Giovanni Morana ◽

Daniele Zito ◽

Marco Di Sano

Keyword(s):

Distributed Computing ◽

Performance Management ◽

Virtual Machines ◽

Dynamic Performance ◽

Parallel Execution ◽

Self Management ◽

Transaction Management ◽

Von Neumann ◽

Computing Model ◽

End To End

This paper describes a prototype that implements a high degree of transaction resilience in distributed software systems using a non-von Neumann computing model that exploits parallelism in computing nodes. The prototype incorporates fault, configuration, accounting, performance, and security (FCAPS) management using a signaling network overlay and allows dynamic control of a set of distributed computing elements in a network. Each node is a computing entity endowed with self-management and signaling capabilities, which let it collaborate with similar nodes in the network. Separating the parallel computing and management channels allows end-to-end transaction management of computing tasks (provided by the autonomous distributed computing elements) to be implemented as network-level FCAPS management. While the new computing model is operating system agnostic, a Linux, Apache, MySQL, PHP/Perl/Python (LAMP) based services architecture is implemented in the prototype to demonstrate end-to-end transaction management with auto-scaling, self-repair, dynamic performance management, and distributed transaction security assurance. The implementation is made possible by a non-von Neumann middleware library that provides Linux process management through multi-threaded parallel execution of self-management and signaling abstractions. The prototype uses no hypervisors, virtual machines, or layers of complex virtualization management systems.
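The separation of computing and management channels described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' middleware: the `ManagedElement` and `Supervisor` classes, the `"stop"`/`"restart"` commands, and the retry-once policy are all hypothetical names and choices made for illustration, and the whole thing runs in a single process rather than over a network.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ManagedElement:
    """Hypothetical stand-in for one self-managed computing node."""
    name: str
    task: Callable[[int], int]
    healthy: bool = True
    signals: List[str] = field(default_factory=list)  # log of management commands

    def compute(self, value: int) -> int:
        # Computing channel: run the task, or fail if the node is down.
        if not self.healthy:
            raise RuntimeError(f"{self.name} is down")
        return self.task(value)

    def signal(self, command: str) -> None:
        # Signaling channel: handled independently of the computing channel.
        self.signals.append(command)
        if command == "restart":
            self.healthy = True
        elif command == "stop":
            self.healthy = False


class Supervisor:
    """Hypothetical network-level manager that watches nodes and repairs them."""

    def __init__(self, nodes: List[ManagedElement]) -> None:
        self.nodes: Dict[str, ManagedElement] = {n.name: n for n in nodes}

    def repair(self) -> List[str]:
        # Fault management: send "restart" to every node that is unhealthy.
        repaired = []
        for node in self.nodes.values():
            if not node.healthy:
                node.signal("restart")
                repaired.append(node.name)
        return repaired

    def run(self, name: str, value: int) -> int:
        # Retry once after a repair pass so the caller sees a completed task.
        try:
            return self.nodes[name].compute(value)
        except RuntimeError:
            self.repair()
            return self.nodes[name].compute(value)
```

In this sketch a caller only ever talks to `Supervisor.run`; a node that has been stopped is restarted over the signaling path before the task is retried, which is a simplified picture of the self-repair behaviour the abstract mentions.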


Service Address Routing: A Network Architecture for Tightly Coupled Distributed Computing Systems

8th International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN'05) ◽

10.1109/ispan.2005.80 ◽

2006 ◽

Cited By ~ 1

Author(s):

I.D. Scherson ◽

D.S. Valencia

Keyword(s):

Distributed Computing ◽

Network Architecture ◽

Distributed Computing Systems ◽

Computing Systems ◽

Tightly Coupled


LHCb and DIRAC strategy towards the LHCb upgrade

EPJ Web of Conferences ◽

10.1051/epjconf/201921403012 ◽

2019 ◽

Vol 214 ◽

pp. 03012 ◽

Cited By ~ 2

Author(s):

Federico Stagni ◽

Andrei Tsaregorodtsev ◽

Christophe Haen ◽

Philippe Charpentier ◽

Zoltan Mathe ◽

...

Keyword(s):

Distributed Computing ◽

Distributed Computing Systems ◽

Computing Systems ◽

Scientific Communities ◽

Computing Power ◽

Computing Model ◽

Main Requirement ◽

Running System ◽

Development Framework ◽

Technical Solutions

The DIRAC project is developing interware to build and operate distributed computing systems. It provides a development framework and a rich set of services for both the Workload and Data Management tasks of large scientific communities. DIRAC has been adopted by a growing number of collaborations, including LHCb, Belle2, CLIC, and CTA. The LHCb experiment will be upgraded during the second long LHC shutdown (2019-2020). At the restart of data taking in Run 3, the instantaneous luminosity will increase by a factor of five, so the LHCb computing model also needs to be upgraded. Oversimplifying, this translates into a need for significantly more computing power, resources, and storage than LHCb uses right now. The DIRAC interware will remain the tool that handles all of LHCb's distributed computing resources. In this contribution, we highlight the ongoing and planned efforts to ensure that DIRAC can make optimal use of its distributed computing resources. We focus on DIRAC's plans for increasing the scalability of the overall system, taking into consideration that the main requirement is to keep a running system working. This requirement translates into the need for studies and developments within the current DIRAC architecture. We believe that scalability is about traffic growth, dataset growth, and maintainability; we address all three and show the technical solutions we are adopting.


The Turing O-Machine and the DIME Network Architecture: Injecting the Architectural Resiliency into Distributed Computing

10.29007/44jw ◽

2018 ◽

Author(s):

Rao Mikkilineni ◽

Albert Comparini ◽

Giovanni Morana

Keyword(s):

Distributed Computing ◽

Network Architecture ◽

Ad Hoc ◽

Virtual Machines ◽

Internal State ◽

Fractal Theory ◽

Intelligent Computing ◽

Computing Model ◽

Transaction Security ◽

Computing Machines

Turing’s o-machine, discussed in his PhD thesis, can perform all of the usual operations of a Turing machine and, in addition, when it is in a certain internal state, can query an oracle for an answer to a specific question that dictates its further evolution. In his thesis, Turing said, ‘We shall not go any further into the nature of this oracle apart from saying that it cannot be a machine.’ There is a large body of literature discussing the role of the oracle in AI, modeling the brain, computing, and hyper-computing machines. In this paper, we take a broader view of the oracle machine, inspired by the genetic computing model of cellular organisms and the self-organizing fractal theory. We describe a specific software architecture implementation that circumvents the halting and undecidability problems in a process workflow computation in order to introduce the architectural resiliency found in cellular organisms into distributed computing machines. A DIME (Distributed Intelligent Managed Element), recently introduced as the building block of the DIME computing model, exploits the concepts of Turing’s oracle machine and extends them to implement a recursive managed distributed computing network, which can be viewed as an interconnected group of such specialized oracle machines, referred to as a DIME network. The DIME network architecture provides architectural resiliency through auto-failover, auto-scaling, live migration, and end-to-end transaction security assurance in a distributed system. We demonstrate these characteristics using prototypes without the complexity introduced by hypervisors, virtual machines, and other layers of ad hoc management software in today’s distributed computing environments.
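The idea of a computation that consults an external authority to decide its further evolution can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the DIME implementation: `managed_run`, `count_forever`, and `budget_oracle` are hypothetical names, and the "oracle" here is an ordinary callback enforcing a step budget, which is far weaker than Turing's oracle.

```python
from typing import Callable, Iterator


def managed_run(steps: Iterator[int], oracle: Callable[[int, int], bool]) -> int:
    """Run a step-wise computation, consulting an oracle after every step.

    `steps` yields intermediate results and may never finish on its own.
    `oracle(step_count, last_value)` returns True to continue, False to stop.
    Returns the last value produced (0 if no step ran).
    """
    last = 0
    for count, value in enumerate(steps, start=1):
        last = value
        if not oracle(count, value):
            break  # the external manager, not the computation, decides to halt
    return last


def count_forever() -> Iterator[int]:
    # A computation that never halts by itself.
    n = 0
    while True:
        n += 1
        yield n


def budget_oracle(limit: int) -> Callable[[int, int], bool]:
    # Hypothetical policy: allow at most `limit` steps.
    return lambda count, _value: count < limit
```

For example, `managed_run(count_forever(), budget_oracle(5))` stops after five steps even though `count_forever` would otherwise run indefinitely; the decision to halt lives in the manager, outside the managed workflow.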


Efficient Resource Allocation Algorithm in Dependable Distributed Computing Systems Using A Colony Optimization

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i1.168171 ◽

2018 ◽

Vol 6 (1) ◽

pp. 168-171

Author(s):

Manas Kumar Yogi ◽

G. Kumari ◽

L. Yamuna ◽

...

Keyword(s):

Resource Allocation ◽

Distributed Computing ◽

Distributed Computing Systems ◽

Computing Systems ◽

Resource Allocation Algorithm ◽

Allocation Algorithm ◽

Efficient Resource


An Enhancement of Leveled DAG Prioritized Task Scheduling Algorithm in Distributed Computing Systems

Menoufia Journal of Electronic Engineering Research ◽

10.21608/mjeer.2017.63443 ◽

2017 ◽

Vol 26 (1) ◽

pp. 171-192

Author(s):

Amal EL-NATTAT ◽

Nirmeen A. El-Bahnasawy ◽

Ayman EL-SAYED

Keyword(s):

Distributed Computing ◽

Task Scheduling ◽

Scheduling Algorithm ◽

Distributed Computing Systems ◽

Computing Systems ◽

Task Scheduling Algorithm


Structure Sensitivity Analysis and Dynamic Performance Optimization of Marine Gearbox

Volume 10: 2017 ASME International Power Transmission and Gearing Conference ◽

10.1115/detc2017-67072 ◽

2017 ◽

Author(s):

Tengjiao Lin ◽

Daokun Xie ◽

Ziran Tan ◽

Bo Liu

Keyword(s):

Sensitivity Analysis ◽

Performance Optimization ◽

Optimization Design ◽

Dynamic Performance ◽

Structure Sensitivity ◽

Superposition Method ◽

Structure Parameters ◽

Analysis Model ◽

Vibration Acceleration ◽

Response Optimization

The aim of this paper is to investigate the influence of structure parameters on the vibration characteristics of a marine gearbox and to improve its dynamic performance. A finite element model was established to solve for the dynamic response using the modal superposition method. Based on the theory of multi-objective optimization design, a structure sensitivity analysis model of the marine gearbox was established, taking the structure parameters of the housing as design variables. The modal and response sensitivities were obtained using the optimal gradient method. According to the results of the sensitivity analysis, a modal and response optimization model of the marine gearbox was established. The objective was to keep the natural frequencies away from the excitation frequencies and to minimize the root mean square of the vibration acceleration at the evaluation points on the surface of the housing. The modal optimization and response optimization of the gearbox were then carried out using zero-order and first-order optimization methods. The results indicate that dynamic optimization of the gearbox can be achieved. After optimization, the amplitude of the vibration acceleration at the evaluation points on the housing surface is reduced, and resonance of the marine gearbox can be avoided.


Optimization procedure for algorithms of task scheduling in high performance heterogeneous distributed computing systems

Egyptian Informatics Journal ◽

10.1016/j.eij.2011.10.001 ◽

2011 ◽

Vol 12 (3) ◽

pp. 219-229 ◽

Cited By ~ 5

Author(s):

Nirmeen A. Bahnasawy ◽

Fatma Omara ◽

Magdy A. Koutb ◽

Mervat Mosa

Keyword(s):

Distributed Computing ◽

Task Scheduling ◽

High Performance ◽

Optimization Procedure ◽

Distributed Computing Systems ◽

Computing Systems ◽

Heterogeneous Distributed Computing ◽

Heterogeneous Distributed Computing Systems


Middleware of real-time object based fault tolerant distributed computing systems: issues and some approaches

Proceedings 2001 Pacific Rim International Symposium on Dependable Computing ◽

10.1109/prdc.2001.992672 ◽

2002 ◽

Cited By ~ 6

Author(s):

K.H. Kim

Keyword(s):

Distributed Computing ◽

Real Time ◽

Fault Tolerant ◽

Distributed Computing Systems ◽

Computing Systems ◽

Object Based


Artificial Intelligent Load Balance Agent on Network Traffic Across Multiple Heterogeneous Distributed Computing Systems

SSRN Electronic Journal ◽

10.2139/ssrn.3739322 ◽

2020 ◽

Author(s):

Anit Kumar ◽

Dhanpratap Singh

Keyword(s):

Distributed Computing ◽

Network Traffic ◽

Load Balance ◽

Distributed Computing Systems ◽

Computing Systems ◽

Artificial Intelligent ◽

Heterogeneous Distributed Computing ◽

Heterogeneous Distributed Computing Systems
