AOCMS: An Adaptive and Scalable Monitoring System for Large-Scale Clusters

Author(s):  
Zhenghua Xue ◽  
Xiaoshe Dong ◽  
Weiguo Wu
2006 ◽  
Vol 29 (10) ◽  
pp. 1687-1695 ◽  
Author(s):  
A. Mehaoua ◽  
T. Ahmed ◽  
H. Asgari ◽  
M. Sidibe ◽  
A. Nafaa ◽  
...  

Author(s):  
Zhixin Tie ◽  
David Ko ◽  
Harry H. Cheng

Mobile agent technology has become an important approach for the design and development of distributed systems. However, there is little research regarding the monitoring of computer resources and usage at large-scale distributed computer centers. This paper presents a mobile agent-based system called the Mobile Agent Based Computer Monitoring System (MABCMS) that supports the dynamic sending and execution of control commands, dynamic data exchange, and dynamic deployment of mobile code in C/C++. Based on the Mobile-C library, agents can call low-level functions in binary dynamic or static libraries, and thus can monitor computer resources and usage conveniently and efficiently. Two experimental applications have been designed using the MABCMS. The experiments were conducted in a university computer center with hundreds of computer workstations and 15 server machines. The first application uses the MABCMS to detect improper usage of the computer workstations, such as playing computer games. The second uses the MABCMS to monitor system resources such as available hard disk space. The experiments show that the mobile agent-based monitoring system is an effective method for detecting and interacting with students playing computer games and a practical way to monitor computer resources in large-scale distributed computer centers.


2015 ◽  
Vol 20 (7) ◽  
pp. 563-577 ◽  
Author(s):  
Tadayuki Tsujita ◽  
Liam Baird ◽  
Yuki Furusawa ◽  
Fumiki Katsuoka ◽  
Yoshika Hou ◽  
...  

2013 ◽  
Vol 5 (1) ◽  
pp. 53-69
Author(s):  
Jacques Jorda ◽  
Aurélien Ortiz ◽  
Abdelaziz M’zoughi ◽  
Salam Traboulsi

Grid computing is commonly used for large-scale applications requiring huge computation capabilities. In such distributed architectures, data storage on the distributed storage resources must be handled by a dedicated storage system to ensure the required quality of service. In order to simplify data placement on nodes and to increase the performance of applications, a storage virtualization layer can be used. This layer can be a single parallel filesystem (like GPFS) or a more complex middleware. The latter is preferred as it allows data placement on the nodes to be tuned to increase both the reliability and the performance of data access. Thus, in such a middleware, a dedicated monitoring system must be used to ensure optimal performance. In this paper, the authors briefly introduce the Visage middleware, a middleware for storage virtualization. They present the most broadly used grid monitoring systems and explain why they are not adequate for virtualized storage monitoring. The authors then present the architecture of their monitoring system dedicated to storage virtualization, introduce the workload prediction model used to select the best node for data placement, and demonstrate its accuracy in a simple experiment.


Electronics ◽  
2019 ◽  
Vol 8 (9) ◽  
pp. 982 ◽  
Author(s):  
Alberto Cascajo ◽  
David E. Singh ◽  
Jesus Carretero

This work presents an HPC framework that provides new strategies for resource management and job scheduling, based on executing different applications in shared compute nodes to maximize platform utilization. The framework includes a scalable monitoring tool that analyzes the utilization of the platform’s compute nodes. We also introduce an extension of CLARISSE, a middleware for data-staging coordination and control on large-scale HPC platforms, that uses the information provided by the monitor in combination with application-level analysis to detect performance degradation in the running applications. This degradation, which arises because the applications share the compute nodes and may compete for their resources, is avoided by means of dynamic application migration. We describe the architecture and present a practical evaluation of the proposal, which shows significant improvements of up to 20% in makespan and 10% in energy consumption compared to a non-optimized execution.


Author(s):  
Rafael Nilson Rodrigues ◽  
Juliano Kasmirski Zatta ◽  
Jonas Vieira de Souza ◽  
Anna Luiza Espindola ◽  
Eduardo Galera de Carvalho
