Monitoring of a Grid Storage Virtualization Service

2013 ◽  
Vol 5 (1) ◽  
pp. 53-69
Author(s):  
Jacques Jorda ◽  
Aurélien Ortiz ◽  
Abdelaziz M’zoughi ◽  
Salam Traboulsi

Grid computing is commonly used for large-scale applications requiring huge computation capabilities. In such distributed architectures, data storage on the distributed storage resources must be handled by a dedicated storage system to ensure the required quality of service. To simplify data placement on nodes and to increase application performance, a storage virtualization layer can be used. This layer can be a single parallel filesystem (like GPFS) or a more complex middleware. The latter is preferred as it allows the data placement on the nodes to be tuned to increase both the reliability and the performance of data access. Such a middleware therefore needs a dedicated monitoring system to ensure optimal performance. In this paper, the authors briefly introduce the Visage middleware, a middleware for storage virtualization. They present the most broadly used grid monitoring systems and explain why they are not adequate for virtualized storage monitoring. The authors then present the architecture of their monitoring system dedicated to storage virtualization. They introduce the workload prediction model used to select the best node for data placement and show its accuracy in a simple experiment.
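The abstract does not describe the workload prediction model itself, so the following is only a minimal sketch of the general idea of choosing a placement node from predicted load. It assumes an exponentially weighted moving average as the predictor, which is a stand-in and not the Visage model; the names `predict_load` and `best_node` and the node labels are hypothetical.

```python
from typing import Dict, List


def predict_load(samples: List[float], alpha: float = 0.5) -> float:
    """Predict the next load value with an exponentially weighted moving average.

    `samples` holds observed load values, oldest first. This predictor is an
    assumption for illustration, not the model used by the paper.
    """
    if not samples:
        raise ValueError("need at least one load sample")
    estimate = samples[0]
    for value in samples[1:]:
        # Recent observations weigh more than older ones.
        estimate = alpha * value + (1 - alpha) * estimate
    return estimate


def best_node(history: Dict[str, List[float]], alpha: float = 0.5) -> str:
    """Return the node (hypothetical identifier) with the lowest predicted load."""
    return min(history, key=lambda node: predict_load(history[node], alpha))


if __name__ == "__main__":
    observed = {
        "node-a": [0.9, 0.8, 0.7],
        "node-b": [0.2, 0.3, 0.9],
        "node-c": [0.5, 0.5, 0.5],
    }
    print(best_node(observed))
```

A monitoring system would feed `history` from periodic measurements; the choice of smoothing factor `alpha` trades responsiveness against noise.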

2021 ◽  
Vol 13 (9) ◽  
pp. 1815
Author(s):  
Xiaohua Zhou ◽  
Xuezhi Wang ◽  
Yuanchun Zhou ◽  
Qinghui Lin ◽  
Jianghua Zhao ◽  
...  

With the remarkable development and progress of earth-observation techniques, remote sensing data keep growing rapidly and their volume has reached exabyte scale. However, it is still a big challenge to manage and process such huge amounts of remote sensing data with complex and diverse structures. This paper designs and implements a distributed storage system for large-scale remote sensing data storage, access, and retrieval, called RSIMS (remote sensing images management system), which is composed of three sub-modules: RSIAPI, RSIMeta, and RSIData. Structured text metadata of different remote sensing images are all stored in RSIMeta based on a set of uniform models, and then indexed by distributed multi-level Hilbert grids for high spatiotemporal retrieval performance. Unstructured binary image files are stored in RSIData, which provides highly scalable storage capacity and efficient GDAL (Geospatial Data Abstraction Library) compatible I/O interfaces. Popular GIS software and tools (e.g., QGIS, ArcGIS, rasterio) can access data stored in RSIData directly. RSIAPI provides users with a set of uniform interfaces for data access and retrieval, hiding the complex inner structures of RSIMS. The test results show that RSIMS can store and manage large amounts of remote sensing images from various sources with high and stable performance, and is easy to deploy and use.
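The abstract does not detail how the multi-level Hilbert grids are built. As a rough illustration of why a Hilbert curve suits spatial indexing, the sketch below uses the classic mapping from a 2D grid cell to its position along the curve; cells that are consecutive on the curve are always neighbours in the grid. The function name `hilbert_index` and the single-level grid are assumptions, not the RSIMeta design.

```python
def hilbert_index(order: int, x: int, y: int) -> int:
    """Map cell (x, y) of a 2**order by 2**order grid to its Hilbert curve position."""
    n = 1 << order
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError("cell lies outside the grid")
    d = 0
    s = n >> 1
    while s > 0:
        # Which quadrant of the current sub-square does the cell fall in?
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the sub-square has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


if __name__ == "__main__":
    # Print the curve positions of a 4 x 4 grid, top row first.
    for row in range(3, -1, -1):
        print([hilbert_index(2, col, row) for col in range(4)])
```

Storing the curve position as a one-dimensional key lets an ordinary sorted index answer many spatial range queries with a small number of contiguous scans.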


2021 ◽  
Vol 251 ◽  
pp. 02035
Author(s):  
Adrian Eduard Negru ◽  
Latchezar Betev ◽  
Mihai Carabaș ◽  
Costin Grigoraș ◽  
Nicolae Țăpuş ◽  
...  

CERN uses the world’s largest scientific computing grid, WLCG, for distributed data storage and processing. Monitoring of the CPU and storage resources is essential to detect operational issues in its systems, for example in the storage elements, and to ensure their proper and efficient function. The processing of experiment data depends strongly on the data access quality as well as on its integrity, and both of these key parameters must be assured for the data lifetime. Given the substantial amount of data, O(200 PB), already collected by ALICE and kept at various storage elements around the globe, scanning every single data chunk would be a very expensive process, both in terms of computing resource usage and in terms of execution time. In this paper, we describe a distributed file crawler that addresses these natural limits by periodically extracting and analyzing statistically significant samples of files from storage elements. The crawler evaluates the results and is integrated with the existing monitoring solution, MonALISA.
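The abstract does not say how the crawler sizes its samples. One common way to decide how many files make a statistically meaningful sample is Cochran's formula with a finite population correction; the sketch below assumes that approach, with a 95% confidence level and a 5% margin of error as illustrative defaults. The function names and the file paths are hypothetical.

```python
import math
import random
from typing import List, Optional


def sample_size(population: int, margin: float = 0.05,
                z: float = 1.96, p: float = 0.5) -> int:
    """Cochran's sample size with finite population correction.

    z = 1.96 corresponds to a 95% confidence level; p = 0.5 is the most
    conservative guess for the fraction of problematic files.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))


def pick_files(files: List[str], seed: Optional[int] = None) -> List[str]:
    """Draw a uniform random sample of files to check on one storage element."""
    rng = random.Random(seed)
    return rng.sample(files, sample_size(len(files)))


if __name__ == "__main__":
    catalogue = [f"/hypothetical/se/file_{i}" for i in range(1000)]
    chosen = pick_files(catalogue, seed=1)
    print(len(chosen), "of", len(catalogue), "files selected")
```

The required sample grows very slowly with the population, which is why sampling stays cheap even for storage elements holding millions of files.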


Author(s):  
Jun Tian ◽  
Lirong Huang

Aiming at the perception data acquired by widely used, fast-developing but still imperfect wireless sensor network systems, a relatively complete and universal system for the collection, transmission, storage and cluster analysis of perception data is designed. Perception data is spliced and compressed at the node and reconstructed at the base station, which optimizes the acquisition of perception data and the energy consumption of transmission. A distributed storage system is established, and the data reading mechanism and data storage architecture are designed accordingly. For the comparison tests, the data acquisition protocol is compared with the traditional protocol, the storage system with the Oracle database system, and the Standard Deviation algorithm with the Eigensystem Realization Algorithm. Based on the Standard Deviation algorithm, suffix tree clustering is carried out: the general steps of suffix tree clustering are studied and adapted to the structure of perception data and the characteristics of storage, and the data classification operation based on suffix tree clustering is completed. The results show that the proposed Standard Deviation algorithm not only inherits the efficiency of the classical algorithm for processing big data, but also has an obvious effect on large-scale discrete data processing, and its efficiency is clearly improved compared with the traditional method.


2019 ◽  
Vol 214 ◽  
pp. 04033
Author(s):  
Hervé Rousseau ◽  
Belinda Chan Kwok Cheong ◽  
Cristian Contescu ◽  
Xavier Espinal Curull ◽  
Jan Iven ◽  
...  

The CERN IT Storage group operates multiple distributed storage systems and is responsible for supporting the infrastructure that accommodates all CERN storage requirements, from the physics data generated by LHC and non-LHC experiments to personal user files. EOS is now the key component of the CERN storage strategy. It allows operation at high incoming throughput for experiment data-taking while running concurrent complex production workloads. This high-performance distributed storage now provides more than 250 PB of raw disk and is the key component behind the success of CERNBox, the CERN cloud synchronisation service, which allows syncing and sharing files on all major mobile and desktop platforms to provide offline availability of any data stored in the EOS infrastructure. CERNBox has recorded exponential growth in the last couple of years in terms of files and data stored, thanks to its increasing popularity within the CERN user community and to its integration with a multitude of other CERN services (Batch, SWAN, Microsoft Office). In parallel, CASTOR is being simplified and is transitioning from an HSM into an archival system, focusing mainly on the long-term recording of the primary data from the detectors and preparing the road to the next-generation tape archival system, CTA. The storage services at CERN also cover the needs of the rest of our community: Ceph as data back-end for the CERN OpenStack infrastructure, NFS services and S3 functionality; AFS for legacy home directory filesystem services and its ongoing phase-out; and CVMFS for software distribution. In this paper we summarise our experience in supporting all our distributed storage systems and the ongoing work in evolving our infrastructure, testing very dense storage building blocks (nodes with more than 1 PB of raw space) for the challenges ahead.


2013 ◽  
Vol 765-767 ◽  
pp. 1087-1091
Author(s):  
Hong Lin ◽  
Shou Gang Chen ◽  
Bao Hui Wang

Recently, with the development of the Internet and the arrival of new application modes, data storage has acquired new characteristics and new requirements. In this paper, a Distributed Computing Framework Mass Small File Storage System (Dnet FS for short) based on Windows Communication Foundation on the .NET platform is presented. It is lightweight and highly extensible, runs on cheap hardware, supports large-scale concurrent access, and has a degree of fault tolerance. The framework of the system is analyzed, and its performance is tested and compared. The results show that the system meets the requirements.


2011 ◽  
pp. 544-549
Author(s):  
Ning Chen

In many large-scale enterprise information system solutions, process design, data modeling and software component design are performed relatively independently by different people using various tools and methodologies. This usually leads to gaps among business process modeling, component design and data modeling. Currently, these functional or non-functional disconnections are fixed manually, which increases the complexity and decreases the efficiency and quality of development. In this chapter, a pattern-based approach is proposed to bridge the gaps with automatically generated data access components. Data access rules and patterns are applied to optimize these data access components. In addition, the authors present the design of a toolkit that automatically applies these patterns to bridge the gaps, reducing development time and raising solution quality.


Author(s):  
Sai Narasimhamurthy ◽  
Malcolm Muggeridge ◽  
Stefan Waldschmidt ◽  
Fabio Checconi ◽  
Tommaso Cucinotta

Service-oriented infrastructures for real-time applications (“real-time clouds”) pose certain unique challenges for the data storage subsystem, which is the “last mile” for all data accesses. Data storage subsystems typically used in regular enterprise environments have many limitations that impede their direct applicability to such clouds, particularly in their ability to provide Quality of Service (QoS) for applications. Provision of QoS within storage is possible through a deeper understanding of the behaviour of the storage system under a variety of conditions dictated by the application and the network infrastructure. We intend to arrive at a QoS mechanism for data storage, keeping in view the important parameters that come into play for the storage subsystem in a soft real-time cloud environment.


2018 ◽  
Vol 7 (4.6) ◽  
pp. 13
Author(s):  
Mekala Sandhya ◽  
Ashish Ladda ◽  
Dr. Uma N Dulhare

In this generation of the Internet, information and data are growing continuously. Through various Internet services and applications, the amount of information is increasing rapidly. Hundreds of billions, even trillions, of web indexes exist. Such large data brings people a mass of information and, at the same time, makes it more difficult to discover useful knowledge in these huge amounts of data. Cloud computing can provide infrastructure for large data. Cloud computing has two significant characteristics of distributed computing: scalability and high availability. Scalability means that it can seamlessly extend to large-scale clusters. Availability means that cloud computing can tolerate node errors: node failures will not prevent the program from running correctly. Cloud computing with data mining performs significant data processing on high-performance machines. Mass data storage and distributed computing provide a new method for mass data mining and become an effective solution for distributed storage and efficient computing in data mining.



