Monitoring of a Grid Storage Virtualization Service

Jacques Jorda; Aurélien Ortiz; Abdelaziz M’zoughi; Salam Traboulsi

doi:10.4018/jghpc.2013010104

Monitoring of a Grid Storage Virtualization Service

International Journal of Grid and High Performance Computing ◽

10.4018/jghpc.2013010104 ◽

2013 ◽

Vol 5 (1) ◽

pp. 53-69

Author(s):

Jacques Jorda ◽

Aurélien Ortiz ◽

Abdelaziz M’zoughi ◽

Salam Traboulsi

Keyword(s):

Monitoring System ◽

Data Storage ◽

Large Scale ◽

Distributed Storage ◽

Storage System ◽

Data Access ◽

Data Placement ◽

Workload Prediction ◽

Storage Virtualization

Grid computing is commonly used for large-scale applications requiring huge computation capabilities. In such distributed architectures, data storage on the distributed storage resources must be handled by a dedicated storage system to ensure the required quality of service. To simplify data placement on nodes and to increase application performance, a storage virtualization layer can be used. This layer can be a single parallel filesystem (like GPFS) or a more complex middleware. The latter is preferred because it allows data placement on the nodes to be tuned to increase both the reliability and the performance of data access. Such a middleware therefore needs a dedicated monitoring system to ensure optimal performance. In this paper, the authors briefly introduce Visage, a middleware for storage virtualization. They present the most widely used grid monitoring systems and explain why these are not adequate for virtualized storage monitoring. The authors then present the architecture of their monitoring system dedicated to storage virtualization. They introduce the workload prediction model used to determine the best node for data placement and demonstrate its accuracy in a simple experiment.

RSIMS: Large-Scale Heterogeneous Remote Sensing Images Management System

Remote Sensing ◽

10.3390/rs13091815 ◽

2021 ◽

Vol 13 (9) ◽

pp. 1815

Author(s):

Xiaohua Zhou ◽

Xuezhi Wang ◽

Yuanchun Zhou ◽

Qinghui Lin ◽

Jianghua Zhao ◽

...

Keyword(s):

Remote Sensing ◽

Data Storage ◽

Management System ◽

Large Scale ◽

Distributed Storage ◽

Storage System ◽

Remote Sensing Data ◽

Data Access ◽

Remote Sensing Images ◽

Sensing Data

With the remarkable development and progress of earth-observation techniques, remote sensing data keep growing rapidly and their volume has reached exabyte scale. However, managing and processing such huge amounts of remote sensing data with complex and diverse structures remains a major challenge. This paper designs and implements a distributed storage system for large-scale remote sensing data storage, access, and retrieval, called RSIMS (remote sensing images management system), which is composed of three sub-modules: RSIAPI, RSIMeta, and RSIData. Structured text metadata of different remote sensing images are all stored in RSIMeta based on a set of uniform models, and then indexed by distributed multi-level Hilbert grids for high spatiotemporal retrieval performance. Unstructured binary image files are stored in RSIData, which provides highly scalable storage capacity and efficient GDAL (Geospatial Data Abstraction Library) compatible I/O interfaces. Popular GIS software and tools (e.g., QGIS, ArcGIS, rasterio) can access data stored in RSIData directly. RSIAPI provides users with a set of uniform interfaces for data access and retrieval, hiding the complex inner structures of RSIMS. The test results show that RSIMS can store and manage large amounts of remote sensing images from various sources with high and stable performance, and is easy to deploy and use.
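
As an illustration of the Hilbert-grid indexing mentioned in this abstract, the following minimal sketch maps a grid cell to its position along a Hilbert curve, so that cells close together in 2D tend to receive nearby 1D keys. It is not RSIMS code: the function name `hilbert_index` and the `order` parameter are assumptions made for this example, and the multi-level and distributed aspects of the paper's index are not modelled.

```python
def hilbert_index(order: int, x: int, y: int) -> int:
    """Return the Hilbert-curve position of cell (x, y) on a 2**order x 2**order grid.

    Hypothetical helper for illustration; uses the classic iterative
    quadrant-by-quadrant conversion.
    """
    n = 1 << order  # grid side length
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError("cell lies outside the grid")
    d = 0
    s = n // 2
    while s > 0:
        # Which quadrant (at the current scale) does the cell fall in?
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the sub-quadrant is in canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


if __name__ == "__main__":
    # Print the curve position of every cell in a 4 x 4 grid.
    for row in range(3, -1, -1):
        print(" ".join(f"{hilbert_index(2, col, row):2d}" for col in range(4)))
```

A spatial query over a bounding box can then be turned into a small number of key ranges over these 1D positions, which is the general reason such curves are used for spatial indexing.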

Analysis of data integrity and storage quality of a distributed storage system

EPJ Web of Conferences ◽

10.1051/epjconf/202125102035 ◽

2021 ◽

Vol 251 ◽

pp. 02035

Author(s):

Adrian Eduard Negru ◽

Latchezar Betev ◽

Mihai Carabaș ◽

Costin Grigoraș ◽

Nicolae Țăpuş ◽

...

Keyword(s):

Data Storage ◽

Distributed Storage ◽

Storage System ◽

Essential Element ◽

Data Access ◽

Distributed Data ◽

Distributed Data Storage ◽

Data Lifetime ◽

Operational Issues ◽

And Storage

CERN uses the world’s largest scientific computing grid, WLCG, for distributed data storage and processing. Monitoring of the CPU and storage resources is essential for detecting operational issues in its systems, for example in the storage elements, and for ensuring their proper and efficient function. The processing of experiment data depends strongly on data access quality as well as data integrity, and both of these key parameters must be assured for the data lifetime. Given the substantial amount of data, O(200 PB), already collected by ALICE and kept at various storage elements around the globe, scanning every single data chunk would be a very expensive process, both in terms of computing resource usage and in terms of execution time. In this paper, we describe a distributed file crawler that addresses these natural limits by periodically extracting and analyzing statistically significant samples of files from storage elements and evaluating the results; it is integrated with the existing monitoring solution, MonALISA.
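
To make the sampling idea concrete, here is a minimal sketch of how a crawler could decide how many files to check on one storage element. The abstract does not state which method the authors use; this example assumes the textbook sample-size formula for estimating a proportion (Cochran's formula with a finite population correction), and the names `sample_size` and `pick_files` as well as the default margin and confidence level are hypothetical.

```python
import math
import random
from statistics import NormalDist


def sample_size(population: int, margin: float = 0.05,
                confidence: float = 0.95, proportion: float = 0.5) -> int:
    """Number of files to sample so the estimated failure rate is within
    `margin` of the true rate at the given confidence level (assumed method)."""
    if population <= 0:
        return 0
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Sample size for an infinite population.
    n0 = z * z * proportion * (1 - proportion) / margin ** 2
    # Finite population correction.
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))


def pick_files(file_ids: list, seed=None, **kwargs) -> list:
    """Draw a simple random sample of file identifiers without replacement."""
    rng = random.Random(seed)
    return rng.sample(file_ids, sample_size(len(file_ids), **kwargs))


if __name__ == "__main__":
    catalogue = [f"file-{i}" for i in range(1000)]  # stand-in for a storage element listing
    chosen = pick_files(catalogue, seed=42)
    print(len(chosen), "of", len(catalogue), "files selected for checking")
```

The point of the correction term is that the required sample grows very slowly with the size of the storage element, which is what makes periodic sampling far cheaper than a full scan.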

Classification and Processing of Big Data in Sensor Network Based on Suffix Tree Clustering

International Journal of Online and Biomedical Engineering (iJOE) ◽

10.3991/ijoe.v15i01.9785 ◽

2019 ◽

Vol 15 (01) ◽

pp. 171

Author(s):

Jun Tian ◽

Lirong Huang

Keyword(s):

Big Data ◽

Standard Deviation ◽

Sensor Network ◽

Data Storage ◽

Large Scale ◽

Suffix Tree ◽

Distributed Storage ◽

Storage System ◽

Base Station ◽

Universal System

Aiming at the perception data acquired by the widely used, fast-developing but still imperfect wireless sensor network systems, a relatively complete and universal system for the collection, transmission, storage and cluster analysis of perception data is designed. Perception data is spliced and compressed at the node and reconstructed at the base station; the acquisition of perception data and the energy consumption of transmission are optimized; a distributed storage system is established; and the data reading mechanism and data storage architecture are designed accordingly. Comparison tests are carried out between the data acquisition protocol and the traditional protocol, between the storage system itself and the Oracle database system, and between the Standard Deviation algorithm and the Eigensystem Realization Algorithm. Based on the Standard Deviation algorithm, suffix tree clustering is carried out: the general steps of suffix tree clustering are studied and adapted to the structure of perception data and the characteristics of storage, and the data classification operation based on suffix tree clustering is completed. The results show that the proposed Standard Deviation algorithm not only inherits the efficiency of the classical algorithm for processing big data, but also has an obvious effect on large-scale discrete data processing, and its efficiency is clearly improved compared with the traditional method.

Research on Large-scale Ship Data Storage System

2020 International Conference on Information Science, Parallel and Distributed Systems (ISPDS) ◽

10.1109/ispds51347.2020.00028 ◽

2020 ◽

Author(s):

LEI WANG

Keyword(s):

Data Storage ◽

Large Scale ◽

Storage System ◽

Data Storage System

Providing large-scale disk storage at CERN

EPJ Web of Conferences ◽

10.1051/epjconf/201921404033 ◽

2019 ◽

Vol 214 ◽

pp. 04033

Author(s):

Hervé Rousseau ◽

Belinda Chan Kwok Cheong ◽

Cristian Contescu ◽

Xavier Espinal Curull ◽

Jan Iven ◽

...

Keyword(s):

High Performance ◽

Large Scale ◽

Distributed Storage ◽

Storage System ◽

Primary Data ◽

Disk Storage ◽

The Road ◽

Ongoing Work ◽

Software Distribution ◽

Microsoft Office

The CERN IT Storage group operates multiple distributed storage systems and is responsible for supporting the infrastructure that accommodates all CERN storage requirements, from the physics data generated by LHC and non-LHC experiments to users' personal files. EOS is now the key component of the CERN storage strategy. It sustains high incoming throughput for experiment data-taking while running concurrent complex production workloads. This high-performance distributed storage now provides more than 250 PB of raw disk space and is the key component behind the success of CERNBox, the CERN cloud synchronisation service, which allows syncing and sharing files on all major mobile and desktop platforms to provide offline availability of any data stored in the EOS infrastructure. CERNBox has recorded exponential growth over the last couple of years in terms of files and data stored, thanks to its increasing popularity within the CERN user community and to its integration with a multitude of other CERN services (Batch, SWAN, Microsoft Office). In parallel, CASTOR is being simplified and is transitioning from an HSM into an archival system, focusing mainly on the long-term recording of the primary data from the detectors and preparing the road to the next-generation tape archival system, CTA. The storage services at CERN also cover the needs of the rest of our community: Ceph as data back-end for the CERN OpenStack infrastructure, NFS services and S3 functionality; AFS for legacy home directory filesystem services, together with its ongoing phase-out; and CVMFS for software distribution. In this paper we summarise our experience in supporting all our distributed storage systems and the ongoing work in evolving our infrastructure, including testing very dense storage building blocks (nodes with more than 1 PB of raw space) for the challenges ahead.

Research and Design of the Distributed Mass Small File Storage System Based on WCF

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.765-767.1087 ◽

2013 ◽

Vol 765-767 ◽

pp. 1087-1091

Author(s):

Hong Lin ◽

Shou Gang Chen ◽

Bao Hui Wang

Keyword(s):

Fault Tolerance ◽

Distributed Computing ◽

Data Storage ◽

Large Scale ◽

Storage System ◽

Hardware Platform ◽

File Storage ◽

Computing Framework ◽

Research And Design ◽

Small File

Recently, with the development of the Internet and the arrival of new application modes, data storage has acquired new characteristics and new requirements. In this paper, a Distributed Computing Framework Mass Small File Storage System (Dnet FS for short) based on Windows Communication Foundation on the .NET platform is presented. It is lightweight, highly extensible, runs on inexpensive hardware, supports large-scale concurrent access, and has a degree of fault tolerance. The framework of the system is analyzed, and its performance is tested and compared. The results show that the system meets the requirements.

Facilitating Design of Efficient Components by Bridging Gaps between Data Model and Business Process via Analysis of Service Traits of Data

Enterprise Information Systems ◽

10.4018/978-1-61692-852-0.ch214 ◽

2011 ◽

pp. 544-549

Author(s):

Ning Chen

Keyword(s):

Business Process ◽

Large Scale ◽

Data Modeling ◽

Data Access ◽

Enterprise Information System ◽

Enterprise Information ◽

Solution Quality ◽

Design Data ◽

Component Design

In many large-scale enterprise information system solutions, process design, data modeling and software component design are performed relatively independently by different people using various tools and methodologies. This usually leads to gaps among business process modeling, component design and data modeling. Currently, these functional or non-functional disconnections are fixed manually, which increases the complexity and decreases the efficiency and quality of development. In this chapter, a pattern-based approach is proposed to bridge the gaps with automatically generated data access components. Data access rules and patterns are applied to optimize these data access components. In addition, the authors present the design of a toolkit that automatically applies these patterns to bridge the gaps, reducing development time and raising solution quality.

Data Storage in Cloud Based Real-Time Environments

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Achieving Real-Time in Distributed Computing ◽

10.4018/978-1-60960-827-9.ch013 ◽

2011 ◽

pp. 236-258

Author(s):

Sai Narasimhamurthy ◽

Malcolm Muggeridge ◽

Stefan Waldschmidt ◽

Fabio Checconi ◽

Tommaso Cucinotta

Keyword(s):

Quality Of Service ◽

Real Time ◽

Data Storage ◽

Storage System ◽

Cloud Environment ◽

Last Mile ◽

Service Oriented ◽

Real Time Applications ◽

Direct Applicability

The service-oriented infrastructures for real-time applications (“real-time clouds”) pose certain unique challenges for the data storage subsystem, which is indeed the “last mile” for all data accesses. Data storage subsystems typically used in regular enterprise environments have many limitations that impede direct applicability to such clouds, particularly in their ability to provide Quality of Service (QoS) for applications. Provision of QoS within storage is possible through a deeper understanding of the behaviour of the storage system under a variety of conditions dictated by the application and the network infrastructure. We intend to arrive at a QoS mechanism for data storage, keeping in view the important parameters that come into play for the storage subsystem in a soft real-time cloud environment.

A Review: Map Reduce Framework for Cloud Computing

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i4.6.20224 ◽

2018 ◽

Vol 7 (4.6) ◽

pp. 13

Author(s):

Mekala Sandhya ◽

Ashish Ladda ◽

Dr. Uma N Dulhare ◽

. . ◽

. .

Keyword(s):

Data Mining ◽

Cloud Computing ◽

Distributed Computing ◽

Data Storage ◽

High Performance ◽

Large Scale ◽

Distributed Storage ◽

Large Data ◽

Mass Data ◽

Internet Information

In the current generation of the Internet, information and data are growing continuously. With the wide variety of Internet services and applications, the amount of information is increasing rapidly: hundreds of billions, even trillions, of web indexes exist. Such large data brings people a mass of information, but at the same time makes it more difficult to discover useful knowledge in these huge amounts of data. Cloud computing can provide infrastructure for large data. It has two significant characteristics of distributed computing: scalability and high availability. Scalability means that it can extend seamlessly to large-scale clusters. Availability means that cloud computing can tolerate node errors, so node failures do not prevent a program from running correctly. Cloud computing combined with data mining performs significant data processing on high-performance machines. Mass data storage and distributed computing provide a new method for mass data mining and have become an effective solution for distributed storage and efficient computing in data mining.
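
For readers unfamiliar with the programming model named in the title, the following is a minimal single-process sketch of the MapReduce pattern, using word counting as the usual introductory example. It is not taken from the reviewed paper and does not use any real framework; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are assumptions for illustration. In a real deployment the map and reduce calls would run on different nodes.

```python
from collections import defaultdict


def map_phase(document: str):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list):
    """Reduce step: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(documents: list) -> dict:
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["cloud data mining", "mass data storage", "Cloud computing"]))
```

Because each map call depends only on its own document and each reduce call only on its own key, both phases can be spread over many machines and rerun on another node if one fails, which relates to the scalability and availability properties described in the abstract.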
