Large-scale File System Design and Architecture

10.14311/300 ◽  
2002 ◽  
Vol 42 (1) ◽  
Author(s):  
V. Dynda ◽  
P. Rydlo

This paper deals with design issues of a global file system, aiming to provide transparent data availability, security against loss and disclosure, and support for mobile and disconnected clients. First, the paper surveys general challenges and requirements for large-scale file systems; then the design of particular elementary parts of the proposed file system is presented. This includes the design of the raw system architecture, of dynamic file replication with appropriate data consistency, of file location, and of data security. Our proposed system is called Gaston, and will be referred to throughout the text by this name or its abbreviation GFS (Gaston File System).
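As a rough illustration of the replication and file-location machinery such a design needs, the sketch below shows a replica catalog under a simple read-one/write-all consistency policy. The class, names, and policy are illustrative assumptions, not Gaston's actual design.

```python
# Illustrative replica catalog with read-one/write-all consistency.
# Not Gaston's design; a minimal sketch of the concepts in the abstract.

class ReplicaCatalog:
    """Tracks, per file, which nodes hold a replica and its version."""

    def __init__(self):
        self.locations = {}  # file id -> {node: version}

    def add_replica(self, file_id: str, node: str):
        versions = self.locations.setdefault(file_id, {})
        versions[node] = max(versions.values(), default=0)  # new copy starts at latest version

    def write(self, file_id: str):
        """Write-all: bump the version on every replica, keeping them consistent."""
        for node in self.locations.get(file_id, {}):
            self.locations[file_id][node] += 1

    def read_location(self, file_id: str) -> str:
        """Read-one: under write-all, any replica is current."""
        return next(iter(self.locations[file_id]))

catalog = ReplicaCatalog()
catalog.add_replica("/doc/a", "node-eu")
catalog.add_replica("/doc/a", "node-us")  # dynamic replication: add a copy on demand
catalog.write("/doc/a")
print("read from:", catalog.read_location("/doc/a"))
```

Write-all keeps every replica current, so any single replica can serve a read; real systems relax this trade-off, for instance to accommodate the disconnected clients the paper targets.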

Author(s):  
Qiang Zou ◽  
Yujuan Tan

As one aspect of workload characterization, anomalous behaviors in real workloads have been recognized as a critical consideration in file system design. In this paper, a set of traces collected from typical real-world file systems is analyzed. A correlation study of I/O request inter-arrival times shows that it is necessary to examine self-similarity in file system workloads. The phenomenon of self-similarity, initially observed in network and disk I/O workloads, is then also observed, both visually and statistically, in file system workloads. In addition, we implement an I/O series generator whose inputs are the measured properties of the available trace data. Experimental results show that this model can accurately emulate the complex access arrival behaviors of real file systems, particularly their heavy-tail characteristics.
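To make the statistical check concrete, the sketch below generates heavy-tailed (Pareto) inter-arrival times and estimates a Hurst exponent with a crude rescaled-range (R/S) analysis, one of the standard self-similarity tests. It is a minimal stand-in, not the paper's generator.

```python
# Heavy-tailed inter-arrival generation plus a crude R/S Hurst estimate.
# For i.i.d. samples H stays near 0.5; long-range-dependent traces push it toward 1.

import numpy as np

def pareto_interarrivals(n, alpha=1.5, scale=1.0, seed=0):
    """Classical Pareto inter-arrival times; alpha < 2 gives infinite variance (heavy tail)."""
    rng = np.random.default_rng(seed)
    return scale * (rng.pareto(alpha, n) + 1.0)

def hurst_rs(x):
    """Rescaled-range (R/S) estimate of the Hurst exponent."""
    x = np.asarray(x, dtype=float)
    log_rs, log_n = [], []
    for k in range(1, 6):
        n = len(x) // 2**k                       # window size at this scale
        if n < 8:
            continue
        chunks = x[: (len(x) // n) * n].reshape(-1, n)
        dev = np.cumsum(chunks - chunks.mean(axis=1, keepdims=True), axis=1)
        r = dev.max(axis=1) - dev.min(axis=1)    # range of cumulative deviations
        s = chunks.std(axis=1)
        valid = s > 0
        log_rs.append(np.log((r[valid] / s[valid]).mean()))
        log_n.append(np.log(n))
    slope, _ = np.polyfit(log_n, log_rs, 1)      # log(R/S) ~ H * log(n)
    return slope

arrivals = pareto_interarrivals(100_000)
print("estimated Hurst exponent:", round(hurst_rs(arrivals), 3))
```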


2014 ◽  
Vol 573 ◽  
pp. 556-559
Author(s):  
A. Shenbaga Bharatha Priya ◽  
J. Ganesh ◽  
Mareeswari M. Devi

Infrastructure-as-a-Service (IaaS) provides an environmental setup under any type of cloud. In a distributed file system (DFS), nodes simultaneously serve computing and storage functions; that is, data is processed and stored in parallel in the cloud. Here, a file is treated as a load: it is partitioned into a number of file chunks (FCs) allocated to distinct nodes so that MapReduce tasks can be performed in parallel over the nodes. Files and nodes can be dynamically created, deleted, and added. This results in load imbalance in a distributed file system; that is, the file chunks are not distributed as uniformly as possible among the chunk servers (CSs). Emerging distributed file systems in production strongly depend either on a central node for chunk reallocation or on distributed nodes maintaining global knowledge of all chunks. This dependence is clearly inadequate in a large-scale, failure-prone environment: the central load balancer carries a workload that scales linearly with system size, so it may become a performance bottleneck and a single point of failure, while global knowledge wastes memory on distributed nodes. We therefore enhance the client-side module with a server-side module to create, delete, and update file chunks, manage the overall private cloud, and apply a dynamic load-balancing algorithm to support auto-scaling in the private cloud. In this project, a fully distributed load-rebalancing algorithm is presented to cope with the load-imbalance problem, as sketched below.
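A minimal sketch of the fully distributed idea, under the simplifying assumption that load equals chunk count and that each overloaded chunk server sheds chunks to the currently lightest peer; the function and names are hypothetical, not the paper's exact algorithm.

```python
# Illustrative chunk rebalancing: heavy servers migrate chunks toward the
# lightest peer until every server is within one chunk of the ideal share.

def rebalance(servers: dict[str, list[str]]) -> list[tuple[str, str, str]]:
    """servers maps server id -> list of chunk ids; returns a migration log."""
    total = sum(len(chunks) for chunks in servers.values())
    target = total // len(servers)              # ideal chunks per server
    moves = []
    for sid, chunks in servers.items():
        while len(chunks) > target + 1:         # this node is overloaded
            dest = min(servers, key=lambda s: len(servers[s]))
            if len(servers[dest]) > target:
                break                           # no peer lighter than the target
            chunk = chunks.pop()
            servers[dest].append(chunk)         # migrate one chunk
            moves.append((chunk, sid, dest))
    return moves

cluster = {"cs1": [f"c{i}" for i in range(9)], "cs2": ["c9"], "cs3": []}
for chunk, src, dst in rebalance(cluster):
    print(f"moved {chunk}: {src} -> {dst}")
```

In a truly decentralized version each server would sample a few peers instead of scanning all of them, avoiding the global knowledge the abstract criticizes.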


Author(s):  
Eduardo Inacio ◽  
Mario Antonio Dantas

To meet the ever-increasing capacity and performance requirements of emerging data-intensive applications, highly distributed and multilayered back-end storage systems have been employed in large-scale high performance computing (HPC) environments. A main component of these storage infrastructures is the parallel file system (PFS), a file system especially designed to absorb bulk data transfers from applications with thousands of concurrent processes. Load distribution across PFS data servers is a major source of intra-application input/output (I/O) performance variability. Although mitigating variability is desirable, as variability is known to harm application-perceived performance, understanding and dealing with I/O performance variability in such complex environments remains a challenging task. In this research, a differentiated approach for evaluating and mitigating intra-application I/O performance variability over PFSs is proposed. More specifically, from the evaluation perspective, a comprehensive approach combining complementary methods is proposed. An analytical model, named DTSMaxLoad, provides estimates of the maximum load on a PFS data server. To complement DTSMaxLoad by modeling conditions and mechanisms that are hard to represent analytically, the Parallel I/O and Storage System (PIOSS) simulation model was proposed. Finally, for experimental evaluation in real environments, a flexible and distributed I/O performance evaluation tool, coined IOR-Extended (IORE), was proposed. Furthermore, a high-level file distribution approach for PFSs, called N-N Round-Robin (N2R2), was proposed, focusing on mitigating I/O performance variability for distributed applications where each process accesses an individual and independent file. An extensive experimental effort, including measurements in real environments, was conducted in this research work to evaluate each of the proposed approaches. In summary, this evaluation indicated that both the DTSMaxLoad and PIOSS modeling proposals represent load distribution behavior on PFSs with significant fidelity. Moreover, results demonstrated that N2R2 successfully reduced intra-application I/O performance variability across 270 distinct experimental scenarios, which ultimately translated into overall application I/O performance improvements.
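As one plausible reading of an N-N round-robin layout (each process writing its own file, with stripes rotated across data servers so that no single server starts every file), consider the sketch below; the rotation rule is an assumption for illustration, not necessarily N2R2's exact rule.

```python
# Hypothetical N-N round-robin placement: every file is striped over the
# data servers, and the starting server advances per file so load spreads.

def n2r2_placement(num_files: int, stripes_per_file: int, num_servers: int):
    """Return {file_id: [server index per stripe]}, rotating the start server."""
    layout = {}
    for f in range(num_files):
        start = (f * stripes_per_file) % num_servers  # rotate the start per file
        layout[f] = [(start + s) % num_servers for s in range(stripes_per_file)]
    return layout

for fid, servers in n2r2_placement(num_files=4, stripes_per_file=3,
                                   num_servers=8).items():
    print(f"file {fid}: stripes on servers {servers}")
```

With a naive layout every file would start on server 0, concentrating the first stripe of all N files on one server; rotating the start is what evens out the maximum per-server load that a model like DTSMaxLoad estimates.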


Author(s):  
Ashraf Ahmed Fadelelmoula

This article presents a new comprehensive approach to realizing a sufficient trade-off between the CAP properties (i.e., consistency, availability, and partition tolerance) in large-scale pervasive information systems. To achieve these critical properties, the capabilities of both cloud computing and web services were exploited in developing the components of the proposed approach. These components include a cloud-based replication architecture for ensuring high data availability and achieving partition tolerance, a web-services-based middleware for maintaining eventual consistency, and a data caching scheme that enables the mobile computing elements to conduct update transactions during disconnection periods. The evaluation of the performance aspects revealed that the proposed approach achieves better load balance, lower propagation delay, and a higher cache hit ratio than other baseline approaches.
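The caching scheme's core idea, queuing update transactions while disconnected and replaying them on reconnection so that replicas converge eventually, can be sketched as follows; the class and method names are hypothetical.

```python
# Illustrative disconnected-operation cache: updates apply locally at once,
# are queued while offline, and are replayed in order on reconnection.

class DisconnectedCache:
    def __init__(self):
        self.data, self.pending, self.online = {}, [], True

    def update(self, key, value):
        self.data[key] = value                 # apply locally first (availability)
        if self.online:
            self._push(key, value)             # propagate immediately
        else:
            self.pending.append((key, value))  # defer until reconnection

    def reconnect(self):
        self.online = True
        for key, value in self.pending:        # replay deferred updates in order
            self._push(key, value)
        self.pending.clear()

    def _push(self, key, value):
        # Stand-in for the web-service call that feeds the replication layer.
        print(f"replicate {key}={value} to cloud replicas")

cache = DisconnectedCache()
cache.online = False
cache.update("profile", "v2")  # queued while offline
cache.reconnect()              # replayed: replicas converge eventually
```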


2012 ◽  
Vol 241-244 ◽  
pp. 1556-1561
Author(s):  
Qi Meng Wu ◽  
Ke Xie ◽  
Ming Fa Zhu ◽  
Li Min Xiao ◽  
Li Ruan

Parallel file systems deploy multiple metadata servers to distribute heavy metadata workloads from clients. With an increasing number of metadata servers, metadata-intensive operations face problems related to collaboration among the servers, compromising the performance gain. Consequently, a file system simulator is very helpful for trying out optimization ideas to solve these problems. In this paper, we propose DMFSsim to simulate metadata-intensive operations on large-scale distributed-metadata file systems. DMFSsim can flexibly replay traces of multiple metadata operations, supports several commonly used metadata distribution algorithms, and simulates the file system tree hierarchy and the underlying disk block management mechanisms of real systems. Extensive simulations show that DMFSsim is capable of demonstrating the performance of metadata-intensive operations in distributed-metadata file systems.
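As an example of the metadata distribution algorithms such a simulator must support, the sketch below hashes a file's parent directory to choose a metadata server, a commonly used scheme that keeps a directory's entries together; it is illustrative, not DMFSsim's implementation.

```python
# Hash-based metadata distribution: the parent directory path picks the
# metadata server (MDS), so all entries of one directory land together.

import hashlib
import posixpath

def mds_for(path: str, num_servers: int) -> int:
    """Map a file's parent directory to a metadata server index."""
    parent = posixpath.dirname(path) or "/"
    digest = hashlib.md5(parent.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_servers

for p in ["/home/a/x.txt", "/home/a/y.txt", "/home/b/z.txt"]:
    print(p, "-> MDS", mds_for(p, num_servers=4))
```

Hashing balances load statistically but scatters a subtree across servers, which is exactly the kind of collaboration cost (cross-server rename, recursive operations) that a simulator helps quantify against subtree-partitioning alternatives.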


2014 ◽  
Vol 998-999 ◽  
pp. 1362-1365
Author(s):  
Wei Feng Gao ◽  
Tie Zhu Zhao ◽  
Ming Bin Lin

Distributed file systems are emerging as a key component of large-scale cloud storage platforms due to the continuous growth in the amount of application data. Performance modeling and analysis is an important concern in the distributed file system area. This paper focuses on performance prediction and modeling issues. An adaptive prediction model (APModel) is proposed to predict the performance of distributed file systems by capturing the correlation among different performance factors. We perform a series of experiments to validate the proposed prediction model. The experimental results indicate that our approach achieves better prediction accuracy; it is practical and enables sufficient performance analysis for distributed file systems.
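The abstract does not spell out APModel's internals, but the adaptive flavor can be sketched as a linear model refit over a sliding window of recent observations of performance factors; everything below is an illustrative assumption, not the paper's model.

```python
# Sketch of an adaptive performance predictor: keep a window of recent
# (factor vector, observed performance) pairs and refit a linear model.

import numpy as np

class AdaptivePredictor:
    def __init__(self, window=64):
        self.window = window
        self.samples = []  # list of (factor vector, observed performance)

    def observe(self, factors, perf):
        self.samples.append((np.asarray(factors, float), float(perf)))
        self.samples = self.samples[-self.window:]  # adapt: forget old behavior

    def predict(self, factors):
        X = np.array([np.append(f, 1.0) for f, _ in self.samples])  # bias column
        y = np.array([p for _, p in self.samples])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)                # least-squares fit
        return float(np.append(np.asarray(factors, float), 1.0) @ coef)

model = AdaptivePredictor()
for io_size, clients in [(64, 2), (128, 4), (256, 8), (512, 16)]:
    model.observe([io_size, clients], perf=0.5 * io_size + 3 * clients)  # synthetic data
print("predicted throughput:", round(model.predict([1024, 32]), 1))
```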


Author(s):  
Anargyros Tsadimas ◽  
Mara Nikolaidou ◽  
Dimosthenis Anagnostopoulos

Model-based system design is served by a single, multi-layered model supporting all design activities at different levels of detail. SysML is a modeling language, endorsed by OMG, for systems engineering, which aims at defining such models for system design. It provides discrete diagrams to describe system structure and components, to explore allocation policies crucial for system design, and to identify design requirements. In this chapter, SysML is used for the model-based design of enterprise information system architecture, supporting a systemic view of such systems, where software and hardware entities are treated as system components composed to create the system architecture. SysML extensions are presented that facilitate the effective description of non-functional requirements, especially quantitative ones, and their verification. The integration of evaluation parameters and results into a discrete SysML diagram enhances the requirement verification process, while the visualization of evaluation data helps system engineers explore design decisions and properly adjust the system design. Based on the proposed extensions, a SysML profile is developed. The experience obtained when applying the profile to renovating the architecture of a large-scale enterprise information system is also briefly discussed to explore the potential of the proposed extensions.


Author(s):  
Jan Stender ◽  
Michael Berlin ◽  
Alexander Reinefeld

Cloud computing poses new challenges to data storage. While cloud providers use shared distributed hardware, which is inherently unreliable and insecure, cloud users expect their data to be safely and securely stored, available at any time, and accessible in the same way as their locally stored data. In this chapter, the authors present XtreemFS, a file system for the cloud. XtreemFS reconciles the need of cloud providers for cheap scale-out storage solutions with that of cloud users for reliable, secure, and easy data access. The main contributions of the chapter are: a description of the internal architecture of XtreemFS, which presents an approach to building large-scale distributed POSIX-compliant file systems on top of cheap, off-the-shelf hardware; a description of the XtreemFS security infrastructure, which guarantees isolation of individual users despite shared and insecure storage and network resources; a comprehensive overview of the replication mechanisms in XtreemFS, which guarantee consistency, availability, and durability of data in the face of component failures; and an overview of the snapshot infrastructure of XtreemFS, which allows momentary states of the file system to be captured and frozen in a scalable and fault-tolerant fashion. The authors also compare XtreemFS with existing solutions and argue for its practicability and potential in the cloud storage market.
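XtreemFS's actual replica protocols are not reproduced here, but the general majority-quorum idea that such systems use to keep replicated data consistent, available, and durable despite component failures can be sketched as follows; the code is a generic illustration, not XtreemFS's implementation.

```python
# Generic majority-quorum write: the write commits once a strict majority
# of replicas acknowledge it, so a minority of failed replicas is tolerated.

import random

def quorum_write(replicas, key, value, fail_prob=0.2):
    """Apply the write on each reachable replica; commit on majority ack."""
    acks = 0
    for store in replicas:
        if random.random() >= fail_prob:  # this replica is reachable
            store[key] = value
            acks += 1
    return acks > len(replicas) // 2      # strict majority required to commit

random.seed(1)
replicas = [dict() for _ in range(3)]     # e.g., three object storage devices
print("write committed:", quorum_write(replicas, "block-42", b"data"))
```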


2021 ◽  
Vol 11 (8) ◽  
pp. 3298
Author(s):  
Jeong-Joon Kim

Various techniques have been used in distributed file systems to provide data availability and stability. Typically, data is stored using a replication technique, but because of its poor space efficiency, erasure coding (EC) has been adopted more recently. The EC technique improves space efficiency over replication; however, it suffers from various performance degradation factors, such as encoding and decoding overhead and input/output (I/O) degradation. Thus, this study proposes a buffering and combining technique in which the various I/O requests that occur during encoding in an EC-based distributed file system are combined into one and processed together. In addition, it proposes four recovery measures (disk I/O load distribution, random block layout, multi-thread-based parallel recovery, and a matrix recycling technique) to distribute the disk I/O loads generated during decoding.
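The buffer-and-combine idea can be sketched as follows: small writes accumulate until a full stripe is available and are then encoded once, so encoding cost is paid per stripe rather than per request. XOR parity stands in for real erasure coding here, and all names are illustrative, not the paper's implementation.

```python
# Illustrative buffer-and-combine for EC writes: accumulate small requests,
# encode only when a full stripe exists. XOR parity is a stand-in for
# Reed-Solomon-style erasure coding.

STRIPE_CHUNKS = 4  # data chunks per stripe
CHUNK_SIZE = 8     # bytes per chunk (tiny, for demonstration)

buffer = bytearray()

def write(data: bytes):
    """Buffer small writes; encode once per completed stripe."""
    global buffer
    buffer += data
    stripe_bytes = STRIPE_CHUNKS * CHUNK_SIZE
    while len(buffer) >= stripe_bytes:
        stripe, buffer = buffer[:stripe_bytes], buffer[stripe_bytes:]
        encode_stripe(bytes(stripe))

def encode_stripe(stripe: bytes):
    chunks = [stripe[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE]
              for i in range(STRIPE_CHUNKS)]
    parity = bytes(a ^ b ^ c ^ d for a, b, c, d in zip(*chunks))  # XOR parity
    print(f"encoded stripe -> {STRIPE_CHUNKS} data chunks + "
          f"{len(parity)}-byte parity chunk")

for _ in range(8):
    write(b"tiny")  # eight 4-byte writes combine into one 32-byte stripe
print(f"{len(buffer)} bytes still buffered")
```

Without combining, each 4-byte request would trigger its own read-modify-write of the stripe and parity; combining amortizes that cost across all buffered requests.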

