Large-scale File System Design and Architecture

10.14311/300 ◽  
2002 ◽  
Vol 42 (1) ◽  
Author(s):  
V. Dynda ◽  
P. Rydlo

This paper deals with design issues of a global file system, aiming to provide transparent data availability, security against loss and disclosure, and support for mobile and disconnected clients. First, the paper surveys general challenges and requirements for large-scale file systems; then the design of particular elementary parts of the proposed file system is presented. This includes the design of the raw system architecture, of dynamic file replication with appropriate data consistency, of file location, and of data security. Our proposed system is called Gaston, and will be referred to throughout the text by this name or its abbreviation GFS (Gaston File System).
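As a rough illustration of the replication and file-location machinery such a design needs, the sketch below shows a replica catalog under a simple read-one/write-all consistency policy. The class, names, and policy are illustrative assumptions, not Gaston's actual design.

```python
# Illustrative replica catalog with read-one/write-all consistency.
# Not Gaston's design; a minimal sketch of the concepts in the abstract.

class ReplicaCatalog:
    """Tracks, per file, which nodes hold a replica and its version."""

    def __init__(self):
        self.locations = {}  # file id -> {node: version}

    def add_replica(self, file_id: str, node: str):
        versions = self.locations.setdefault(file_id, {})
        versions[node] = max(versions.values(), default=0)  # new copy starts at latest version

    def write(self, file_id: str):
        """Write-all: bump the version on every replica, keeping them consistent."""
        for node in self.locations.get(file_id, {}):
            self.locations[file_id][node] += 1

    def read_location(self, file_id: str) -> str:
        """Read-one: under write-all, any replica is current."""
        return next(iter(self.locations[file_id]))

catalog = ReplicaCatalog()
catalog.add_replica("/doc/a", "node-eu")
catalog.add_replica("/doc/a", "node-us")  # dynamic replication: add a copy on demand
catalog.write("/doc/a")
print("read from:", catalog.read_location("/doc/a"))
```

Write-all keeps every replica current, so any single replica can serve a read; real systems relax this trade-off, for instance to accommodate the disconnected clients the paper targets.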

Author(s):  
Qiang Zou ◽  
Yujuan Tan

As one aspect of workload characterization, anomalous behaviors in real workloads have been recognized as a critical consideration in file system design. In this paper, a set of traces collected from typical real-world file systems is analyzed. A correlation study of I/O request inter-arrival times shows that it is necessary to examine self-similarity in file system workloads. The phenomenon of self-similarity, initially observed in network and disk I/O workloads, is then also observed, both visually and statistically, in file system workloads. In addition, we implement an I/O series generator whose inputs are the measured properties of the available trace data. Experimental results show that this model can accurately emulate the complex access arrival behaviors of real file systems, particularly their heavy-tail characteristics.
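To make the statistical check concrete, the sketch below generates heavy-tailed (Pareto) inter-arrival times and estimates a Hurst exponent with a crude rescaled-range (R/S) analysis, one of the standard self-similarity tests. It is a minimal stand-in, not the paper's generator.

```python
# Heavy-tailed inter-arrival generation plus a crude R/S Hurst estimate.
# For i.i.d. samples H stays near 0.5; long-range-dependent traces push it toward 1.

import numpy as np

def pareto_interarrivals(n, alpha=1.5, scale=1.0, seed=0):
    """Classical Pareto inter-arrival times; alpha < 2 gives infinite variance (heavy tail)."""
    rng = np.random.default_rng(seed)
    return scale * (rng.pareto(alpha, n) + 1.0)

def hurst_rs(x):
    """Rescaled-range (R/S) estimate of the Hurst exponent."""
    x = np.asarray(x, dtype=float)
    log_rs, log_n = [], []
    for k in range(1, 6):
        n = len(x) // 2**k                       # window size at this scale
        if n < 8:
            continue
        chunks = x[: (len(x) // n) * n].reshape(-1, n)
        dev = np.cumsum(chunks - chunks.mean(axis=1, keepdims=True), axis=1)
        r = dev.max(axis=1) - dev.min(axis=1)    # range of cumulative deviations
        s = chunks.std(axis=1)
        valid = s > 0
        log_rs.append(np.log((r[valid] / s[valid]).mean()))
        log_n.append(np.log(n))
    slope, _ = np.polyfit(log_n, log_rs, 1)      # log(R/S) ~ H * log(n)
    return slope

arrivals = pareto_interarrivals(100_000)
print("estimated Hurst exponent:", round(hurst_rs(arrivals), 3))
```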


2014 ◽  
Vol 573 ◽  
pp. 556-559
Author(s):  
A. Shenbaga Bharatha Priya ◽  
J. Ganesh ◽  
Mareeswari M. Devi

Infrastructure-as-a-Service (IaaS) provides an environmental setup under any type of cloud. In a distributed file system (DFS), nodes simultaneously serve computing and storage functions; that is, data is processed and stored in parallel in the cloud. Here, a file is treated as a load: it is partitioned into a number of file chunks (FCs) allocated to distinct nodes so that MapReduce tasks can be performed in parallel over the nodes. Files and nodes can be dynamically created, deleted, and added. This results in load imbalance in a distributed file system; that is, the file chunks are not distributed as uniformly as possible among the chunk servers (CSs). Emerging distributed file systems in production strongly depend either on a central node for chunk reallocation or on distributed nodes maintaining global knowledge of all chunks. This dependence is clearly inadequate in a large-scale, failure-prone environment: the central load balancer carries a workload that scales linearly with system size, so it may become a performance bottleneck and a single point of failure, while global knowledge wastes memory on distributed nodes. We therefore enhance the client-side module with a server-side module to create, delete, and update file chunks, manage the overall private cloud, and apply a dynamic load-balancing algorithm to support auto-scaling in the private cloud. In this project, a fully distributed load-rebalancing algorithm is presented to cope with the load-imbalance problem, as sketched below.
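A minimal sketch of the fully distributed idea, under the simplifying assumption that load equals chunk count and that each overloaded chunk server sheds chunks to the currently lightest peer; the function and names are hypothetical, not the paper's exact algorithm.

```python
# Illustrative chunk rebalancing: heavy servers migrate chunks toward the
# lightest peer until every server is within one chunk of the ideal share.

def rebalance(servers: dict[str, list[str]]) -> list[tuple[str, str, str]]:
    """servers maps server id -> list of chunk ids; returns a migration log."""
    total = sum(len(chunks) for chunks in servers.values())
    target = total // len(servers)              # ideal chunks per server
    moves = []
    for sid, chunks in servers.items():
        while len(chunks) > target + 1:         # this node is overloaded
            dest = min(servers, key=lambda s: len(servers[s]))
            if len(servers[dest]) > target:
                break                           # no peer lighter than the target
            chunk = chunks.pop()
            servers[dest].append(chunk)         # migrate one chunk
            moves.append((chunk, sid, dest))
    return moves

cluster = {"cs1": [f"c{i}" for i in range(9)], "cs2": ["c9"], "cs3": []}
for chunk, src, dst in rebalance(cluster):
    print(f"moved {chunk}: {src} -> {dst}")
```

In a truly decentralized version each server would sample a few peers instead of scanning all of them, avoiding the global knowledge the abstract criticizes.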


Author(s):  
Eduardo Inacio ◽  
Mario Antonio Dantas

To meet the ever-increasing capacity and performance requirements of emerging data-intensive applications, highly distributed and multilayered back-end storage systems have been employed in large-scale high performance computing (HPC) environments. A main component of these storage infrastructures is the parallel file system (PFS), a file system especially designed to absorb bulk data transfers from applications with thousands of concurrent processes. Load distribution across PFS data servers is a major source of intra-application input/output (I/O) performance variability. Although mitigating variability is desirable, as variability is known to harm application-perceived performance, understanding and dealing with I/O performance variability in such complex environments remains a challenging task. In this research, a differentiated approach for evaluating and mitigating intra-application I/O performance variability over PFSs is proposed. More specifically, from the evaluation perspective, a comprehensive approach combining complementary methods is proposed. An analytical model, named DTSMaxLoad, provides estimates of the maximum load on a PFS data server. To complement DTSMaxLoad by modeling conditions and mechanisms that are hard to represent analytically, the Parallel I/O and Storage System (PIOSS) simulation model was proposed. Finally, for experimental evaluation in real environments, a flexible and distributed I/O performance evaluation tool, coined IOR-Extended (IORE), was proposed. Furthermore, a high-level file distribution approach for PFSs, called N-N Round-Robin (N2R2), was proposed, focusing on mitigating I/O performance variability for distributed applications where each process accesses an individual and independent file. An extensive experimental effort, including measurements in real environments, was conducted in this research work to evaluate each of the proposed approaches. In summary, this evaluation indicated that both the DTSMaxLoad and PIOSS modeling proposals represent load distribution behavior on PFSs with significant fidelity. Moreover, results demonstrated that N2R2 successfully reduced intra-application I/O performance variability across 270 distinct experimental scenarios, which ultimately translated into overall application I/O performance improvements.
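As one plausible reading of an N-N round-robin layout (each process writing its own file, with stripes rotated across data servers so that no single server starts every file), consider the sketch below; the rotation rule is an assumption for illustration, not necessarily N2R2's exact rule.

```python
# Hypothetical N-N round-robin placement: every file is striped over the
# data servers, and the starting server advances per file so load spreads.

def n2r2_placement(num_files: int, stripes_per_file: int, num_servers: int):
    """Return {file_id: [server index per stripe]}, rotating the start server."""
    layout = {}
    for f in range(num_files):
        start = (f * stripes_per_file) % num_servers  # rotate the start per file
        layout[f] = [(start + s) % num_servers for s in range(stripes_per_file)]
    return layout

for fid, servers in n2r2_placement(num_files=4, stripes_per_file=3,
                                   num_servers=8).items():
    print(f"file {fid}: stripes on servers {servers}")
```

With a naive layout every file would start on server 0, concentrating the first stripe of all N files on one server; rotating the start is what evens out the maximum per-server load that a model like DTSMaxLoad estimates.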


Author(s):  
Ashraf Ahmed Fadelelmoula

This article presents a new comprehensive approach to realizing a sufficient trade-off between the CAP properties (i.e., consistency, availability, and partition tolerance) in large-scale pervasive information systems. To achieve these critical properties, the capabilities of both cloud computing and web services were exploited in developing the components of the proposed approach. These components include a cloud-based replication architecture for ensuring high data availability and achieving partition tolerance, a web-services-based middleware for maintaining eventual consistency, and a data caching scheme that enables the mobile computing elements to conduct update transactions during disconnection periods. The evaluation of the performance aspects revealed that the proposed approach achieves better load balance, lower propagation delay, and a higher cache hit ratio than other baseline approaches.
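The caching scheme's core idea, queuing update transactions while disconnected and replaying them on reconnection so that replicas converge eventually, can be sketched as follows; the class and method names are hypothetical.

```python
# Illustrative disconnected-operation cache: updates apply locally at once,
# are queued while offline, and are replayed in order on reconnection.

class DisconnectedCache:
    def __init__(self):
        self.data, self.pending, self.online = {}, [], True

    def update(self, key, value):
        self.data[key] = value                 # apply locally first (availability)
        if self.online:
            self._push(key, value)             # propagate immediately
        else:
            self.pending.append((key, value))  # defer until reconnection

    def reconnect(self):
        self.online = True
        for key, value in self.pending:        # replay deferred updates in order
            self._push(key, value)
        self.pending.clear()

    def _push(self, key, value):
        # Stand-in for the web-service call that feeds the replication layer.
        print(f"replicate {key}={value} to cloud replicas")

cache = DisconnectedCache()
cache.online = False
cache.update("profile", "v2")  # queued while offline
cache.reconnect()              # replayed: replicas converge eventually
```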


2012 ◽  
Vol 241-244 ◽  
pp. 1556-1561
Author(s):  
Qi Meng Wu ◽  
Ke Xie ◽  
Ming Fa Zhu ◽  
Li Min Xiao ◽  
Li Ruan

Parallel file systems deploy multiple metadata servers to distribute heavy metadata workloads from clients. With an increasing number of metadata servers, metadata-intensive operations face problems related to collaboration among the servers, compromising the performance gain. Consequently, a file system simulator is very helpful for trying out optimization ideas to solve these problems. In this paper, we propose DMFSsim to simulate metadata-intensive operations on large-scale distributed-metadata file systems. DMFSsim can flexibly replay traces of multiple metadata operations, supports several commonly used metadata distribution algorithms, and simulates the file system tree hierarchy and the underlying disk block management mechanisms of real systems. Extensive simulations show that DMFSsim is capable of demonstrating the performance of metadata-intensive operations in distributed-metadata file systems.
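As an example of the metadata distribution algorithms such a simulator must support, the sketch below hashes a file's parent directory to choose a metadata server, a commonly used scheme that keeps a directory's entries together; it is illustrative, not DMFSsim's implementation.

```python
# Hash-based metadata distribution: the parent directory path picks the
# metadata server (MDS), so all entries of one directory land together.

import hashlib
import posixpath

def mds_for(path: str, num_servers: int) -> int:
    """Map a file's parent directory to a metadata server index."""
    parent = posixpath.dirname(path) or "/"
    digest = hashlib.md5(parent.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_servers

for p in ["/home/a/x.txt", "/home/a/y.txt", "/home/b/z.txt"]:
    print(p, "-> MDS", mds_for(p, num_servers=4))
```

Hashing balances load statistically but scatters a subtree across servers, which is exactly the kind of collaboration cost (cross-server rename, recursive operations) that a simulator helps quantify against subtree-partitioning alternatives.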


2014 ◽  
Vol 998-999 ◽  
pp. 1362-1365
Author(s):  
Wei Feng Gao ◽  
Tie Zhu Zhao ◽  
Ming Bin Lin

Distributed file systems are emerging as a key component of large-scale cloud storage platforms due to the continuous growth in the amount of application data. Performance modeling and analysis is an important concern in the distributed file system area. This paper focuses on performance prediction and modeling issues. An adaptive prediction model (APModel) is proposed to predict the performance of distributed file systems by capturing the correlation among different performance factors. We perform a series of experiments to validate the proposed prediction model. The experimental results indicate that our approach achieves better prediction accuracy; it is practical and enables sufficient performance analysis for distributed file systems.
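The abstract does not spell out APModel's internals, but the adaptive flavor can be sketched as a linear model refit over a sliding window of recent observations of performance factors; everything below is an illustrative assumption, not the paper's model.

```python
# Sketch of an adaptive performance predictor: keep a window of recent
# (factor vector, observed performance) pairs and refit a linear model.

import numpy as np

class AdaptivePredictor:
    def __init__(self, window=64):
        self.window = window
        self.samples = []  # list of (factor vector, observed performance)

    def observe(self, factors, perf):
        self.samples.append((np.asarray(factors, float), float(perf)))
        self.samples = self.samples[-self.window:]  # adapt: forget old behavior

    def predict(self, factors):
        X = np.array([np.append(f, 1.0) for f, _ in self.samples])  # bias column
        y = np.array([p for _, p in self.samples])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)                # least-squares fit
        return float(np.append(np.asarray(factors, float), 1.0) @ coef)

model = AdaptivePredictor()
for io_size, clients in [(64, 2), (128, 4), (256, 8), (512, 16)]:
    model.observe([io_size, clients], perf=0.5 * io_size + 3 * clients)  # synthetic data
print("predicted throughput:", round(model.predict([1024, 32]), 1))
```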


Author(s):  
Anargyros Tsadimas ◽  
Mara Nikolaidou ◽  
Dimosthenis Anagnostopoulos

Model-based system design is served by a single, multi-layered model supporting all design activities at different levels of detail. SysML is a modeling language, endorsed by OMG, for systems engineering, which aims at defining such models for system design. It provides discrete diagrams to describe system structure and components, to explore allocation policies crucial for system design, and to identify design requirements. In this chapter, SysML is used for the model-based design of enterprise information system architecture, supporting a systemic view of such systems, where software and hardware entities are treated as system components composed to create the system architecture. SysML extensions are presented that facilitate the effective description of non-functional requirements, especially quantitative ones, and their verification. The integration of evaluation parameters and results into a discrete SysML diagram enhances the requirement verification process, while the visualization of evaluation data helps system engineers explore design decisions and properly adjust the system design. Based on the proposed extensions, a SysML profile is developed. The experience obtained when applying the profile to renovating the architecture of a large-scale enterprise information system is also briefly discussed to explore the potential of the proposed extensions.


Author(s):  
Jan Stender ◽  
Michael Berlin ◽  
Alexander Reinefeld

Cloud computing poses new challenges to data storage. While cloud providers use shared distributed hardware, which is inherently unreliable and insecure, cloud users expect their data to be safely and securely stored, available at any time, and accessible in the same way as their locally stored data. In this chapter, the authors present XtreemFS, a file system for the cloud. XtreemFS reconciles the need of cloud providers for cheap scale-out storage solutions with that of cloud users for reliable, secure, and easy data access. The main contributions of the chapter are: a description of the internal architecture of XtreemFS, which presents an approach to building large-scale distributed POSIX-compliant file systems on top of cheap, off-the-shelf hardware; a description of the XtreemFS security infrastructure, which guarantees isolation of individual users despite shared and insecure storage and network resources; a comprehensive overview of the replication mechanisms in XtreemFS, which guarantee consistency, availability, and durability of data in the face of component failures; and an overview of the snapshot infrastructure of XtreemFS, which allows momentary states of the file system to be captured and frozen in a scalable and fault-tolerant fashion. The authors also compare XtreemFS with existing solutions and argue for its practicability and potential in the cloud storage market.
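XtreemFS's actual replica protocols are not reproduced here, but the general majority-quorum idea that such systems use to keep replicated data consistent, available, and durable despite component failures can be sketched as follows; the code is a generic illustration, not XtreemFS's implementation.

```python
# Generic majority-quorum write: the write commits once a strict majority
# of replicas acknowledge it, so a minority of failed replicas is tolerated.

import random

def quorum_write(replicas, key, value, fail_prob=0.2):
    """Apply the write on each reachable replica; commit on majority ack."""
    acks = 0
    for store in replicas:
        if random.random() >= fail_prob:  # this replica is reachable
            store[key] = value
            acks += 1
    return acks > len(replicas) // 2      # strict majority required to commit

random.seed(1)
replicas = [dict() for _ in range(3)]     # e.g., three object storage devices
print("write committed:", quorum_write(replicas, "block-42", b"data"))
```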


2021 ◽  
Vol 11 (8) ◽  
pp. 3298
Author(s):  
Jeong-Joon Kim

Various techniques have been used in distributed file systems to provide data availability and stability. Typically, data is stored using a replication technique, but because of its poor space efficiency, erasure coding (EC) has been adopted more recently. The EC technique improves space efficiency over replication; however, it suffers from various performance degradation factors, such as encoding and decoding overhead and input/output (I/O) degradation. Thus, this study proposes a buffering and combining technique in which the various I/O requests that occur during encoding in an EC-based distributed file system are combined into one and processed together. In addition, it proposes four recovery measures (disk I/O load distribution, random block layout, multi-thread-based parallel recovery, and a matrix recycling technique) to distribute the disk I/O loads generated during decoding.
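The buffer-and-combine idea can be sketched as follows: small writes accumulate until a full stripe is available and are then encoded once, so encoding cost is paid per stripe rather than per request. XOR parity stands in for real erasure coding here, and all names are illustrative, not the paper's implementation.

```python
# Illustrative buffer-and-combine for EC writes: accumulate small requests,
# encode only when a full stripe exists. XOR parity is a stand-in for
# Reed-Solomon-style erasure coding.

STRIPE_CHUNKS = 4  # data chunks per stripe
CHUNK_SIZE = 8     # bytes per chunk (tiny, for demonstration)

buffer = bytearray()

def write(data: bytes):
    """Buffer small writes; encode once per completed stripe."""
    global buffer
    buffer += data
    stripe_bytes = STRIPE_CHUNKS * CHUNK_SIZE
    while len(buffer) >= stripe_bytes:
        stripe, buffer = buffer[:stripe_bytes], buffer[stripe_bytes:]
        encode_stripe(bytes(stripe))

def encode_stripe(stripe: bytes):
    chunks = [stripe[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE]
              for i in range(STRIPE_CHUNKS)]
    parity = bytes(a ^ b ^ c ^ d for a, b, c, d in zip(*chunks))  # XOR parity
    print(f"encoded stripe -> {STRIPE_CHUNKS} data chunks + "
          f"{len(parity)}-byte parity chunk")

for _ in range(8):
    write(b"tiny")  # eight 4-byte writes combine into one 32-byte stripe
print(f"{len(buffer)} bytes still buffered")
```

Without combining, each 4-byte request would trigger its own read-modify-write of the stripe and parity; combining amortizes that cost across all buffered requests.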

