Analyzing the Performance of the S3 Object Storage API for HPC Workloads

2021 ◽  
Vol 11 (18) ◽  
pp. 8540
Author(s):  
Frank Gadban ◽  
Julian Kunkel

The line between HPC and Cloud is becoming blurry: performance is still the main driver in HPC, while cloud storage systems are assumed to offer low latency, high throughput, high availability, and scalability. The Simple Storage Service (S3) has emerged as the de facto storage API for object storage in the Cloud. This paper examines whether the S3 API is already a viable alternative for HPC access patterns in terms of performance, or whether further performance advancements are necessary. For this purpose: (a) We extend two common HPC I/O benchmarks, the IO500 and MD-Workbench, to quantify the performance of the S3 API. We perform the analysis on the Mistral supercomputer by launching the enhanced benchmarks against different S3 implementations: on-premises (Swift, MinIO) and in the Cloud (Google, IBM…). We find that these implementations do not yet meet the demanding performance and scalability expectations of HPC workloads. (b) We identify the cause of the performance loss by systematically replacing parts of a popular S3 client library with lightweight replacements of lower stack components. The resulting S3Embedded library is highly scalable and leverages the shared cluster file systems of HPC infrastructure to accommodate arbitrary S3 client applications. Another introduced library, S3remote, uses TCP/IP for communication instead of HTTP; it provides a single local S3 gateway on each node. By broadening the scope of the IO500, this research enables the community to track the performance growth of S3 and encourages the sharing of best practices for performance optimization. The analysis also shows that, by using a high-performance S3 library such as S3Embedded, Cloud and HPC storage performance can converge over time.
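The kind of per-operation latency and throughput measurement performed by the extended benchmarks can be sketched with a generic timing harness. The `put_object` stand-in below is a hypothetical placeholder for a real S3 PUT (issued, e.g., via a client library such as boto3), not code from the IO500, MD-Workbench, or S3Embedded:

```python
import time

def bench(op, payload, iterations=100):
    """Time repeated invocations of a storage operation and report
    mean latency (ms) and throughput (MiB/s), IO500-style."""
    start = time.perf_counter()
    for _ in range(iterations):
        op(payload)
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / iterations * 1000
    throughput_mib = len(payload) * iterations / elapsed / 2**20
    return latency_ms, throughput_mib

# Hypothetical stand-in for an S3 PUT; a dict plays the bucket.
store = {}
def put_object(data):
    store["key"] = bytes(data)  # the copy models the data transfer

lat, bw = bench(put_object, b"x" * 2**20)  # 1 MiB objects
print(f"mean latency {lat:.3f} ms, throughput {bw:.1f} MiB/s")
```

Swapping `put_object` for a real client call is what turns this loop into the access-pattern measurement the paper describes.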

2011 ◽  
Vol 3 (3) ◽  
pp. 19-36 ◽  
Author(s):  
Theodoros Spyridopoulos ◽  
Vasilios Katos

This paper examines the feasibility of developing a forensic acquisition tool in a distributed file system. Using the GFS and KFS distributed file systems as vehicles, and through representative scenarios and examples, the authors develop forensic acquisition processes and examine the requirements that both the tool and the distributed file system must meet in order to facilitate the acquisition. The authors conclude that cloud storage has features that can be leveraged to perform acquisition (such as redundancy and replication triggers) but also exhibits a complexity higher than that of traditional storage systems, leading to a need for forensic-readiness-by-design.


2020 ◽  
Author(s):  
Ezequiel Cimadevilla Alvarez ◽  
Aida Palacio Hoz ◽  
Antonio S. Cofiño ◽  
Alvaro Lopez Garcia

Data analysis in climate science has traditionally been performed in two different environments: local workstations and HPC infrastructures. Local workstations provide a non-scalable environment in which data analysis is restricted to small datasets that have previously been downloaded. HPC infrastructures, on the other hand, provide high computation capabilities by making use of parallel file systems and libraries that allow data analysis to scale. Due to the great increase in the size of the datasets and the need to provide computation environments close to the data storage, data providers are evaluating the use of commercial clouds as an alternative for data storage. Examples of commercial clouds are Google Cloud Storage and Amazon S3, although cloud storage is not restricted to commercial clouds, since several institutions provide private or hybrid clouds. These providers use systems known as "object storage" to provide cloud storage, since object stores offer great scalability and storage capacity compared to the POSIX file systems found in local or HPC infrastructures.

Cloud storage systems based on object storage are incompatible with the existing libraries and data formats used by the climate community to store and analyse data. Legacy libraries and data formats include netCDF and HDF5, which assume that the underlying storage is a file system and not an object store. However, new libraries such as Zarr try to solve the problem of storing multidimensional arrays both in file systems and in object stores.

In this work we present a private cloud infrastructure built upon OpenStack that provides both file system and object storage. The infrastructure also provides an environment, based on JupyterHub, to perform remote data analysis close to the data. This has several advantages from the users' perspective. First, users are not required to deploy the software and tools needed for the analysis. Second, it provides a remote environment in which users can perform scalable data analytics. And third, there is no need to download huge amounts of data to the user's local computer before running the analysis.
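The array-to-object mapping that libraries such as Zarr apply can be illustrated with a minimal sketch: split an array into fixed-size chunks and store each chunk under its own key, so a slice read only fetches the overlapping chunks. A plain dict stands in for the object store, and the key layout merely mimics, rather than reproduces, the real Zarr format:

```python
import json

def store_chunked(store, name, data, chunk):
    """Split a 1-D byte array into fixed-size chunks and store each
    chunk under its own key, the way Zarr maps arrays onto objects."""
    store[f"{name}/.meta"] = json.dumps({"length": len(data), "chunk": chunk})
    for i in range(0, len(data), chunk):
        store[f"{name}/{i // chunk}"] = bytes(data[i:i + chunk])

def read_slice(store, name, start, stop):
    """Read a slice by fetching only the chunks that overlap it."""
    c = json.loads(store[f"{name}/.meta"])["chunk"]
    out = bytearray()
    for idx in range(start // c, (stop - 1) // c + 1):
        out += store[f"{name}/{idx}"]
    lo = start - (start // c) * c      # offset inside the first chunk
    return bytes(out[lo:lo + (stop - start)])

# A dict stands in for a cloud object store (e.g. an S3 bucket).
bucket = {}
store_chunked(bucket, "temperature", bytes(range(100)), chunk=16)
print(read_slice(bucket, "temperature", 30, 40))  # touches chunks 1 and 2 only
```

Because each chunk is an independent object, many clients can read or write different chunks concurrently, which is what makes the format a fit for scalable object stores.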


Author(s):  
Neha Thakur ◽  
Aman Kumar Sharma

Cloud computing has been envisioned as a promising solution to the rising storage costs of IT enterprises. There are many cloud computing initiatives from IT giants such as Google, Amazon, Microsoft, and IBM. Integrity monitoring is essential in cloud storage for the same reasons that data integrity is critical for any data centre. Data integrity is defined as the accuracy and consistency of stored data, i.e., the absence of any alteration to the data between two updates of a file or record. In order to ensure the integrity and availability of data in the Cloud and to enforce the quality of cloud storage services, efficient methods that enable on-demand data-correctness verification on behalf of cloud users have to be designed. To address the data integrity problem, many techniques have been proposed under different system and security models. This paper focuses on some of these integrity-proving techniques in detail, along with their advantages and disadvantages.
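One simple class of integrity-proving technique in this space is precomputed challenge-response: before uploading, the owner keeps a small set of (block index, nonce, keyed digest) tuples, and later challenges the server to recompute a digest it cannot know in advance. The sketch below is a minimal illustration of that idea, not any specific published scheme:

```python
import hashlib
import os
import random

BLOCK = 4096

def precompute_challenges(data, n_blocks, n_challenges):
    """Owner precomputes (block index, nonce, digest) tuples before
    uploading; only these small tuples are kept locally."""
    challenges = []
    for _ in range(n_challenges):
        i = random.randrange(n_blocks)
        nonce = os.urandom(16)
        block = data[i * BLOCK:(i + 1) * BLOCK]
        challenges.append((i, nonce, hashlib.sha256(nonce + block).hexdigest()))
    return challenges

def server_respond(stored, i, nonce):
    """Cloud server proves possession of block i under the given nonce."""
    return hashlib.sha256(nonce + stored[i * BLOCK:(i + 1) * BLOCK]).hexdigest()

data = os.urandom(4 * BLOCK)                      # the file handed to the cloud
kept = precompute_challenges(data, 4, n_challenges=10)

# Intact data: the server's response matches the precomputed digest.
i, nonce, digest = kept.pop()
print(server_respond(data, i, nonce) == digest)   # True

# Flip one bit of the challenged block: the proof fails.
i, nonce, digest = kept.pop()
tampered = bytearray(data)
tampered[i * BLOCK] ^= 1
print(server_respond(bytes(tampered), i, nonce) == digest)  # False
```

The trade-off the surveyed techniques wrestle with is visible even here: each challenge is single-use, detection of corruption in an unchallenged block is only probabilistic, and more robust schemes replace the precomputed list with homomorphic tags.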


2021 ◽  
Vol 17 (3) ◽  
pp. 1-25
Author(s):  
Bohong Zhu ◽  
Youmin Chen ◽  
Qing Wang ◽  
Youyou Lu ◽  
Jiwu Shu

Non-volatile memory and remote direct memory access (RDMA) provide extremely high performance in storage and network hardware. However, existing distributed file systems strictly isolate the file system and network layers, and the heavy layered software designs leave high-speed hardware under-exploited. In this article, we propose an RDMA-enabled distributed persistent memory file system, Octopus+, to redesign file system internal mechanisms by closely coupling non-volatile memory and RDMA features. For data operations, Octopus+ directly accesses a shared persistent memory pool to reduce memory-copying overhead, and actively fetches and pushes data entirely at the clients to rebalance the load between the server and the network. For metadata operations, Octopus+ introduces self-identified remote procedure calls for immediate notification between the file system and networking, and an efficient distributed transaction mechanism for consistency. Octopus+ also supports replication to provide better availability. Evaluations on Intel Optane DC Persistent Memory Modules show that Octopus+ achieves nearly the raw bandwidth for large I/Os and orders of magnitude better performance than existing distributed file systems.
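The client-active design described above, in which clients reach into a shared persistent-memory pool themselves instead of receiving server-made copies, can be sketched as follows. In this hypothetical sketch a memory-mapped file stands in for the NVM pool and local function calls stand in for RDMA; it illustrates the design idea only, not Octopus+'s actual implementation:

```python
import mmap
import os
import struct
import tempfile

POOL_SIZE = 4096

# A memory-mapped file stands in for the shared persistent-memory pool;
# in Octopus+ clients reach the pool over RDMA rather than a local mmap.
path = os.path.join(tempfile.mkdtemp(), "pmem_pool")
with open(path, "wb") as f:
    f.write(b"\x00" * POOL_SIZE)

def server_write(offset, payload):
    """'Server' places data directly into the shared pool: a 4-byte
    length header followed by the bytes, with no extra socket copy."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), POOL_SIZE) as pool:
        pool[offset:offset + 4] = struct.pack("<I", len(payload))
        pool[offset + 4:offset + 4 + len(payload)] = payload

def client_read(offset):
    """Client fetches the data itself (client-active I/O), reading the
    pool in place instead of asking the server to send a copy."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), POOL_SIZE) as pool:
        (length,) = struct.unpack("<I", pool[offset:offset + 4])
        return bytes(pool[offset + 4:offset + 4 + length])

server_write(0, b"hello from the shared pool")
print(client_read(0))
```

Moving the fetch to the client is what rebalances load in the paper's design: the server only places data once, and every reader pulls it over the network hardware without server CPU involvement.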


2016 ◽  
Vol 24 (2) ◽  
pp. 57-74 ◽  
Author(s):  
Chen-Shu Wang ◽  
Cheng-Yu Lai ◽  
Shiang-Lin Lin

In recent years, mobile devices have become an indispensable part of our daily life. An extensive number of mobile applications (Apps) have been developed for and used on these devices. In terms of the future development and popularization of Apps, understanding why people are willing to pay to use certain Apps has become an important issue. However, there are many homogeneous Apps, and people can easily find free substitutes. Consequently, it is an interesting question why individuals would pay to use an App. In this study, the authors conducted a survey in Taiwan to understand individuals' willingness to pay for Cloud Storage Services (CSS), since CSS is one of the most frequently adopted App categories among mobile device users. The results show that both perceived service quality and conformity positively affect perceived value, which in turn indirectly increases users' willingness to pay. In addition, the findings support that users' product knowledge about CSS has a negative moderating effect on the relationship between perceived value and willingness to pay.


2013 ◽  
Vol 33 (2) ◽  
pp. 39-50 ◽  
Author(s):  
Ketsaraporn Suttapong ◽  
Suwit Srimai ◽  
Pongsakorn Pitchayadol
