Analyzing the Performance of the S3 Object Storage API for HPC Workloads

2021 ◽  
Vol 11 (18) ◽  
pp. 8540
Author(s):  
Frank Gadban ◽  
Julian Kunkel

The line between HPC and Cloud is becoming blurry: performance is still the main driver in HPC, while cloud storage systems are assumed to offer low latency, high throughput, high availability, and scalability. The Simple Storage Service (S3) has emerged as the de facto storage API for object storage in the Cloud. This paper examines whether the S3 API is already a viable alternative for HPC access patterns in terms of performance, or whether further performance advancements are necessary. For this purpose: (a) We extend two common HPC I/O benchmarks, the IO500 and MD-Workbench, to quantify the performance of the S3 API. We perform the analysis on the Mistral supercomputer by launching the enhanced benchmarks against different S3 implementations: on-premises (Swift, MinIO) and in the Cloud (Google, IBM…). We find that these implementations do not yet meet the demanding performance and scalability expectations of HPC workloads. (b) We identify the cause of the performance loss by systematically replacing parts of a popular S3 client library with lightweight replacements of lower stack components. The resulting S3Embedded library is highly scalable and leverages the shared cluster file systems of HPC infrastructure to accommodate arbitrary S3 client applications. Another introduced library, S3remote, uses TCP/IP for communication instead of HTTP; it provides a single local S3 gateway on each node. By broadening the scope of the IO500, this research enables the community to track the performance growth of S3 and encourages the sharing of best practices for performance optimization. The analysis also shows that, by using a high-performance S3 library such as S3Embedded, Cloud and HPC storage performance can converge over time.
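The kind of per-operation latency and throughput measurement performed by the extended benchmarks can be sketched with a generic timing harness. The `put_object` stand-in below is a hypothetical placeholder for a real S3 PUT (issued, e.g., via a client library such as boto3), not code from the IO500, MD-Workbench, or S3Embedded:

```python
import time

def bench(op, payload, iterations=100):
    """Time repeated invocations of a storage operation and report
    mean latency (ms) and throughput (MiB/s), IO500-style."""
    start = time.perf_counter()
    for _ in range(iterations):
        op(payload)
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / iterations * 1000
    throughput_mib = len(payload) * iterations / elapsed / 2**20
    return latency_ms, throughput_mib

# Hypothetical stand-in for an S3 PUT; a dict plays the bucket.
store = {}
def put_object(data):
    store["key"] = bytes(data)  # the copy models the data transfer

lat, bw = bench(put_object, b"x" * 2**20)  # 1 MiB objects
print(f"mean latency {lat:.3f} ms, throughput {bw:.1f} MiB/s")
```

Swapping `put_object` for a real client call is what turns this loop into the access-pattern measurement the paper describes.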

2011 ◽  
Vol 3 (3) ◽  
pp. 19-36 ◽  
Author(s):  
Theodoros Spyridopoulos ◽  
Vasilios Katos

This paper examines the feasibility of developing a forensic acquisition tool in a distributed file system. Using the GFS and KFS distributed file systems as vehicles, and through representative scenarios and examples, the authors develop forensic acquisition processes and examine the requirements that both the tool and the distributed file system must meet in order to facilitate the acquisition. The authors conclude that cloud storage has features that can be leveraged to perform acquisition (such as redundancy and replication triggers) but also exhibits a complexity higher than that of traditional storage systems, leading to a need for forensic-readiness-by-design.


2020 ◽  
Author(s):  
Ezequiel Cimadevilla Alvarez ◽  
Aida Palacio Hoz ◽  
Antonio S. Cofiño ◽  
Alvaro Lopez Garcia

Data analysis in climate science has traditionally been performed in two different environments: local workstations and HPC infrastructures. Local workstations provide a non-scalable environment in which data analysis is restricted to small datasets that have previously been downloaded. HPC infrastructures, on the other hand, provide high computation capabilities by making use of parallel file systems and libraries that allow data analysis to scale. Due to the great increase in the size of the datasets and the need to provide computation environments close to the data storage, data providers are evaluating the use of commercial clouds as an alternative for data storage. Examples of commercial clouds are Google Cloud Storage and Amazon S3, although cloud storage is not restricted to commercial clouds, since several institutions provide private or hybrid clouds. These providers use systems known as "object storage" to provide cloud storage, since object stores offer great scalability and storage capacity compared to the POSIX file systems found in local or HPC infrastructures.

Cloud storage systems based on object storage are incompatible with the existing libraries and data formats used by the climate community to store and analyse data. Legacy libraries and data formats include netCDF and HDF5, which assume that the underlying storage is a file system and not an object store. However, new libraries such as Zarr try to solve the problem of storing multidimensional arrays both in file systems and in object stores.

In this work we present a private cloud infrastructure built upon OpenStack that provides both file system and object storage. The infrastructure also provides an environment, based on JupyterHub, to perform remote data analysis close to the data. This has several advantages from the users' perspective. First, users are not required to deploy the software and tools needed for the analysis. Second, it provides a remote environment in which users can perform scalable data analytics. And third, there is no need to download huge amounts of data to the user's local computer before running the analysis.
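The array-to-object mapping that libraries such as Zarr apply can be illustrated with a minimal sketch: split an array into fixed-size chunks and store each chunk under its own key, so a slice read only fetches the overlapping chunks. A plain dict stands in for the object store, and the key layout merely mimics, rather than reproduces, the real Zarr format:

```python
import json

def store_chunked(store, name, data, chunk):
    """Split a 1-D byte array into fixed-size chunks and store each
    chunk under its own key, the way Zarr maps arrays onto objects."""
    store[f"{name}/.meta"] = json.dumps({"length": len(data), "chunk": chunk})
    for i in range(0, len(data), chunk):
        store[f"{name}/{i // chunk}"] = bytes(data[i:i + chunk])

def read_slice(store, name, start, stop):
    """Read a slice by fetching only the chunks that overlap it."""
    c = json.loads(store[f"{name}/.meta"])["chunk"]
    out = bytearray()
    for idx in range(start // c, (stop - 1) // c + 1):
        out += store[f"{name}/{idx}"]
    lo = start - (start // c) * c      # offset inside the first chunk
    return bytes(out[lo:lo + (stop - start)])

# A dict stands in for a cloud object store (e.g. an S3 bucket).
bucket = {}
store_chunked(bucket, "temperature", bytes(range(100)), chunk=16)
print(read_slice(bucket, "temperature", 30, 40))  # touches chunks 1 and 2 only
```

Because each chunk is an independent object, many clients can read or write different chunks concurrently, which is what makes the format a fit for scalable object stores.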


Author(s):  
Neha Thakur ◽  
Aman Kumar Sharma

Cloud computing has been envisioned as a promising solution to the rising storage costs of IT enterprises. There are many cloud computing initiatives from IT giants such as Google, Amazon, Microsoft, and IBM. Integrity monitoring is essential in cloud storage for the same reasons that data integrity is critical for any data centre. Data integrity is defined as the accuracy and consistency of stored data, i.e., the absence of any alteration to the data between two updates of a file or record. In order to ensure the integrity and availability of data in the Cloud and to enforce the quality of cloud storage services, efficient methods that enable on-demand data-correctness verification on behalf of cloud users have to be designed. To address the data integrity problem, many techniques have been proposed under different system and security models. This paper focuses on some of these integrity-proving techniques in detail, along with their advantages and disadvantages.
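One simple class of integrity-proving technique in this space is precomputed challenge-response: before uploading, the owner keeps a small set of (block index, nonce, keyed digest) tuples, and later challenges the server to recompute a digest it cannot know in advance. The sketch below is a minimal illustration of that idea, not any specific published scheme:

```python
import hashlib
import os
import random

BLOCK = 4096

def precompute_challenges(data, n_blocks, n_challenges):
    """Owner precomputes (block index, nonce, digest) tuples before
    uploading; only these small tuples are kept locally."""
    challenges = []
    for _ in range(n_challenges):
        i = random.randrange(n_blocks)
        nonce = os.urandom(16)
        block = data[i * BLOCK:(i + 1) * BLOCK]
        challenges.append((i, nonce, hashlib.sha256(nonce + block).hexdigest()))
    return challenges

def server_respond(stored, i, nonce):
    """Cloud server proves possession of block i under the given nonce."""
    return hashlib.sha256(nonce + stored[i * BLOCK:(i + 1) * BLOCK]).hexdigest()

data = os.urandom(4 * BLOCK)                      # the file handed to the cloud
kept = precompute_challenges(data, 4, n_challenges=10)

# Intact data: the server's response matches the precomputed digest.
i, nonce, digest = kept.pop()
print(server_respond(data, i, nonce) == digest)   # True

# Flip one bit of the challenged block: the proof fails.
i, nonce, digest = kept.pop()
tampered = bytearray(data)
tampered[i * BLOCK] ^= 1
print(server_respond(bytes(tampered), i, nonce) == digest)  # False
```

The trade-off the surveyed techniques wrestle with is visible even here: each challenge is single-use, detection of corruption in an unchallenged block is only probabilistic, and more robust schemes replace the precomputed list with homomorphic tags.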


2021 ◽  
Vol 17 (3) ◽  
pp. 1-25
Author(s):  
Bohong Zhu ◽  
Youmin Chen ◽  
Qing Wang ◽  
Youyou Lu ◽  
Jiwu Shu

Non-volatile memory and remote direct memory access (RDMA) provide extremely high performance in storage and network hardware. However, existing distributed file systems strictly isolate the file system and network layers, and the heavy layered software designs leave high-speed hardware under-exploited. In this article, we propose an RDMA-enabled distributed persistent memory file system, Octopus+, to redesign file system internal mechanisms by closely coupling non-volatile memory and RDMA features. For data operations, Octopus+ directly accesses a shared persistent memory pool to reduce memory-copying overhead, and actively fetches and pushes data entirely at the clients to rebalance the load between the server and the network. For metadata operations, Octopus+ introduces self-identified remote procedure calls for immediate notification between the file system and networking, and an efficient distributed transaction mechanism for consistency. Octopus+ also supports replication to provide better availability. Evaluations on Intel Optane DC Persistent Memory Modules show that Octopus+ achieves nearly the raw bandwidth for large I/Os and orders of magnitude better performance than existing distributed file systems.
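The client-active design described above, in which clients reach into a shared persistent-memory pool themselves instead of receiving server-made copies, can be sketched as follows. In this hypothetical sketch a memory-mapped file stands in for the NVM pool and local function calls stand in for RDMA; it illustrates the design idea only, not Octopus+'s actual implementation:

```python
import mmap
import os
import struct
import tempfile

POOL_SIZE = 4096

# A memory-mapped file stands in for the shared persistent-memory pool;
# in Octopus+ clients reach the pool over RDMA rather than a local mmap.
path = os.path.join(tempfile.mkdtemp(), "pmem_pool")
with open(path, "wb") as f:
    f.write(b"\x00" * POOL_SIZE)

def server_write(offset, payload):
    """'Server' places data directly into the shared pool: a 4-byte
    length header followed by the bytes, with no extra socket copy."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), POOL_SIZE) as pool:
        pool[offset:offset + 4] = struct.pack("<I", len(payload))
        pool[offset + 4:offset + 4 + len(payload)] = payload

def client_read(offset):
    """Client fetches the data itself (client-active I/O), reading the
    pool in place instead of asking the server to send a copy."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), POOL_SIZE) as pool:
        (length,) = struct.unpack("<I", pool[offset:offset + 4])
        return bytes(pool[offset + 4:offset + 4 + length])

server_write(0, b"hello from the shared pool")
print(client_read(0))
```

Moving the fetch to the client is what rebalances load in the paper's design: the server only places data once, and every reader pulls it over the network hardware without server CPU involvement.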


2016 ◽  
Vol 24 (2) ◽  
pp. 57-74 ◽  
Author(s):  
Chen-Shu Wang ◽  
Cheng-Yu Lai ◽  
Shiang-Lin Lin

In recent years, mobile devices have become an indispensable part of our daily life. An extensive number of mobile applications (Apps) have been developed for and used on these devices. In terms of the future development and popularization of Apps, understanding why people are willing to pay to use certain Apps has become an important issue. However, there are many homogeneous Apps, and people can easily find free substitutes. Consequently, it is an interesting question why individuals would pay to use an App. In this study, the authors conducted a survey in Taiwan to understand individuals' willingness to pay for Cloud Storage Services (CSS), since CSS is one of the most frequently adopted App categories among mobile device users. The results show that both perceived service quality and conformity positively affect perceived value, which in turn indirectly increases users' willingness to pay. In addition, the findings support that users' product knowledge about CSS has a negative moderating effect on the relationship between perceived value and willingness to pay.


2013 ◽  
Vol 33 (2) ◽  
pp. 39-50 ◽  
Author(s):  
Ketsaraporn Suttapong ◽  
Suwit Srimai ◽  
Pongsakorn Pitchayadol
