FRaZ: A Generic High-Fidelity Fixed-Ratio Lossy Compression Framework for Scientific Floating-point Data

Author(s): Robert Underwood, Sheng Di, Jon C. Calhoun, Franck Cappello

2016
Author(s): Charles S. Zender

Abstract. Lossy compression schemes can help reduce the space required to store the false precision (i.e., scientifically meaningless data bits) that geoscientific models and measurements generate. We introduce, implement, and characterize a new lossy compression scheme suitable for IEEE floating-point data. Our new Bit Grooming algorithm alternately shaves (to zero) and sets (to one) the least significant bits of consecutive values to preserve a desired precision. This is a symmetric, two-sided variant of an algorithm sometimes called Bit Shaving, which quantizes values solely by zeroing bits. Our variation eliminates the artificial low bias produced by always zeroing bits and makes Bit Grooming more suitable for arrays and multi-dimensional fields whose mean statistics are important. Bit Grooming relies on standard lossless compression schemes to achieve the actual reduction in storage space, so we tested it by applying the DEFLATE compression algorithm to bit-groomed and full-precision climate data stored in netCDF3, netCDF4, HDF4, and HDF5 formats. Bit Grooming reduces the storage space required by uncompressed and compressed climate data by up to 50 % and 20 %, respectively, for single-precision data (the most common case for climate data). When used aggressively (i.e., preserving only 1–3 decimal digits of precision), Bit Grooming produces storage reductions comparable to other quantization techniques such as linear packing. Unlike linear packing, Bit Grooming works on the full representable range of floating-point data. Bit Grooming reduces the volume of single-precision compressed data by roughly 10 % per decimal digit quantized (or "groomed") after the third such digit, up to a maximum reduction of about 50 %. The potential reduction is greater for double-precision datasets. Data quantization by Bit Grooming is irreversible (i.e., lossy) yet transparent, meaning that data users and readers need no extra processing. Hence Bit Grooming can easily reduce data storage volume without sacrificing scientific precision or imposing extra burdens on users.
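The alternating shave/set step described in this abstract can be sketched in NumPy for float32 arrays. This is a minimal illustration under stated assumptions, not Zender's NCO implementation: the hypothetical names `bit_groom` and `keep_bits_for_digits`, the digits-to-bits rule of thumb, the even/odd flat-index pattern, and the choice to leave zeros and non-finite values untouched are all assumptions made here. As the abstract notes, grooming alone saves no space; the savings appear only once a lossless compressor such as DEFLATE (here Python's `zlib`) is applied.

```python
import math
import zlib

import numpy as np

MANTISSA_BITS = 23  # explicit fraction bits in IEEE 754 binary32


def keep_bits_for_digits(nsd):
    """Hypothetical rule of thumb: about log2(10) = 3.32 bits per decimal digit.

    Keeps ceil(nsd * log2(10)) explicit mantissa bits; with the implicit
    leading bit this leaves roughly one guard bit. NCO's exact rule may differ.
    """
    return min(MANTISSA_BITS, math.ceil(nsd * math.log2(10)))


def bit_groom(values, keep_bits):
    """Alternately shave (zero) and set (one) the trailing mantissa bits.

    Even flat indices are shaved and odd flat indices are set, so the
    quantization errors have opposite signs and largely cancel in means.
    Zeros, NaNs and infinities are left untouched (a choice made here).
    """
    if not 0 <= keep_bits <= MANTISSA_BITS:
        raise ValueError("keep_bits must be in [0, 23] for float32")
    a = np.array(values, dtype=np.float32).ravel()  # private flattened copy
    drop = MANTISSA_BITS - keep_bits
    shave_mask = np.uint32((0xFFFFFFFF << drop) & 0xFFFFFFFF)
    set_mask = np.uint32((1 << drop) - 1)

    bits = a.view(np.uint32)  # reinterpret the same memory as integers
    eligible = np.isfinite(a) & (a != 0)
    odd = (np.arange(a.size) % 2).astype(bool)
    bits[eligible & ~odd] &= shave_mask  # shave: zero the dropped bits
    bits[eligible & odd] |= set_mask     # set: fill the dropped bits with ones
    return a.reshape(np.shape(values))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    field = rng.normal(280.0, 10.0, 100_000).astype(np.float32)
    groomed = bit_groom(field, keep_bits_for_digits(3))
    shift = groomed.mean(dtype=np.float64) - field.mean(dtype=np.float64)
    print("mean change:", float(shift))
    print("deflate bytes, full:   ", len(zlib.compress(field.tobytes())))
    print("deflate bytes, groomed:", len(zlib.compress(groomed.tobytes())))
```

A back-of-envelope reading, offered as an interpretation rather than the paper's own explanation: each decimal digit corresponds to about log2(10) ≈ 3.32 bits, roughly 10 % of a 32-bit value, which is consistent with the quoted figure of about 10 % per groomed digit.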


Author(s): Franck Cappello, Sheng Di, Sihuan Li, Xin Liang, Ali Murat Gok, ...

Architectural and technological trends in the systems used for scientific computing call for significantly reducing the size of scientific data sets, which consist mainly of floating-point data. This article surveys the currently identified use cases of generic lossy compression for addressing the different limitations of scientific computing systems and presents experimental results for them. A collection of experiments run on parallel systems of a leadership facility shows that lossy data compression can not only reduce the footprint of scientific data sets on storage but also reduce I/O and checkpoint/restart times, accelerate computation, and even allow significantly larger problems to be run than would be possible without lossy compression. These results suggest that lossy compression will become an important technology in many aspects of high performance scientific computing. Because the constraints for each use case are different and often conflicting, this collection of results also indicates the need for more specialization of the compression pipelines.


2020
Author(s): Rostislav Kouznetsov

Abstract. Lossy compression of scientific data arrays is a powerful tool to save network bandwidth and storage space. Properly applied lossy compression can reduce the size of a dataset by orders of magnitude while keeping all essential information, whereas a wrong choice of lossy compression parameters leads to the loss of valuable data. The paper considers statistical properties of several lossy compression methods implemented in "NetCDF operators" (NCO), a popular tool for handling and transforming numerical data in NetCDF format. We compare the imprecisions and artifacts that result from lossy compression of floating-point data arrays. In particular, we show that the popular Bit Grooming algorithm (the default in NCO) has sub-optimal accuracy and produces substantial artifacts in multipoint statistics. We suggest a simple implementation of two algorithms that are free from these artifacts and have twice the precision. In addition, we suggest a way to rectify data already processed with Bit Grooming. The algorithm has been contributed to the NCO mainline. The supplementary material contains an implementation of the algorithm in Python 3.
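The abstract does not spell out the two algorithms. As a hedged illustration of how a factor-of-two accuracy gain over shaving or grooming is possible, the sketch below implements two standard float32 quantizers: round-to-nearest (ties to even) on the retained mantissa bits, and "half-shaving", which zeroes the dropped bits and then sets the highest dropped bit so each value lands at the midpoint of its truncation interval. Whether these match the paper's algorithms exactly is an assumption, and the names `bit_round` and `half_shave` are hypothetical. Both bound the error by half the spacing of the retained precision, whereas shaving or setting can err by nearly a full spacing.

```python
import numpy as np

MANTISSA_BITS = 23  # explicit fraction bits in IEEE 754 binary32


def _prepare(values, keep_bits):
    """Return a flattened float32 copy, its uint32 view, and the dropped-bit count."""
    if not 0 <= keep_bits <= MANTISSA_BITS:
        raise ValueError("keep_bits must be in [0, 23] for float32")
    a = np.array(values, dtype=np.float32).ravel()
    return a, a.view(np.uint32), MANTISSA_BITS - keep_bits


def bit_round(values, keep_bits):
    """Round the mantissa to keep_bits bits, to nearest with ties to even."""
    a, bits, drop = _prepare(values, keep_bits)
    if drop > 0:
        ok = np.isfinite(a)
        b = bits[ok]
        lsb = (b >> np.uint32(drop)) & np.uint32(1)  # lowest retained bit
        half_minus_one = np.uint32((1 << (drop - 1)) - 1)
        mask = np.uint32((0xFFFFFFFF << drop) & 0xFFFFFFFF)
        # Add just under half a retained ulp (plus the retained lsb, which
        # breaks ties toward even), then truncate. A carry may propagate into
        # the exponent field, which correctly rounds up to the next binade.
        bits[ok] = (b + half_minus_one + lsb) & mask
    return a.reshape(np.shape(values))


def half_shave(values, keep_bits):
    """Zero the dropped bits, then set the highest one (interval midpoint)."""
    a, bits, drop = _prepare(values, keep_bits)
    if drop > 0:
        ok = np.isfinite(a) & (a != 0)  # leave zeros and non-finite values alone
        mask = np.uint32((0xFFFFFFFF << drop) & 0xFFFFFFFF)
        mid = np.uint32(1 << (drop - 1))
        bits[ok] = (bits[ok] & mask) | mid
    return a.reshape(np.shape(values))
```

Unlike the alternating pattern of Bit Grooming, neither function depends on an element's position in the array, so the quantization error is not tied to the array index; that index dependence is presumably the source of the multipoint-statistics artifacts mentioned above.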


Author(s): Shubham Chandak, Kedar Tatwawadi, Chengtao Wen, Lingyun Wang, Juan Aparicio Ojea, ...

Author(s): John P. Wilson

Single-precision floating-point data from a simulation of barotropic turbulence is compressed with a wavelet-based method. The quantity being compressed is vorticity. The compression error is evaluated both as the error in the vorticity itself and as the error in various quantities derived from the vorticity. Numerical error is evaluated in all quantities, and visualizations of the vorticity and the correlation of the error with the uncompressed data are also examined. Depending on the quantities of interest and the evaluation criteria, compression ratios of 4:1 to 256:1 are achievable. Under a conservative definition of acceptable error, quantities of interest can be recovered from data compressed 4:1 (8 bpp), the data rate already used for visualization in existing practice.
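The rates quoted above follow from simple arithmetic: a 32-bit single-precision value compressed at ratio R costs 32/R bits per value, so 4:1 is 8 bits per value and 256:1 is 0.125. The sketch below pairs that arithmetic with a toy wavelet compressor, a 1-D orthonormal Haar transform that keeps only the largest-magnitude 1/R of the coefficients. It is a generic illustration and not the method used in this study, which works on 2-D vorticity fields; its nominal "ratio" ignores the cost of storing coefficient positions and any entropy coding, and all function names are hypothetical.

```python
import numpy as np


def bits_per_value(ratio, bits=32):
    """Storage cost per value after compressing `bits`-bit values at ratio:1."""
    return bits / ratio


def haar_forward(x):
    """Multi-level orthonormal 1-D Haar transform; len(x) must be a power of two."""
    c = np.array(x, dtype=np.float64)
    n = c.size
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    s = np.sqrt(2.0)
    while n > 1:
        even, odd = c[0:n:2].copy(), c[1:n:2].copy()
        c[: n // 2] = (even + odd) / s     # approximation coefficients
        c[n // 2 : n] = (even - odd) / s   # detail coefficients
        n //= 2
    return c


def haar_inverse(c):
    """Invert haar_forward level by level, coarsest level first."""
    x = np.array(c, dtype=np.float64)
    s = np.sqrt(2.0)
    n = 1
    while n < x.size:
        avg, diff = x[:n].copy(), x[n : 2 * n].copy()
        x[0 : 2 * n : 2] = (avg + diff) / s
        x[1 : 2 * n : 2] = (avg - diff) / s
        n *= 2
    return x


def compress_toy(x, ratio):
    """Keep the largest-magnitude 1/ratio of Haar coefficients and reconstruct."""
    c = haar_forward(x)
    k = max(1, int(c.size // ratio))
    kept = np.zeros_like(c)
    idx = np.argsort(np.abs(c))[-k:]
    kept[idx] = c[idx]
    return haar_inverse(kept)


if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 1024, endpoint=False)
    signal = np.sin(2 * np.pi * 3 * t) + 0.1 * np.sin(2 * np.pi * 40 * t)
    for r in (4, 16, 64, 256):
        err = np.max(np.abs(compress_toy(signal, r) - signal))
        print(f"{r}:1 -> {bits_per_value(r):g} bits/value, max abs error {err:.3g}")
```

Because the transform is orthonormal, the squared reconstruction error equals the energy of the discarded coefficients, so the error can only grow as the ratio rises; which ratio is acceptable then depends, as the abstract says, on the quantity of interest and the error criterion.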

