FRaZ: A Generic High-Fidelity Fixed-Ratio Lossy Compression Framework for Scientific Floating-point Data

Bit Grooming: Statistically accurate precision-preserving quantization with compression, evaluated in the netCDF Operators (NCO, v4.4.8+)

10.5194/gmd-2016-63 ◽

2016 ◽

Author(s):

Charles S. Zender

Keyword(s):

Data Storage ◽

Lossy Compression ◽

Floating Point ◽

Decimal Digit ◽

Double Precision ◽

Storage Space ◽

Climate Data ◽

Single Precision ◽

Precision Data ◽

Point Data

Abstract. Lossy compression schemes can help reduce the space required to store the false precision (i.e., scientifically meaningless data bits) that geoscientific models and measurements generate. We introduce, implement, and characterize a new lossy compression scheme suitable for IEEE floating-point data. Our new Bit Grooming algorithm alternately shaves (to zero) and sets (to one) the least significant bits of consecutive values to preserve a desired precision. This is a symmetric, two-sided variant of an algorithm sometimes called Bit Shaving, which quantizes values solely by zeroing bits. Our variation eliminates the artificial low bias produced by always zeroing bits, and makes Bit Grooming more suitable for arrays and multi-dimensional fields whose mean statistics are important. Bit Grooming relies on standard lossless compression schemes to achieve the actual reduction in storage space, so we tested Bit Grooming by applying the DEFLATE compression algorithm to bit-groomed and full-precision climate data stored in netCDF3, netCDF4, HDF4, and HDF5 formats. Bit Grooming reduces the storage space required by uncompressed and compressed climate data by up to 50 % and 20 %, respectively, for single-precision data (the most common case for climate data). When used aggressively (i.e., preserving only 1–3 decimal digits of precision), Bit Grooming produces storage reductions comparable to other quantization techniques such as linear packing. Unlike linear packing, Bit Grooming works on the full representable range of floating-point data. Bit Grooming reduces the volume of single-precision compressed data by roughly 10 % per decimal digit quantized (or "groomed") after the third such digit, up to a maximum reduction of about 50 %. The potential reduction is greater for double-precision datasets. Data quantization by Bit Grooming is irreversible (i.e., lossy) yet transparent, meaning that no extra processing is required by data users/readers. Hence Bit Grooming can easily reduce data storage volume without sacrificing scientific precision or imposing extra burdens on users.
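
The alternating shave/set step described in this abstract can be illustrated with a small Python sketch that works on the raw bits of single-precision values. This is an illustration under stated assumptions, not the NCO implementation: the function names and the `keep_bits` parameter (number of explicit mantissa bits retained) are hypothetical, NCO itself specifies precision in decimal digits, and special values such as zero, NaN, and infinity are not handled.

```python
import struct

MANTISSA_BITS = 23  # explicit mantissa bits in an IEEE 754 single-precision value


def float_to_bits(x: float) -> int:
    """Round x to float32 and return its 32-bit pattern as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b: int) -> float:
    """Interpret a 32-bit pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def low_bit_mask(keep_bits: int) -> int:
    """Mask with ones over the mantissa bits to be discarded (hypothetical helper)."""
    if not 0 < keep_bits < MANTISSA_BITS:
        raise ValueError("keep_bits must be between 1 and 22")
    return (1 << (MANTISSA_BITS - keep_bits)) - 1


def bit_shave(values, keep_bits):
    """Zero the discarded bits of every value (always moves toward zero)."""
    mask = low_bit_mask(keep_bits)
    return [bits_to_float(float_to_bits(v) & ~mask) for v in values]


def bit_set(values, keep_bits):
    """Set the discarded bits of every value to one (always moves away from zero)."""
    mask = low_bit_mask(keep_bits)
    return [bits_to_float(float_to_bits(v) | mask) for v in values]


def bit_groom(values, keep_bits):
    """Shave even-indexed values and set odd-indexed values, alternating."""
    mask = low_bit_mask(keep_bits)
    out = []
    for i, v in enumerate(values):
        bits = float_to_bits(v)
        bits = (bits & ~mask) if i % 2 == 0 else (bits | mask)
        out.append(bits_to_float(bits))
    return out


if __name__ == "__main__":
    # Sample data in [1, 2], rounded to float32 first so comparisons are exact.
    data = [bits_to_float(float_to_bits(1.0 + i / 999)) for i in range(1000)]
    true_mean = sum(data) / len(data)
    for name, fn in (("shave", bit_shave), ("set", bit_set), ("groom", bit_groom)):
        quantized = fn(data, 8)
        print(name, "mean error:", sum(quantized) / len(quantized) - true_mean)
```

On positive data, shaving alone should give a negative mean error and setting alone a positive one, while the alternation largely cancels the two. The storage saving itself would come from a later lossless pass such as DEFLATE, which compresses the long runs of identical trailing bits.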

Use cases of lossy compression for floating-point data in scientific data sets

The International Journal of High Performance Computing Applications ◽

10.1177/1094342019853336 ◽

2019 ◽

Vol 33 (6) ◽

pp. 1201-1220 ◽

Cited By ~ 10

Author(s):

Franck Cappello ◽

Sheng Di ◽

Sihuan Li ◽

Xin Liang ◽

Ali Murat Gok ◽

...

Keyword(s):

High Performance ◽

Scientific Computing ◽

Lossy Compression ◽

Scientific Data ◽

Use Cases ◽

Floating Point ◽

Data Sets ◽

Use Case ◽

Computing Systems ◽

Point Data

Architectural and technological trends of systems used for scientific computing call for a significant reduction of scientific data sets that are composed mainly of floating-point data. This article surveys and presents experimental results of currently identified use cases of generic lossy compression to address the different limitations of scientific computing systems. The article shows from a collection of experiments run on parallel systems of a leadership facility that lossy data compression not only can reduce the footprint of scientific data sets on storage but also can reduce I/O and checkpoint/restart times, accelerate computation, and even allow significantly larger problems to be run than without lossy compression. These results suggest that lossy compression will become an important technology in many aspects of high performance scientific computing. Because the constraints for each use case are different and often conflicting, this collection of results also indicates the need for more specialization of the compression pipelines.

JPEG2000 Compliant Lossless Coding of Floating Point Data

Data Compression Conference ◽

10.1109/dcc.2005.49 ◽

2005 ◽

Cited By ~ 3

Author(s):

B. Usevitch

Keyword(s):

Floating Point ◽

Lossless Coding ◽

Point Data

A Versatile Compression Method for Floating-Point Data Stream

2013 Fourth International Conference on Networking and Distributed Computing ◽

10.1109/icndc.2013.32 ◽

2013 ◽

Cited By ~ 2

Author(s):

Songbin Liu ◽

Xiaomeng Huang ◽

Yufang Ni ◽

Haohuan Fu ◽

Guangwen Yang

Keyword(s):

Data Stream ◽

Floating Point ◽

Compression Method ◽

Point Data

A note on precision-preserving compression of scientific data

10.5194/gmd-2020-239 ◽

2020 ◽

Author(s):

Rostislav Kouznetsov

Keyword(s):

Numerical Data ◽

Lossy Compression ◽

Scientific Data ◽

Essential Information ◽

Network Bandwidth ◽

Simple Implementation ◽

Optimal Accuracy ◽

Point Data ◽

Multipoint Statistics ◽

And Storage

Abstract. Lossy compression of scientific data arrays is a powerful tool to save network bandwidth and storage space. Properly applied lossy compression can reduce the size of a dataset by orders of magnitude while keeping all essential information, whereas a wrong choice of lossy compression parameters leads to the loss of valuable data. The paper considers statistical properties of several lossy compression methods implemented in "NetCDF operators" (NCO), a popular tool for handling and transforming numerical data in NetCDF format. We compare the effects of imprecisions and artifacts resulting from the use of lossy compression of floating-point data arrays. In particular, we show that the popular Bit Grooming algorithm (default in NCO) has sub-optimal accuracy and produces substantial artifacts in multipoint statistics. We suggest a simple implementation of two algorithms that are free from these artifacts and have twice the precision. In addition, we suggest a way to rectify data already processed with Bit Grooming. The algorithm has been contributed to mainstream NCO. The supplementary material contains the implementation of the algorithm in Python 3.
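
The abstract does not name its two algorithms, so the following Python sketch is only an assumed illustration of how a quantizer can reach twice the precision of truncation: it rounds the mantissa to the nearest retained value instead of zeroing the discarded bits, which halves the worst-case error. It is not taken from the paper's supplementary code; the function names and the `keep_bits` parameter are hypothetical, and only finite float32 values are handled.

```python
import struct

MANTISSA_BITS = 23  # explicit mantissa bits in an IEEE 754 single-precision value


def float_to_bits(x: float) -> int:
    """Round x to float32 and return its 32-bit pattern as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b: int) -> float:
    """Interpret a 32-bit pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def bit_shave(x: float, keep_bits: int) -> float:
    """Truncate: zero the discarded mantissa bits."""
    mask = (1 << (MANTISSA_BITS - keep_bits)) - 1
    return bits_to_float(float_to_bits(x) & ~mask)


def bit_round(x: float, keep_bits: int) -> float:
    """Round to nearest (assumed example): add half a step, then truncate.

    Adding to the integer bit pattern lets a mantissa overflow carry into
    the exponent, so values just below a power of two round up correctly.
    """
    drop = MANTISSA_BITS - keep_bits
    mask = (1 << drop) - 1
    half = 1 << (drop - 1)
    return bits_to_float((float_to_bits(x) + half) & ~mask)


if __name__ == "__main__":
    data = [bits_to_float(float_to_bits(1.0 + i / 999)) for i in range(1000)]
    worst_shave = max(abs(bit_shave(v, 8) - v) for v in data)
    worst_round = max(abs(bit_round(v, 8) - v) for v in data)
    print("worst-case error, shave:", worst_shave)
    print("worst-case error, round:", worst_round)
```

Because rounding errors fall on both sides of the true value, this kind of quantizer also avoids the one-sided bias of pure shaving without tying the error sign to the position of a value in the array.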

Generating Test Data Using Symbolic Execution: Challenges with Floating Point Data Types

Communications in Computer and Information Science - Information and Software Technologies ◽

10.1007/978-3-642-33308-8_22 ◽

2012 ◽

pp. 267-274

Author(s):

Justinas Prelgauskas ◽

Eduardas Bareisa

Keyword(s):

Test Data ◽

Symbolic Execution ◽

Floating Point ◽

Data Types ◽

Point Data

Read-Write Operation on Floating Point Data Program Design Between MCU and KingView

Lecture Notes in Electrical Engineering - Proceedings of the 9th International Symposium on Linear Drives for Industry Applications, Volume 3 ◽

10.1007/978-3-642-40633-1_89 ◽

2013 ◽

pp. 717-723

Author(s):

Congcong Fang ◽

Xiaojing Yang

Keyword(s):

Program Design ◽

Floating Point ◽

Point Data ◽

Data Program

LFZip: Lossy Compression of Multivariate Floating-Point Time Series Data via Improved Prediction

2020 Data Compression Conference (DCC) ◽

10.1109/dcc47342.2020.00042 ◽

2020 ◽

Author(s):

Shubham Chandak ◽

Kedar Tatwawadi ◽

Chengtao Wen ◽

Lingyun Wang ◽

Juan Aparicio Ojea ◽

...

Keyword(s):

Time Series ◽

Time Series Data ◽

Lossy Compression ◽

Floating Point ◽

Series Data

Lossless Compression of Double-Precision Floating-Point Data for Numerical Simulations: Highly Parallelizable Algorithms for GPU Computing

IEICE Transactions on Information and Systems ◽

10.1587/transinf.e95.d.2778 ◽

2012 ◽

Vol E95.D (12) ◽

pp. 2778-2786

Author(s):

Mamoru OHARA ◽

Takashi YAMAGUCHI

Keyword(s):

Numerical Simulations ◽

Gpu Computing ◽

Lossless Compression ◽

Floating Point ◽

Double Precision ◽

Point Data

Compression of Barotropic Turbulence Simulation Data Using Wavelet-Based Lossy Coding

Volume 1: Fora, Parts A and B ◽

10.1115/fedsm2002-31120 ◽

2002 ◽

Cited By ~ 2

Author(s):

John P. Wilson

Keyword(s):

Evaluation Criteria ◽

Floating Point ◽

Simulation Data ◽

Single Precision ◽

Acceptable Error ◽

Turbulence Simulation ◽

Lossy Coding ◽

Point Data ◽

Definition Of ◽

Quantities Of Interest

Single-precision floating-point data from a simulation of barotropic turbulence are compressed with a wavelet-based method. The quantity being compressed is vorticity. The compression error is evaluated both in terms of the error in the vorticity and the error in various quantities derived from the vorticity. Numerical error is evaluated in all quantities, and visualizations of the vorticity and the correlation of the error with the uncompressed data are also examined. It is found that, depending on the quantities of interest and the evaluation criteria, compression ratios of 4:1 to 256:1 are achievable. Under a conservative definition of acceptable error, it is possible to recover quantities of interest from data compressed 4:1 (8 bpp), the data rate that in existing practice is used for visualization.
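
The link between the quoted compression ratios and bit rates is simple arithmetic: a single-precision value occupies 32 bits, so a ratio of r:1 leaves 32/r bits per value. A minimal Python check follows; the function name is hypothetical.

```python
def bits_per_value(compression_ratio: float, source_bits: int = 32) -> float:
    """Average bits stored per value after compressing by compression_ratio:1."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return source_bits / compression_ratio


if __name__ == "__main__":
    for ratio in (4, 256):
        print(f"{ratio}:1 -> {bits_per_value(ratio)} bits per value")
    # prints:
    # 4:1 -> 8.0 bits per value
    # 256:1 -> 0.125 bits per value
```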
