Block-Split Array Coding Algorithm for Long-Stream Data Compression

2020, Vol 2020, pp. 1-22
Author(s): Qin Jiancheng, Lu Yiqin, Zhong Yu

With the advent of IR (Industrial Revolution) 4.0, the spread of sensors in the IoT (Internet of Things) may generate massive data, which will challenge the limited sensor storage and network bandwidth. Hence, the study of big data compression is valuable in the field of sensors. A key problem is how to compress long-stream data efficiently within the finite memory of a sensor. To maintain performance, traditional compression techniques have to treat data streams on a small and inadequate scale, which reduces the compression ratio. To solve this problem, this paper proposes a block-split coding algorithm named the "CZ-Array algorithm" and implements it in the shareware "ComZip." CZ-Array can use a relatively small data window to cover a configurably large scale, which benefits the compression ratio. It is fast, with time complexity O(N), and fits big data compression. The experimental results indicate that ComZip with CZ-Array obtains a better compression ratio than gzip, lz4, bzip2, and p7zip in multiple-stream data compression, and it also has a competitive speed among these general-purpose compression tools. Moreover, CZ-Array is concise and suits parallel hardware implementation in sensors.
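The CZ-Array coder itself is not reproduced in this listing. As a rough illustration of the block-split idea it builds on (compressing a long stream block by block, so memory stays bounded by the block size rather than the stream length), here is a minimal Python sketch with zlib as a hypothetical stand-in coder:

```python
import zlib

def block_split_compress(stream: bytes, block_size: int = 1 << 16):
    """Compress a long byte stream block by block, so peak memory use is
    bounded by block_size instead of the full stream length.
    Illustrative sketch only -- not the actual CZ-Array coder."""
    out = []
    for i in range(0, len(stream), block_size):
        out.append(zlib.compress(stream[i:i + block_size]))
    return out

def block_split_decompress(blocks) -> bytes:
    return b"".join(zlib.decompress(b) for b in blocks)

data = b"sensor-reading,42;" * 10_000
blocks = block_split_compress(data)
assert block_split_decompress(blocks) == data
```

Because each block is independent, blocks can also be compressed in parallel, which matches the abstract's point about hardware-parallel implementation.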

2017, Vol 2017, pp. 1-11
Author(s): Qin Jiancheng, Lu Yiqin, Zhong Yu

As wireless networks have limited bandwidth and insecure shared media, data compression and encryption are very useful for the broadcast transmission of big data in the IoT (Internet of Things). However, traditional compression and encryption techniques are neither competent nor efficient. To solve this problem, this paper presents a combined parallel algorithm named the "CZ algorithm," which can compress and encrypt big data efficiently. The CZ algorithm uses a parallel pipeline, mixes the coding of compression and encryption, and supports data windows up to 1 TB (or larger). Moreover, the CZ algorithm can encrypt big data as a chaotic cryptosystem without decreasing the compression speed. A shareware named "ComZip" has been developed based on the CZ algorithm. The experimental results show that ComZip on 64-bit systems achieves a better compression ratio than WinRAR and 7-zip, and it can be faster than 7-zip in big data compression. In addition, ComZip encrypts big data without extra consumption of computing resources.
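The paper's chaotic cryptosystem is not specified here; the general compress-then-encrypt pipeline it describes can be sketched as below, with a SHA-256 counter-mode keystream as a hypothetical substitute for the chaotic cipher (the stream-XOR step adds negligible cost on top of compression, which is the abstract's point):

```python
import hashlib
import zlib

def keystream(key: bytes, n: int) -> bytes:
    # Counter-mode keystream from SHA-256 -- a stand-in for the paper's
    # chaotic cryptosystem, NOT the actual CZ cipher.
    out, ctr = bytearray(), 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return bytes(out[:n])

def compress_then_encrypt(data: bytes, key: bytes) -> bytes:
    c = zlib.compress(data)                      # compression stage
    ks = keystream(key, len(c))                  # encryption stage (XOR)
    return bytes(a ^ b for a, b in zip(c, ks))

def decrypt_then_decompress(blob: bytes, key: bytes) -> bytes:
    ks = keystream(key, len(blob))
    return zlib.decompress(bytes(a ^ b for a, b in zip(blob, ks)))

key = b"sensor-key"                              # hypothetical key
msg = b"iot telemetry frame;" * 500
blob = compress_then_encrypt(msg, key)
assert decrypt_then_decompress(blob, key) == msg
```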


2013, Vol 80, pp. 116-127
Author(s): Ryotaro Sakai, Daisuke Sasaki, Kazuhiro Nakahashi

2013, Vol 9 (4), pp. 19-43
Author(s): Bo Hu, Nuno Carvalho, Takahide Matsutsuka

In light of the challenges of effectively managing Big Data, the authors are witnessing a gradual shift towards the increasingly popular Linked Open Data (LOD) paradigm. LOD aims to impose a machine-readable semantic layer over structured as well as unstructured data, and hence to automate some data analysis tasks that are not designed for computers. The convergence of Big Data and LOD is, however, not straightforward: the semantic layer of LOD and large-scale Big Data storage do not get along easily. Meanwhile, the sheer data size envisioned by Big Data rules out certain computationally expensive semantic technologies, which become much less efficient at that scale than on relatively small data sets. In this paper, the authors propose a mechanism allowing LOD to take advantage of existing large-scale data stores while sustaining its "semantic" nature. The authors demonstrate how RDF-based semantic models can be distributed across multiple storage servers, and they examine how a fundamental semantic operation can be tuned to meet the requirements of distributed and parallel data processing. Future work will focus on stress tests of the platform at the magnitude of tens of billions of triples, as well as comparative studies of usability and performance against similar offerings.
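The paper's distribution scheme is not detailed in this abstract; a common baseline for spreading RDF triples across storage servers is to hash-partition them by subject, so that all statements about one resource stay on one server and subject-centric lookups remain local. A minimal sketch (server count and triples are hypothetical):

```python
import zlib

N_SERVERS = 4

def shard_for(triple, n_servers: int = N_SERVERS) -> int:
    """Route an RDF (subject, predicate, object) triple to a server by a
    stable hash of its subject. Sketch of hash partitioning, not the
    paper's actual placement strategy."""
    subject, _predicate, _object = triple
    return zlib.crc32(subject.encode()) % n_servers  # crc32 is run-stable

stores = [[] for _ in range(N_SERVERS)]
triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob",   "foaf:name", '"Bob"'),
]
for t in triples:
    stores[shard_for(t)].append(t)
```

Python's built-in `hash()` is randomized per process, so a stable hash such as CRC32 (or a cryptographic digest) is needed for placement decisions that must agree across machines.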


Author(s): Guohua Xiong

To ensure the efficient development of car networking technology, big data compression technology for car networking was studied. First, RFID technology in vehicle networking, big data technology in vehicle networking, and RFID path data compression in the Internet of Vehicles were introduced. Then, RFID path data compression verification experiments were performed. The results showed that when the data volume was relatively small, there was no obvious difference in compression ratio between the fixed threshold and the varying threshold. However, as the amount of data gradually increased, the compression ratio with a varying threshold became slightly higher than with a fixed threshold. Therefore, RFID path big data processing is feasible, and the compression technology is efficient.
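The abstract does not define its compression scheme precisely. One plausible reading of threshold-based RFID path compression is collapsing consecutive detections at the same reader into dwell segments and discarding segments shorter than a threshold; the sketch below is hypothetical (function name, data shape, and the `min_dwell` parameter are all assumptions), with the fixed vs. varying threshold distinction reduced to how `min_dwell` is chosen:

```python
def compress_path(readings, min_dwell: float = 2.0):
    """Collapse consecutive detections at the same RFID reader into one
    (reader, enter, leave) segment, then drop segments whose dwell time
    is below min_dwell seconds. Hypothetical sketch of threshold-based
    path compression; a fixed-threshold scheme keeps min_dwell constant,
    an adaptive one would vary it with the data."""
    segments = []
    for reader, t in readings:
        if segments and segments[-1][0] == reader:
            segments[-1] = (reader, segments[-1][1], t)  # extend segment
        else:
            segments.append((reader, t, t))              # open new segment
    return [(r, a, b) for r, a, b in segments if b - a >= min_dwell]

readings = [("R1", 0.0), ("R1", 5.0), ("R2", 6.0),
            ("R2", 6.5), ("R3", 7.0), ("R3", 12.0)]
path = compress_path(readings)
```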


2020
Author(s): Zhaoyuan Yu, Zhengfang Zhang, Dongshuang Li, Wen Luo, Yuan Liu, et al.

Abstract. Lossy compression has been applied to large-scale experimental model data because of its high compression ratio. However, few methods consider the uneven distribution of compression errors, which affects compression quality. Here we develop an adaptive lossy compression method with stable compression error for earth system model data, based on Hierarchical Geospatial Field Data Representation (HGFDR). We extend the original HGFDR by first dividing the original data into a series of local blocks, guided by exploratory experiments, so as to maximize the local correlations of the data. Then, from the mathematical model of HGFDR, the relationship between the compression parameter and the compression error is analyzed and calculated for each block. Using an optimal compression-parameter selection rule and an adaptive compression algorithm, our method, Adaptive-HGFDR, compresses the data under the constraint that the compression error is as stable as possible along each dimension. Compression experiments are carried out on Community Earth System Model (CESM) data. The results show that our method achieves a higher compression ratio and more uniform error distributions than other commonly used lossy compression methods, such as the Fixed-Rate Compressed Floating-Point Arrays method.
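HGFDR itself is not reproduced here. The core idea of choosing a per-block compression parameter from an error bound can be illustrated with the simplest such rule: uniform scalar quantization with step 2·eps, which guarantees an absolute reconstruction error of at most eps in every block. This is a hypothetical simplification, not the paper's actual parameter rule:

```python
def quantize_block(block, eps: float):
    """Quantize one block with step 2*eps; round-to-nearest then bounds
    the absolute reconstruction error by eps. Stand-in for Adaptive-HGFDR's
    per-block compression-parameter selection (illustrative only)."""
    step = 2.0 * eps
    return [round(x / step) for x in block], step

def dequantize(quantized, step: float):
    return [v * step for v in quantized]

blocks = [[0.1234, -0.567, 0.89], [3.14159, -2.71828]]  # toy "local blocks"
for blk in blocks:
    q, step = quantize_block(blk, eps=1e-3)
    rec = dequantize(q, step)
    # Every block meets the same error bound, i.e. the error is "stable".
    assert all(abs(a - b) <= 1e-3 + 1e-12 for a, b in zip(blk, rec))
```

The integer codes `q` are then what an entropy coder would actually compress; a smaller eps gives larger integers and hence a lower compression ratio, which is the trade-off the paper tunes per block.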


2021, Vol 8 (1), pp. 1-16
Author(s): Steven Anderson, Ansarullah Lawi

Technological development leading up to Industrial Revolution 4.0 has incentivized manufacturing industries to invest in digital industry, with the aim of increasing capability and efficiency in manufacturing activity. Major manufacturers have begun implementing cyber-physical systems for industrial monitoring and control. Such systems generate large volumes of data. Processing those big data requires machine learning algorithms, which can read patterns in big data to produce useful information. This study examines whether Indonesia's current network infrastructure and workforce capability can support the implementation of machine learning, especially in large-scale manufacturing, and compares Indonesia with countries that have taken a positive stance on implementing machine learning in manufacturing. The conclusion drawn from this research is that Indonesia's current infrastructure and workforce are still unable to fully support the implementation of machine learning technology in the manufacturing industry, and improvements are needed.


Author(s): Ramya S., Gokula Krishnan V.

Big data has reached a maturity that leads it into a productive phase. This means that most of the main issues with big data have been addressed to a degree that storage has become interesting for full commercial exploitation. However, concerns over data compression still prevent many users from migrating data to remote storage. Client-side data compression in particular ensures that multiple uploads of the same content consume only the network bandwidth and storage space of a single upload. Compression is actively used by a number of backup providers as well as various services. Unfortunately, compressed data is pseudorandom and thus cannot be deduplicated; as a consequence, current schemes have to entirely sacrifice storage efficiency. This system presents a scheme that permits a more fine-grained trade-off, built on a novel idea that differentiates data according to their popularity. Based on this idea, we design a compression scheme that guarantees semantic storage preservation for unpopular data and provides scalable storage and bandwidth benefits for popular data. We implement a variable data chunk similarity algorithm to analyze the chunks and store the original data in compressed format, and we include an encryption algorithm to secure the data. Finally, a backup recovery system can be used in case of blocking, and frequent login access is also analyzed.
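The chunk-based deduplicated storage that this abstract builds on can be sketched as a content-addressed store: data is split into chunks, each chunk is identified by its digest, and a chunk already present is never uploaded or stored again. The sketch below uses fixed-size chunks and zlib for the compressed at-rest format; the paper's variable-size chunk similarity algorithm and popularity differentiation are not reproduced here:

```python
import hashlib
import zlib

class ChunkStore:
    """Minimal client-side deduplication sketch: fixed-size chunks,
    content-addressed by SHA-256, each unique chunk stored once in
    compressed form. Illustrative stand-in, not the paper's scheme."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.blobs = {}  # digest -> compressed chunk

    def put(self, data: bytes):
        """Store data; return the recipe (list of digests) to rebuild it."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.blobs:          # upload only new chunks
                self.blobs[digest] = zlib.compress(chunk)
            recipe.append(digest)
        return recipe

    def get(self, recipe) -> bytes:
        return b"".join(zlib.decompress(self.blobs[d]) for d in recipe)

store = ChunkStore()
payload = b"log-entry " * 2000
recipe1 = store.put(payload)
unique_before = len(store.blobs)
recipe2 = store.put(payload)   # second upload of identical content
```

A second upload of the same content adds no new blobs, which is the bandwidth/storage saving the abstract refers to.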


2018, Vol 2018, pp. 1-17
Author(s): Qin Jiancheng, Lu Yiqin, Zhong Yu

Lots of sensors in the IoT (Internet of Things) may generate massive data, which will challenge the limited sensor storage and network bandwidth, so the study of big data compression is very useful in the field of sensors. In practice, the BWT (Burrows-Wheeler transform) can achieve good compression results for some kinds of data, but traditional BWT algorithms are neither concise nor fast enough for sensor hardware, which limits the BWT block size to a very small and inadequate scale. To solve this problem, this paper presents a fast truncated-BWT algorithm named "CZ-BWT" and implements it in the shareware "ComZip." CZ-BWT supports BWT blocks up to 2 GB (or larger) and uses bucket sort. It is very fast, with time complexity O(N), and fits big data compression. The experimental results indicate that ComZip with the CZ-BWT filter is clearly faster than bzip2, and it can obtain a better compression ratio than bzip2 and p7zip under some conditions. In addition, CZ-BWT is more concise than current BWT implementations based on SA (suffix array) sorting and suits hardware BWT implementation in sensors.
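For reference, the transform being accelerated here is the classic BWT, which permutes a block so that similar bytes cluster together and the block compresses better downstream. A textbook sketch via sorted rotations is shown below; it is O(n² log n) and only suitable as a demo, whereas CZ-BWT's contribution is replacing such sorting with a truncated bucket sort to reach O(N):

```python
def bwt(data: bytes) -> bytes:
    """Textbook Burrows-Wheeler transform via sorted rotations.
    Demo-only complexity; CZ-BWT replaces the sort with an O(N)
    truncated bucket sort. Assumes the 0x00 sentinel is absent in data."""
    s = data + b"\x00"                                  # end-of-block sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return bytes(r[-1] for r in rotations)              # last column
```

For example, `bwt(b"banana")` groups the repeated letters: it yields `b"annb\x00aa"`, the familiar "annb$aa" result with a 0x00 sentinel instead of "$".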


2019, Vol 490 (1), pp. 37-49
Author(s): R. Ammanouil, A. Ferrari, D. Mary, C. Ferrari, F. Loi

ABSTRACT In the era of big data, radio astronomical image reconstruction algorithms are challenged to estimate clean images given limited computing resources and time. This article is driven by the need for large-scale image reconstruction for the future Square Kilometre Array (SKA), which will become in the next decades the largest low- and intermediate-frequency radio telescope in the world. This work proposes a scalable wide-band deconvolution algorithm called MUFFIN, which stands for 'MUlti Frequency image reconstruction For radio INterferometry'. MUFFIN estimates the sky images in various frequency bands, given the corresponding dirty images and point spread functions. The reconstruction is achieved by minimizing a data fidelity term plus joint spatial and spectral sparse analysis regularization terms; it is consequently non-parametric w.r.t. the spectral behaviour of radio sources. The MUFFIN algorithm features a parallel implementation and automatic tuning of the regularization parameters, making it scalable and well suited to big data applications such as the SKA. Comparisons between MUFFIN and the state-of-the-art wide-band reconstruction algorithm are provided.
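The "data fidelity plus sparse regularization" objective in this abstract is typically minimized by proximal-gradient iterations. The toy 1-D sketch below uses plain ISTA (gradient step on ½‖Hx − y‖² followed by L1 soft-thresholding) to deconvolve a sparse signal blurred by a small PSF; it is an illustration of the objective class only, not MUFFIN itself, which adds spectral regularization, automatic parameter tuning, and parallelism:

```python
import numpy as np

def ista(dirty, H, Ht, lam=0.01, step=1.0, n_iter=300):
    """ISTA for min_x 0.5*||H x - y||^2 + lam*||x||_1.
    H / Ht are the forward operator and its adjoint; step must satisfy
    step <= 1 / ||Ht H||. Toy sketch, not the MUFFIN algorithm."""
    x = np.zeros_like(dirty)
    for _ in range(n_iter):
        x = x - step * Ht(H(x) - dirty)                       # gradient step
        x = np.sign(x) * np.maximum(np.abs(x) - step * lam, 0.0)  # L1 prox
    return x

psf = np.array([0.25, 0.5, 0.25])          # hypothetical point spread function
H = lambda x: np.convolve(x, psf, mode="same")
sky = np.zeros(32)
sky[8], sky[20] = 1.0, 0.7                 # two point sources
dirty = H(sky)                             # "dirty image"
est = ista(dirty, H, H)                    # symmetric PSF: adjoint equals H
```

With this symmetric, unit-gain PSF the operator norm of HᵀH is 1, so `step=1.0` is admissible, and the two point sources re-emerge at their original positions with slight L1 shrinkage.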


2022
Author(s): Wilfried Yves Hamilton Adoni, Tarik Nahhal, Najib Ben Aoun, Moez Krichen, Mohammed Alzahrani

Abstract In this paper, we present a scalable, real-time intelligent transportation system based on a big data framework. The proposed system uses existing data from road sensors to better understand traffic flow and traveler behavior and to increase road network performance. Our transportation system is designed to process large-scale stream data in order to analyze traffic events such as incidents, crashes, and congestion. Experiments performed on the public transportation modes of the city of Casablanca in Morocco reveal that the proposed system achieves significant time savings, gathers large-scale data from many road sensors, and is inexpensive in terms of hardware resource consumption.

