Block-Split Array Coding Algorithm for Long-Stream Data Compression

2020, Vol 2020, pp. 1-22
Author(s): Qin Jiancheng, Lu Yiqin, Zhong Yu

With the advent of IR (Industrial Revolution) 4.0, the spread of sensors in the IoT (Internet of Things) may generate massive data, which will challenge the limited sensor storage and network bandwidth. Hence, the study of big data compression is valuable in the field of sensors. A key problem is how to compress long-stream data efficiently within the finite memory of a sensor. To maintain performance, traditional compression techniques have to treat data streams on a small and inadequate scale, which reduces the compression ratio. To solve this problem, this paper proposes a block-split coding algorithm named the "CZ-Array algorithm" and implements it in the shareware "ComZip." CZ-Array can use a relatively small data window to cover a configurably large scale, which benefits the compression ratio. It is fast, with time complexity O(N), and fits big data compression. The experimental results indicate that ComZip with CZ-Array obtains a better compression ratio than gzip, lz4, bzip2, and p7zip in multiple-stream data compression, and it also has a competitive speed among these general-purpose compression tools. Moreover, CZ-Array is concise and suits parallel hardware implementation in sensors.
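The CZ-Array coder itself is not reproduced in this listing. As a rough illustration of the block-split idea it builds on (compressing a long stream block by block, so memory stays bounded by the block size rather than the stream length), here is a minimal Python sketch with zlib as a hypothetical stand-in coder:

```python
import zlib

def block_split_compress(stream: bytes, block_size: int = 1 << 16):
    """Compress a long byte stream block by block, so peak memory use is
    bounded by block_size instead of the full stream length.
    Illustrative sketch only -- not the actual CZ-Array coder."""
    out = []
    for i in range(0, len(stream), block_size):
        out.append(zlib.compress(stream[i:i + block_size]))
    return out

def block_split_decompress(blocks) -> bytes:
    return b"".join(zlib.decompress(b) for b in blocks)

data = b"sensor-reading,42;" * 10_000
blocks = block_split_compress(data)
assert block_split_decompress(blocks) == data
```

Because each block is independent, blocks can also be compressed in parallel, which matches the abstract's point about hardware-parallel implementation.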

2017, Vol 2017, pp. 1-11
Author(s): Qin Jiancheng, Lu Yiqin, Zhong Yu

As wireless networks have limited bandwidth and insecure shared media, data compression and encryption are very useful for the broadcast transmission of big data in the IoT (Internet of Things). However, traditional compression and encryption techniques are neither competent nor efficient. To solve this problem, this paper presents a combined parallel algorithm named the "CZ algorithm," which can compress and encrypt big data efficiently. The CZ algorithm uses a parallel pipeline, mixes the coding of compression and encryption, and supports data windows up to 1 TB (or larger). Moreover, the CZ algorithm can encrypt big data as a chaotic cryptosystem without decreasing the compression speed. A shareware named "ComZip" has been developed based on the CZ algorithm. The experimental results show that ComZip on 64-bit systems achieves a better compression ratio than WinRAR and 7-zip, and it can be faster than 7-zip in big data compression. In addition, ComZip encrypts big data without extra consumption of computing resources.
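The paper's chaotic cryptosystem is not specified here; the general compress-then-encrypt pipeline it describes can be sketched as below, with a SHA-256 counter-mode keystream as a hypothetical substitute for the chaotic cipher (the stream-XOR step adds negligible cost on top of compression, which is the abstract's point):

```python
import hashlib
import zlib

def keystream(key: bytes, n: int) -> bytes:
    # Counter-mode keystream from SHA-256 -- a stand-in for the paper's
    # chaotic cryptosystem, NOT the actual CZ cipher.
    out, ctr = bytearray(), 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return bytes(out[:n])

def compress_then_encrypt(data: bytes, key: bytes) -> bytes:
    c = zlib.compress(data)                      # compression stage
    ks = keystream(key, len(c))                  # encryption stage (XOR)
    return bytes(a ^ b for a, b in zip(c, ks))

def decrypt_then_decompress(blob: bytes, key: bytes) -> bytes:
    ks = keystream(key, len(blob))
    return zlib.decompress(bytes(a ^ b for a, b in zip(blob, ks)))

key = b"sensor-key"                              # hypothetical key
msg = b"iot telemetry frame;" * 500
blob = compress_then_encrypt(msg, key)
assert decrypt_then_decompress(blob, key) == msg
```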


2013, Vol 80, pp. 116-127
Author(s): Ryotaro Sakai, Daisuke Sasaki, Kazuhiro Nakahashi

2013, Vol 9 (4), pp. 19-43
Author(s): Bo Hu, Nuno Carvalho, Takahide Matsutsuka

In light of the challenges of effectively managing Big Data, the authors are witnessing a gradual shift towards the increasingly popular Linked Open Data (LOD) paradigm. LOD aims to impose a machine-readable semantic layer over structured as well as unstructured data, and hence to automate some data analysis tasks that are not designed for computers. The convergence of Big Data and LOD is, however, not straightforward: the semantic layer of LOD and large-scale Big Data storage do not get along easily. Meanwhile, the sheer data size envisioned by Big Data rules out certain computationally expensive semantic technologies, which become much less efficient at that scale than on relatively small data sets. In this paper, the authors propose a mechanism allowing LOD to take advantage of existing large-scale data stores while sustaining its "semantic" nature. The authors demonstrate how RDF-based semantic models can be distributed across multiple storage servers, and they examine how a fundamental semantic operation can be tuned to meet the requirements of distributed and parallel data processing. Future work will focus on stress tests of the platform at the magnitude of tens of billions of triples, as well as comparative studies of usability and performance against similar offerings.
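The paper's distribution scheme is not detailed in this abstract; a common baseline for spreading RDF triples across storage servers is to hash-partition them by subject, so that all statements about one resource stay on one server and subject-centric lookups remain local. A minimal sketch (server count and triples are hypothetical):

```python
import zlib

N_SERVERS = 4

def shard_for(triple, n_servers: int = N_SERVERS) -> int:
    """Route an RDF (subject, predicate, object) triple to a server by a
    stable hash of its subject. Sketch of hash partitioning, not the
    paper's actual placement strategy."""
    subject, _predicate, _object = triple
    return zlib.crc32(subject.encode()) % n_servers  # crc32 is run-stable

stores = [[] for _ in range(N_SERVERS)]
triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob",   "foaf:name", '"Bob"'),
]
for t in triples:
    stores[shard_for(t)].append(t)
```

Python's built-in `hash()` is randomized per process, so a stable hash such as CRC32 (or a cryptographic digest) is needed for placement decisions that must agree across machines.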


Author(s): Guohua Xiong

To ensure the efficient development of car networking technology, big data compression technology for car networking was studied. First, RFID technology in vehicle networking, big data technology in vehicle networking, and RFID path data compression in the Internet of Vehicles were introduced. Then, RFID path data compression verification experiments were performed. The results showed that when the data volume was relatively small, there was no obvious difference in compression ratio between the fixed threshold and the varying threshold. However, as the amount of data gradually increased, the compression ratio with a varying threshold became slightly higher than with a fixed threshold. Therefore, RFID path big data processing is feasible, and the compression technology is efficient.
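The abstract does not define its compression scheme precisely. One plausible reading of threshold-based RFID path compression is collapsing consecutive detections at the same reader into dwell segments and discarding segments shorter than a threshold; the sketch below is hypothetical (function name, data shape, and the `min_dwell` parameter are all assumptions), with the fixed vs. varying threshold distinction reduced to how `min_dwell` is chosen:

```python
def compress_path(readings, min_dwell: float = 2.0):
    """Collapse consecutive detections at the same RFID reader into one
    (reader, enter, leave) segment, then drop segments whose dwell time
    is below min_dwell seconds. Hypothetical sketch of threshold-based
    path compression; a fixed-threshold scheme keeps min_dwell constant,
    an adaptive one would vary it with the data."""
    segments = []
    for reader, t in readings:
        if segments and segments[-1][0] == reader:
            segments[-1] = (reader, segments[-1][1], t)  # extend segment
        else:
            segments.append((reader, t, t))              # open new segment
    return [(r, a, b) for r, a, b in segments if b - a >= min_dwell]

readings = [("R1", 0.0), ("R1", 5.0), ("R2", 6.0),
            ("R2", 6.5), ("R3", 7.0), ("R3", 12.0)]
path = compress_path(readings)
```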


2020
Author(s): Zhaoyuan Yu, Zhengfang Zhang, Dongshuang Li, Wen Luo, Yuan Liu, et al.

Abstract. Lossy compression has been applied to large-scale experimental model data because of its high compression ratio. However, few methods consider the uneven distribution of compression errors, which affects compression quality. Here we develop an adaptive lossy compression method with stable compression error for earth system model data, based on Hierarchical Geospatial Field Data Representation (HGFDR). We extend the original HGFDR by first dividing the original data into a series of local blocks, guided by exploratory experiments, so as to maximize the local correlations of the data. Then, from the mathematical model of HGFDR, the relationship between the compression parameter and the compression error is analyzed and calculated for each block. Using an optimal compression-parameter selection rule and an adaptive compression algorithm, our method, Adaptive-HGFDR, compresses the data under the constraint that the compression error is as stable as possible along each dimension. Compression experiments are carried out on Community Earth System Model (CESM) data. The results show that our method achieves a higher compression ratio and more uniform error distributions than other commonly used lossy compression methods, such as the Fixed-Rate Compressed Floating-Point Arrays method.
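HGFDR itself is not reproduced here. The core idea of choosing a per-block compression parameter from an error bound can be illustrated with the simplest such rule: uniform scalar quantization with step 2·eps, which guarantees an absolute reconstruction error of at most eps in every block. This is a hypothetical simplification, not the paper's actual parameter rule:

```python
def quantize_block(block, eps: float):
    """Quantize one block with step 2*eps; round-to-nearest then bounds
    the absolute reconstruction error by eps. Stand-in for Adaptive-HGFDR's
    per-block compression-parameter selection (illustrative only)."""
    step = 2.0 * eps
    return [round(x / step) for x in block], step

def dequantize(quantized, step: float):
    return [v * step for v in quantized]

blocks = [[0.1234, -0.567, 0.89], [3.14159, -2.71828]]  # toy "local blocks"
for blk in blocks:
    q, step = quantize_block(blk, eps=1e-3)
    rec = dequantize(q, step)
    # Every block meets the same error bound, i.e. the error is "stable".
    assert all(abs(a - b) <= 1e-3 + 1e-12 for a, b in zip(blk, rec))
```

The integer codes `q` are then what an entropy coder would actually compress; a smaller eps gives larger integers and hence a lower compression ratio, which is the trade-off the paper tunes per block.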


2021, Vol 8 (1), pp. 1-16
Author(s): Steven Anderson, Ansarullah Lawi

Technological development leading up to Industrial Revolution 4.0 has incentivized manufacturing industries to invest in digital industry, with the aim of increasing capability and efficiency in manufacturing activity. Major manufacturers have begun implementing cyber-physical systems for industrial monitoring and control. Such systems generate large volumes of data. Processing those big data requires machine learning algorithms, which can read patterns in big data to produce useful information. This study examines whether Indonesia's current network infrastructure and workforce capability can support the implementation of machine learning, especially in large-scale manufacturing, and compares Indonesia with countries that have taken a positive stance on implementing machine learning in manufacturing. The conclusion drawn from this research is that Indonesia's current infrastructure and workforce are still unable to fully support the implementation of machine learning technology in the manufacturing industry, and improvements are needed.


Author(s): Ramya S., Gokula Krishnan V.

Big data has reached a maturity that leads it into a productive phase. This means that most of the main issues with big data have been addressed to a degree that storage has become interesting for full commercial exploitation. However, concerns over data compression still prevent many users from migrating data to remote storage. Client-side data compression in particular ensures that multiple uploads of the same content consume only the network bandwidth and storage space of a single upload. Compression is actively used by a number of backup providers as well as various services. Unfortunately, compressed data is pseudorandom and thus cannot be deduplicated; as a consequence, current schemes have to entirely sacrifice storage efficiency. This system presents a scheme that permits a more fine-grained trade-off, built on a novel idea that differentiates data according to their popularity. Based on this idea, we design a compression scheme that guarantees semantic storage preservation for unpopular data and provides scalable storage and bandwidth benefits for popular data. We implement a variable data chunk similarity algorithm to analyze the chunks and store the original data in compressed format, and we include an encryption algorithm to secure the data. Finally, a backup recovery system can be used in case of blocking, and frequent login access is also analyzed.
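The chunk-based deduplicated storage that this abstract builds on can be sketched as a content-addressed store: data is split into chunks, each chunk is identified by its digest, and a chunk already present is never uploaded or stored again. The sketch below uses fixed-size chunks and zlib for the compressed at-rest format; the paper's variable-size chunk similarity algorithm and popularity differentiation are not reproduced here:

```python
import hashlib
import zlib

class ChunkStore:
    """Minimal client-side deduplication sketch: fixed-size chunks,
    content-addressed by SHA-256, each unique chunk stored once in
    compressed form. Illustrative stand-in, not the paper's scheme."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.blobs = {}  # digest -> compressed chunk

    def put(self, data: bytes):
        """Store data; return the recipe (list of digests) to rebuild it."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.blobs:          # upload only new chunks
                self.blobs[digest] = zlib.compress(chunk)
            recipe.append(digest)
        return recipe

    def get(self, recipe) -> bytes:
        return b"".join(zlib.decompress(self.blobs[d]) for d in recipe)

store = ChunkStore()
payload = b"log-entry " * 2000
recipe1 = store.put(payload)
unique_before = len(store.blobs)
recipe2 = store.put(payload)   # second upload of identical content
```

A second upload of the same content adds no new blobs, which is the bandwidth/storage saving the abstract refers to.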


2018, Vol 2018, pp. 1-17
Author(s): Qin Jiancheng, Lu Yiqin, Zhong Yu

Lots of sensors in the IoT (Internet of Things) may generate massive data, which will challenge the limited sensor storage and network bandwidth, so the study of big data compression is very useful in the field of sensors. In practice, the BWT (Burrows-Wheeler transform) can achieve good compression results for some kinds of data, but traditional BWT algorithms are neither concise nor fast enough for sensor hardware, which limits the BWT block size to a very small and inadequate scale. To solve this problem, this paper presents a fast truncated-BWT algorithm named "CZ-BWT" and implements it in the shareware "ComZip." CZ-BWT supports BWT blocks up to 2 GB (or larger) and uses bucket sort. It is very fast, with time complexity O(N), and fits big data compression. The experimental results indicate that ComZip with the CZ-BWT filter is clearly faster than bzip2, and it can obtain a better compression ratio than bzip2 and p7zip under some conditions. In addition, CZ-BWT is more concise than current BWT implementations based on SA (suffix array) sorting and suits hardware BWT implementation in sensors.
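For reference, the transform being accelerated here is the classic BWT, which permutes a block so that similar bytes cluster together and the block compresses better downstream. A textbook sketch via sorted rotations is shown below; it is O(n² log n) and only suitable as a demo, whereas CZ-BWT's contribution is replacing such sorting with a truncated bucket sort to reach O(N):

```python
def bwt(data: bytes) -> bytes:
    """Textbook Burrows-Wheeler transform via sorted rotations.
    Demo-only complexity; CZ-BWT replaces the sort with an O(N)
    truncated bucket sort. Assumes the 0x00 sentinel is absent in data."""
    s = data + b"\x00"                                  # end-of-block sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return bytes(r[-1] for r in rotations)              # last column
```

For example, `bwt(b"banana")` groups the repeated letters: it yields `b"annb\x00aa"`, the familiar "annb$aa" result with a 0x00 sentinel instead of "$".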


2019, Vol 490 (1), pp. 37-49
Author(s): R. Ammanouil, A. Ferrari, D. Mary, C. Ferrari, F. Loi

ABSTRACT In the era of big data, radio astronomical image reconstruction algorithms are challenged to estimate clean images given limited computing resources and time. This article is driven by the need for large-scale image reconstruction for the future Square Kilometre Array (SKA), which will become in the next decades the largest low- and intermediate-frequency radio telescope in the world. This work proposes a scalable wide-band deconvolution algorithm called MUFFIN, which stands for 'MUlti Frequency image reconstruction For radio INterferometry'. MUFFIN estimates the sky images in various frequency bands, given the corresponding dirty images and point spread functions. The reconstruction is achieved by minimizing a data fidelity term plus joint spatial and spectral sparse analysis regularization terms; it is consequently non-parametric w.r.t. the spectral behaviour of radio sources. The MUFFIN algorithm features a parallel implementation and automatic tuning of the regularization parameters, making it scalable and well suited to big data applications such as the SKA. Comparisons between MUFFIN and the state-of-the-art wide-band reconstruction algorithm are provided.
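The "data fidelity plus sparse regularization" objective in this abstract is typically minimized by proximal-gradient iterations. The toy 1-D sketch below uses plain ISTA (gradient step on ½‖Hx − y‖² followed by L1 soft-thresholding) to deconvolve a sparse signal blurred by a small PSF; it is an illustration of the objective class only, not MUFFIN itself, which adds spectral regularization, automatic parameter tuning, and parallelism:

```python
import numpy as np

def ista(dirty, H, Ht, lam=0.01, step=1.0, n_iter=300):
    """ISTA for min_x 0.5*||H x - y||^2 + lam*||x||_1.
    H / Ht are the forward operator and its adjoint; step must satisfy
    step <= 1 / ||Ht H||. Toy sketch, not the MUFFIN algorithm."""
    x = np.zeros_like(dirty)
    for _ in range(n_iter):
        x = x - step * Ht(H(x) - dirty)                       # gradient step
        x = np.sign(x) * np.maximum(np.abs(x) - step * lam, 0.0)  # L1 prox
    return x

psf = np.array([0.25, 0.5, 0.25])          # hypothetical point spread function
H = lambda x: np.convolve(x, psf, mode="same")
sky = np.zeros(32)
sky[8], sky[20] = 1.0, 0.7                 # two point sources
dirty = H(sky)                             # "dirty image"
est = ista(dirty, H, H)                    # symmetric PSF: adjoint equals H
```

With this symmetric, unit-gain PSF the operator norm of HᵀH is 1, so `step=1.0` is admissible, and the two point sources re-emerge at their original positions with slight L1 shrinkage.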


2022
Author(s): Wilfried Yves Hamilton Adoni, Tarik Nahhal, Najib Ben Aoun, Moez Krichen, Mohammed Alzahrani

Abstract In this paper, we present a scalable, real-time intelligent transportation system based on a big data framework. The proposed system uses existing data from road sensors to better understand traffic flow and traveler behavior and to increase road network performance. Our transportation system is designed to process large-scale stream data in order to analyze traffic events such as incidents, crashes, and congestion. Experiments performed on the public transportation modes of the city of Casablanca in Morocco reveal that the proposed system achieves significant time savings, gathers large-scale data from many road sensors, and is inexpensive in terms of hardware resource consumption.

