Fast Algorithm of Truncated Burrows-Wheeler Transform Coding for Data Compression of Sensors

2018 ◽  
Vol 2018 ◽  
pp. 1-17 ◽  
Author(s):  
Qin Jiancheng ◽  
Lu Yiqin ◽  
Zhong Yu

Lots of sensors in the IoT (Internet of Things) may generate massive data, which will challenge the limited sensor storage and network bandwidth. The study of big data compression is therefore very useful in the field of sensors. In practice, BWT (Burrows-Wheeler transform) can achieve good compression results for some kinds of data, but the traditional BWT algorithms are neither concise nor fast enough for the hardware of sensors, which limits the BWT block size to a very small and inadequate scale. To solve this problem, this paper presents a fast truncated BWT algorithm named “CZ-BWT” and implements it in the shareware “ComZip.” CZ-BWT supports BWT blocks of up to 2 GB (or larger) and uses bucket sort. It is very fast, with time complexity O(N), and fits big data compression. The experimental results indicate that ComZip with the CZ-BWT filter is clearly faster than bzip2 and can obtain a better compression ratio than bzip2 and p7zip under some conditions. In addition, CZ-BWT is more concise than current BWT implementations based on SA (suffix array) sorting and suits hardware BWT implementation in sensors.
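The abstract does not reproduce the algorithm itself; the sketch below is an illustration only of what a truncated BWT with bucket sort can look like: rotations are ordered by their first few bytes with repeated counting-sort passes instead of a full suffix sort, giving O(prefix_len * N) work. The function and parameter names (`truncated_bwt`, `prefix_len`) are ours, not the paper’s.

```python
# Illustrative sketch of a truncated BWT: rotations are ordered only by their
# first `prefix_len` bytes using stable bucket (counting) sorts, so the sort
# cost stays linear in N instead of the cost of a full suffix sort.
# Names and structure are ours, not taken from the CZ-BWT paper.

def truncated_bwt(data: bytes, prefix_len: int = 2):
    n = len(data)
    order = list(range(n))                  # rotation start positions
    # LSD-style radix: sort by byte prefix_len-1, then ..., then byte 0,
    # so the final order is lexicographic on the first prefix_len bytes.
    for pos in reversed(range(prefix_len)):
        buckets = [[] for _ in range(256)]
        for start in order:
            buckets[data[(start + pos) % n]].append(start)
        order = [start for bucket in buckets for start in bucket]
    # Last column of the (partially) sorted rotation matrix, plus the
    # rotation order a matching decoder would need to invert the transform.
    last_column = bytes(data[(start - 1) % n] for start in order)
    return last_column, order

if __name__ == "__main__":
    encoded, order = truncated_bwt(b"banana_band", prefix_len=2)
    print(encoded, order)
```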

2020 ◽  
Vol 2020 ◽  
pp. 1-22
Author(s):  
Qin Jiancheng ◽  
Lu Yiqin ◽  
Zhong Yu

With the advent of IR (Industrial Revolution) 4.0, the spread of sensors in the IoT (Internet of Things) may generate massive data, which will challenge the limited sensor storage and network bandwidth. Hence, the study of big data compression is valuable in the field of sensors. One problem is how to compress long-stream data efficiently with the finite memory of a sensor. To maintain performance, traditional compression techniques have to process the data streams on a small and inadequate scale, which reduces the compression ratio. To solve this problem, this paper proposes a block-split coding algorithm named “CZ-Array” and implements it in the shareware “ComZip.” CZ-Array can use a relatively small data window to cover a configurable large scale, which benefits the compression ratio. It is fast, with time complexity O(N), and fits big data compression. The experimental results indicate that ComZip with CZ-Array obtains a better compression ratio than gzip, lz4, bzip2, and p7zip on multiple-stream data compression, and it also achieves competitive speed among these general-purpose data compression tools. Besides, CZ-Array is concise and suits hardware-parallel implementation in sensors.
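The abstract does not specify how the block splitting is arranged. As one plausible reading only, the sketch below splits several long streams into fixed-size blocks and interleaves them, so that similar blocks from different streams land close together and fit inside a small compression window. The names (`interleave_blocks`, `block_size`) are ours, not the paper’s.

```python
# Hedged sketch of block-split coding (our reading, not the CZ-Array paper's):
# split each stream into fixed-size blocks and emit them round-robin, so a
# small downstream compression window still sees matching blocks together.
from itertools import zip_longest

def interleave_blocks(streams, block_size=4096):
    """Yield blocks round-robin across streams: s0[0], s1[0], ..., s0[1], ..."""
    chunked = [
        [s[i:i + block_size] for i in range(0, len(s), block_size)]
        for s in streams
    ]
    for row in zip_longest(*chunked, fillvalue=b""):
        for block in row:
            if block:
                yield block

if __name__ == "__main__":
    streams = [b"A" * 10000, b"B" * 10000, b"C" * 10000]
    rearranged = b"".join(interleave_blocks(streams, block_size=4096))
    print(len(rearranged))  # 30000 bytes, reordered for window-local similarity
```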


Author(s):  
Ramya. S ◽  
Gokula Krishnan. V

Big data has reached a maturity that leads it into a productive phase: most of the main issues with big data have been addressed to the point where storage has become attractive for full commercial exploitation. However, concerns over data compression still prevent many users from migrating data to remote storage. Client-side data compression in particular ensures that multiple uploads of the same content consume only the network bandwidth and storage space of a single upload. Compression is actively used by a number of backup providers as well as various other services. Unfortunately, compressed data is pseudorandom and thus cannot be deduplicated; as a consequence, current schemes have to sacrifice storage efficiency entirely. This system presents a scheme that permits a more fine-grained trade-off, along with a novel idea that differentiates data according to its popularity. Based on this idea, we design a compression scheme that guarantees semantic storage preservation for unpopular data and provides scalable storage and bandwidth benefits for popular data. A variable data chunk similarity algorithm is implemented to analyze the chunks and store the original data in compressed format. An encryption algorithm is also included to secure the data. Finally, a backup recovery system can be used in the event of blocking, and frequent login access is analyzed.
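As a minimal sketch only (not the authors’ scheme), variable-size chunking with duplicate detection can look like the following: a rolling hash picks chunk boundaries, and SHA-256 fingerprints let identical chunks from repeated uploads be stored once. All names (`chunk_boundaries`, `dedup_store`) are hypothetical.

```python
# Illustrative sketch of variable-size chunking plus deduplication, assuming a
# simple rolling byte hash for boundaries and SHA-256 chunk fingerprints.
import hashlib

def chunk_boundaries(data: bytes, window=48, mask=0x0FFF):
    """Yield chunk end offsets where the rolling hash hits the boundary mask."""
    h, start = 0, 0
    for i, b in enumerate(data):
        h = (h * 31 + b) & 0xFFFFFFFF
        if i - start >= window and (h & mask) == 0:
            yield i + 1
            start, h = i + 1, 0
    if start < len(data):
        yield len(data)

def dedup_store(data: bytes, store: dict):
    """Split data into chunks and keep only previously unseen chunks in `store`."""
    recipe, start = [], 0
    for end in chunk_boundaries(data):
        chunk = data[start:end]
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)          # each unique chunk stored once
        recipe.append(fp)
        start = end
    return recipe                            # fingerprints to rebuild the file
```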


Author(s):  
Yu Zhang ◽  
Yan-Ge Wang ◽  
Yan-Ping Bai ◽  
Yong-Zhen Li ◽  
Zhao-Yong Lv ◽  
...  

Author(s):  
Seyed Naser Hashemipour ◽  
Jamshid Aghaei ◽  
Abdullah Kavousi-fard ◽  
Taher Niknam ◽  
Ladan Salimi ◽  
...  

2018 ◽  
Vol 8 (11) ◽  
pp. 2216
Author(s):  
Jiahui Jin ◽  
Qi An ◽  
Wei Zhou ◽  
Jiakai Tang ◽  
Runqun Xiong

Network bandwidth is a scarce resource in big data environments, so data locality is a fundamental problem for data-parallel frameworks such as Hadoop and Spark. This problem is exacerbated in clusters of multicore servers, where multiple tasks running on the same server compete for that server’s network bandwidth. Existing approaches address this by scheduling computational tasks near the input data, taking into account the server’s free time, data placements, and data transfer costs. However, such approaches usually set identical values for data transfer costs, even though a multicore server’s data transfer cost increases with the number of data-remote tasks; as a result, data-processing time is minimized ineffectively. As a solution, we propose DynDL (Dynamic Data Locality), a novel data-locality-aware task-scheduling model that handles dynamic data transfer costs for multicore servers. DynDL offers greater flexibility than existing approaches by using a set of non-decreasing functions to evaluate dynamic data transfer costs. We also propose online and offline algorithms (based on DynDL) that minimize data-processing time and adaptively adjust data locality. Although scheduling under DynDL is NP-complete (nondeterministic polynomial-complete), we prove that the offline algorithm runs in quadratic time and generates optimal results for DynDL’s specific uses. Using a series of simulations and real-world executions, we show that our algorithms perform 30% better, in terms of data-processing time, than algorithms that do not consider dynamic data transfer costs. Moreover, they can adaptively adjust data locality based on the server’s free time, data placement, and network bandwidth, and schedule tens of thousands of tasks within sub-second or second timescales.
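To make the “non-decreasing transfer cost” idea concrete, here is a small greedy sketch of our own (not the paper’s online or offline algorithm): a task pays no transfer cost on the server holding its input, and otherwise pays a marginal cost that grows with the number of data-remote tasks already placed on that server. All identifiers (`greedy_assign`, `transfer_cost`) are assumptions.

```python
# Hedged sketch: greedy placement under a non-decreasing per-server marginal
# transfer cost, which is the flavor of cost model DynDL is described as using.

def greedy_assign(tasks, servers, transfer_cost):
    """tasks: list of (task_id, data_server); servers: list of server ids.
    transfer_cost(k): non-decreasing marginal cost of the k-th remote task."""
    remote_count = {s: 0 for s in servers}   # data-remote tasks placed per server
    load = {s: 0.0 for s in servers}         # accumulated finish time per server
    plan = {}
    for task_id, data_server in tasks:
        best_server, best_finish = None, None
        for s in servers:
            extra = 0.0 if s == data_server else transfer_cost(remote_count[s] + 1)
            finish = load[s] + 1.0 + extra    # 1.0 = nominal compute time
            if best_finish is None or finish < best_finish:
                best_server, best_finish = s, finish
        if best_server != data_server:
            remote_count[best_server] += 1
        load[best_server] = best_finish
        plan[task_id] = best_server
    return plan

if __name__ == "__main__":
    tasks = [(0, "s1"), (1, "s1"), (2, "s1"), (3, "s2")]
    print(greedy_assign(tasks, ["s1", "s2"], transfer_cost=lambda k: 0.5 * k))
```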


2018 ◽  
Author(s):  
Felipe A. Louza ◽  
Guilherme P. Telles ◽  
Simon Gog

Strings are prevalent in Computer Science, and algorithms for their efficient processing are fundamental in various applications. The results introduced in this work contribute theoretical improvements and practical advances to the construction of full-text indexes. Our first contribution is an in-place algorithm that computes the Burrows-Wheeler transform and the longest common prefix (LCP) array. Our second contribution is the construction of the suffix array augmented with the LCP array in optimal time and space for strings over constant-size alphabets. Our third contribution is a set of algorithms to construct full-text indexes for string collections within optimal theoretical bounds. This work is an extended abstract of the Ph.D. thesis of the first author.
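For readers unfamiliar with these structures, the sketch below shows what the suffix array and LCP array contain, using a simple (non-optimal) suffix sort followed by Kasai’s linear-time LCP construction; it is an illustration of the data structures only, not the optimal-time, optimal-space algorithms of the thesis.

```python
# Illustrative sketch: naive suffix array plus Kasai's O(n) LCP construction.

def suffix_array(s: str):
    return sorted(range(len(s)), key=lambda i: s[i:])   # simple O(n^2 log n) sketch

def lcp_array(s: str, sa):
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):                        # iterate suffixes in text order
        if rank[i] > 0:
            j = sa[rank[i] - 1]               # previous suffix in SA order
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h:
                h -= 1                        # Kasai: LCP drops by at most 1
        else:
            h = 0
    return lcp

if __name__ == "__main__":
    text = "banana"
    sa = suffix_array(text)
    print(sa, lcp_array(text, sa))            # [5, 3, 1, 0, 4, 2] [0, 1, 3, 0, 0, 2]
```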


2021 ◽  
Vol 102 ◽  
pp. 04013
Author(s):  
Md. Atiqur Rahman ◽  
Mohamed Hamada

Modern daily-life activities, together with advances in telecommunication, produce large amounts of information. Storing it on digital devices or transmitting it over the Internet is challenging, which makes data compression necessary, and research on data compression has therefore become a topic of great interest. Since compressed data is generally smaller than the original, data compression saves storage and increases transmission speed. In this article, we propose a text compression technique using the GPT-2 language model and Huffman coding. In the proposed method, the Burrows-Wheeler transform and a list of keys are used to reduce the length of the original text file. Finally, we apply the GPT-2 language model and then Huffman coding for encoding. The proposed method is compared with state-of-the-art text compression techniques and demonstrates a gain in compression ratio over them.
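The GPT-2 and BWT stages of the pipeline are not reproduced here; as a small self-contained sketch, the Huffman-coding stage alone can be illustrated as follows, building a prefix code from symbol frequencies and encoding a string to a bit string.

```python
# Sketch of the Huffman-coding stage only: build a prefix code from symbol
# frequencies and encode text to bits. Not the paper's full BWT + GPT-2 pipeline.
import heapq
from collections import Counter

def huffman_codes(text: str):
    heap = [[freq, idx, {sym: ""}]
            for idx, (sym, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:                        # degenerate single-symbol input
        return {sym: "0" for sym in heap[0][2]}
    idx = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)       # two least-frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, [f1 + f2, idx, merged])
        idx += 1
    return heap[0][2]

if __name__ == "__main__":
    text = "burrows-wheeler transform"
    codes = huffman_codes(text)
    encoded = "".join(codes[ch] for ch in text)
    print(len(encoded), "bits vs", 8 * len(text), "bits uncompressed")
```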


Author(s):  
Khaled Dehdouh

In the context of big data warehouses, a column-oriented NoSQL database system is considered a storage model that is highly adapted to data warehousing and online analysis. Indeed, NoSQL models allow data to scale easily, and the columnar store is well suited to storing and managing massive data, especially for decisional queries. However, column-oriented NoSQL DBMSs do not offer online analytical processing (OLAP) operators. To build OLAP cubes corresponding to the analysis contexts, the most common approach is to integrate other software, such as HIVE or Kylin, which provide a CUBE operator for building data cubes. In that case, however, the cube is built according to the row-oriented approach and does not fully exploit the benefits of a column-oriented approach. In this chapter, the main contribution is to define a cube operator called MC-CUBE (MapReduce Columnar CUBE), which builds columnar NoSQL cubes according to the columnar approach while taking into account the non-relational and distributed aspects of data warehouse storage.
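For orientation only, the sketch below shows what a CUBE operator computes, in a map/reduce-like style: for every subset of the dimension columns, the measure is aggregated per group key. It is not the MC-CUBE implementation, and all names (`cube`, `dims`, `measure`) are ours.

```python
# Illustrative sketch of a CUBE computation: one aggregate per grouping set,
# framed as "map" (emit key/value pairs) and "reduce" (sum per key).
from itertools import combinations
from collections import defaultdict

def cube(rows, dims, measure):
    """rows: list of dicts; dims: dimension column names; measure: value column."""
    totals = defaultdict(float)
    for row in rows:                                     # "map": one key per grouping set
        for r in range(len(dims) + 1):
            for subset in combinations(dims, r):
                key = tuple((d, row[d]) for d in subset)
                totals[key] += row[measure]              # "reduce": sum per key
    return dict(totals)

if __name__ == "__main__":
    sales = [
        {"region": "EU", "product": "A", "amount": 10.0},
        {"region": "EU", "product": "B", "amount": 5.0},
        {"region": "US", "product": "A", "amount": 7.0},
    ]
    for key, total in cube(sales, ["region", "product"], "amount").items():
        print(key, total)
```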


Author(s):  
Amine Rahmani

The phenomenon of big data (massive data mining) refers to the exponential growth of the volume of data available on the web. This concept has become widely used in recent years, enabling scalable, efficient, and fast access to data anytime and anywhere, and helping the scientific community and companies identify even the most subtle behaviors of users. However, big data has its share of ethical issues and risks that cannot be ignored. Indeed, new risks in terms of privacy are only beginning to be perceived. Sometimes merely annoying, these risks can also be genuinely harmful. In the medium term, the issue of privacy could become one of the biggest obstacles to the growth of big data solutions. It is in this context that a great deal of research is under way to enhance security and develop mechanisms for protecting users' privacy. Although this area is still in its infancy, the list of possibilities continues to grow.



