Improving I/O Performance with Adaptive Data Compression for Big Data Applications

Stream-Based Lossless Data Compression Applying Adaptive Entropy Coding for Hardware-Based Implementation

Algorithms

10.3390/a13070159

2020

Vol 13 (7)

pp. 159

Cited By ~ 1

Author(s):

Shinichi Yamagiwa

Eisaku Hayakawa

Koichi Marumo

Keyword(s):

Big Data

Data Compression

Data Stream

High Speed

Data Communication

Streaming Data

Entropy Coding

Big Data Applications

Lossless Data Compression

Compressed Data

Driven by the strong demand for very high-speed I/O in processors, the physical I/O speed of hardware increased drastically over the past decade. However, recent Big Data applications still demand larger I/O bandwidth and lower latency. Because current I/O performance is no longer improving so drastically, it is time to consider other ways to increase it. To address this challenge, we focus on lossless data compression to reduce the amount of data itself on the data communication path. Recent Big Data applications handle data streams that flow continuously and, because of their high speed, cannot tolerate stalls in processing. Therefore, an elegant hardware-based data compression technology is required. This paper proposes a novel lossless data compression method called ASE coding, which encodes streaming data using an entropy coding approach. ASE coding instantly assigns the fewest bits to the corresponding compressed data according to the number of occupied entries in a look-up table. This paper describes the detailed mechanism of ASE coding. Furthermore, the paper presents performance evaluations showing that ASE coding adaptively shrinks streaming data and works with a small amount of hardware resources without stalling or buffering any part of the data stream.
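The central idea in the abstract, sizing each code by how many look-up-table entries are currently occupied, can be illustrated in software. The following is a minimal Python sketch under stated assumptions, not the authors' ASE coding or its hardware design: the helper names `ase_like_encode`/`ase_like_decode`, the 1-bit hit/miss flag, the raw 8-bit literal on a miss, the move-to-front table update, and the 16-entry table size are all hypothetical choices made here for illustration. Bits are kept as a '0'/'1' string for readability.

```python
from typing import Iterable, List


def index_width(occupied: int) -> int:
    """Bits needed to address one of `occupied` table entries (0 if only one)."""
    return max(0, (occupied - 1).bit_length())


def ase_like_encode(symbols: Iterable[int], table_size: int = 16,
                    symbol_bits: int = 8) -> str:
    """Hypothetical ASE-style encoder; returns the code as a '0'/'1' string."""
    table: List[int] = []  # index 0 holds the most recently seen symbol (assumption)
    out: List[str] = []
    for s in symbols:
        if s in table:
            idx = table.index(s)
            width = index_width(len(table))  # width adapts to occupied entries
            out.append("1")  # hit flag
            if width:
                out.append(format(idx, f"0{width}b"))
            table.pop(idx)
        else:
            out.append("0" + format(s, f"0{symbol_bits}b"))  # miss flag + literal
            if len(table) == table_size:
                table.pop()  # evict the least recently used entry
        table.insert(0, s)  # move-to-front update, mirrored by the decoder
    return "".join(out)


def ase_like_decode(bits: str, table_size: int = 16,
                    symbol_bits: int = 8) -> List[int]:
    """Inverse of ase_like_encode: replays the same table updates."""
    table: List[int] = []
    out: List[int] = []
    pos = 0
    while pos < len(bits):
        flag = bits[pos]
        pos += 1
        if flag == "1":
            width = index_width(len(table))  # decoder knows the width from its own table
            idx = int(bits[pos:pos + width], 2) if width else 0
            pos += width
            s = table.pop(idx)
        else:
            s = int(bits[pos:pos + symbol_bits], 2)
            pos += symbol_bits
            if len(table) == table_size:
                table.pop()
        table.insert(0, s)
        out.append(s)
    return out


if __name__ == "__main__":
    data = b"abab"
    code = ase_like_encode(data)
    print(len(code), "bits for", 8 * len(data), "input bits")
    assert ase_like_decode(code) == list(data)
```

For `b"abab"`, the two misses cost 9 bits each; once two entries are occupied, each hit needs only a flag bit plus a 1-bit index, so the stream encodes to 22 bits instead of 32. Because the decoder rebuilds the same table step by step, it always knows the current index width without any side information, which is what lets this kind of scheme run on a stream without buffering.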

Guest Editorial Special Issue on Big Data Applications and Techniques in Cyber Threat Intelligence

Intelligent Automation & Soft Computing

10.31209/2020.100000198

2020

Author(s):

Zheng Xu

Qingyuan Zhou

Keyword(s):

Big Data

Guest Editorial

Special Issue

Big Data Applications

Threat Intelligence

Editorial Special Issue

Cyber Threat

Cyber Threat Intelligence

A New Rockburst Experiment Data Compression Storage Algorithm based on Big Data Technology

Intelligent Automation & Soft Computing

10.31209/2019.100000111

2019

pp. 561-573

Author(s):

Yu Zhang

Yan-Ge Wang

Yan-Ping Bai

Yong-Zhen Li

Zhao-Yong Lv

...

Keyword(s):

Big Data

Data Compression

Experiment Data

Big Data Technology

Poster: Cascaded TCP: BIG Throughput for BIG DATA Applications in Distributed HPC

2012 SC Companion: High Performance Computing, Networking Storage and Analysis

10.1109/sc.companion.2012.230

2012

Author(s):

Umar Kalim

Mark Gardner

Eric Brown

Wu-chun Feng

Keyword(s):

Big Data

Big Data Applications

Power Budgeting of Big Data Applications in Container-based Clusters

2020 IEEE International Conference on Cluster Computing (CLUSTER)

10.1109/cluster49012.2020.00038

2020

Author(s):

Jonatan Enes

Guillaume Fieni

Roberto R. Exposito

Romain Rouvoy

Juan Tourino

Keyword(s):

Big Data

Big Data Applications

Computational storage: an efficient and scalable platform for big data and HPC applications

Journal of Big Data

10.1186/s40537-019-0265-5

2019

Vol 6 (1)

Cited By ~ 2

Author(s):

Mahdi Torabzadehkashi

Siavash Rezaei

Ali HeydariGorji

Hosein Bobarshad

Vladimir Alves

...

Keyword(s):

Big Data

High Performance

Distributed Processing

Data Access

Distributed Applications

Process Data

Storage Devices

Hadoop MapReduce

Big Data Applications

Application Processor

In the era of big data applications, the demand for more sophisticated data centers and high-performance data processing mechanisms is increasing drastically. Data are originally stored in storage systems. To process data, application servers need to fetch them from storage devices, which imposes the cost of moving data to the system. This cost has a direct relation with the distance of processing engines from the data. This is the key motivation for the emergence of distributed processing platforms such as Hadoop, which move processing closer to data. Computational storage devices (CSDs) push the “move process to data” paradigm to its ultimate boundaries by deploying embedded processing engines inside storage devices to process data. In this paper, we introduce Catalina, an efficient and flexible computational storage platform that provides a seamless environment to process data in-place. Catalina is the first CSD equipped with a dedicated application processor running a full-fledged operating system that provides filesystem-level data access for the applications. Thus, a vast spectrum of applications can be ported to run on Catalina CSDs. Due to these unique features, to the best of our knowledge, Catalina CSD is the only in-storage processing platform that can be seamlessly deployed in clusters to run distributed applications such as Hadoop MapReduce and HPC applications in-place without any modification to the underlying distributed processing framework. For the proof of concept, we build a fully functional Catalina prototype and a CSD-equipped platform using 16 Catalina CSDs to run Intel HiBench Hadoop and HPC benchmarks to investigate the benefits of deploying Catalina CSDs in distributed processing environments. The experimental results show up to 2.2× improvement in performance and 4.3× reduction in energy consumption, respectively, for running Hadoop MapReduce benchmarks. Additionally, thanks to the Neon SIMD engines, the performance and energy efficiency of DFT algorithms are improved up to 5.4× and 8.9×, respectively.
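The “move process to data” argument can be made concrete with a toy model of a scan-and-filter job. The sketch below is hypothetical and is not taken from the Catalina paper: the function name `bytes_over_host_link`, the single `selectivity` parameter (the fraction of data that survives the filter), and the assumption that only the filtered output crosses the storage-to-host link when the job runs in-storage are illustrative simplifications that ignore compute speed, energy, and protocol overhead.

```python
def bytes_over_host_link(dataset_bytes: int, selectivity: float,
                         in_storage: bool) -> int:
    """Bytes crossing the storage-to-host link for a scan-and-filter job.

    Hypothetical model: host-side processing must fetch the whole dataset,
    while in-storage processing ships only the data that passes the filter.
    """
    if dataset_bytes < 0:
        raise ValueError("dataset_bytes must be non-negative")
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must be in [0, 1]")
    if in_storage:
        return round(dataset_bytes * selectivity)  # only the result moves
    return dataset_bytes  # the full dataset moves to the host


if __name__ == "__main__":
    size = 10**12  # 1 TB dataset (illustrative)
    for sel in (1.0, 0.1, 0.01):
        host = bytes_over_host_link(size, sel, in_storage=False)
        csd = bytes_over_host_link(size, sel, in_storage=True)
        print(f"selectivity={sel}: host fetch {host} B, in-storage result {csd} B")
```

Under this model the transfer saving is a factor of 1/selectivity (for nonzero selectivity), so highly selective filtering benefits most from in-place processing; the measured Catalina gains reported above depend on many more factors than this single ratio.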

Big Data Compression in Smart Grids via Optimal Singular Value Decomposition

2020 IEEE Industry Applications Society Annual Meeting

10.1109/ias44978.2020.9334900

2020

Author(s):

Seyed Naser Hashemipour

Jamshid Aghaei

Abdullah Kavousi-fard

Taher Niknam

Ladan Salimi

...

Keyword(s):

Big Data

Singular Value Decomposition

Data Compression

Smart Grids

Singular Value

Value Decomposition

Performance analysis model for big data applications in cloud computing

Journal of Cloud Computing: Advances, Systems and Applications

10.1186/s13677-014-0019-z

2014

Vol 3 (1)

Cited By ~ 8

Author(s):

Luis Eduardo Bautista Villalpando

Alain April

Alain Abran

Keyword(s):

Cloud Computing

Big Data

Performance Analysis

Analysis Model

Big Data Applications

Preliminary Benefits of Big Data in the Construction Industry: A Case Study

Proceedings of the Institution of Civil Engineers - Management, Procurement and Law

10.1680/jmapl.21.00027

2022

pp. 1-11

Author(s):

Bernard Tuffour Atuahene

Sittimont Kanjanabootra

Thayaparan Gajendran

Keyword(s):

Big Data

Construction Industry

Construction Projects

Big Data Applications

Data Application

Construction Firm

Big Data Application

Tangible Benefit

Design Construction

Big data applications consist of (i) data collection using big data sources, (ii) storing and processing the data, and (iii) analysing the data to gain insights for creating organisational benefit. Amid the influx of digital technologies and digitization in the construction process, big data is one newly emerging digital technology being adopted in the construction industry. Big data application is at a nascent stage in construction, and there is a need to understand the tangible benefit(s) that big data can offer the construction industry. This study explores the benefits of big data in the construction industry. Using a qualitative case study design, construction professionals in an Australian construction firm were interviewed. The research highlights that the benefits of big data include reduction of litigation among project stakeholders, enablement of near-real-time communication, and facilitation of effective subcontractor selection. By implication, on a broader scale, these benefits can improve contract management, procurement, and management of construction projects. This study contributes to an ongoing discourse on big data application and, more generally, digitization in the construction industry.

SciDP: Support HPC and Big Data Applications via Integrated Scientific Data Processing

2018 IEEE International Conference on Cluster Computing (CLUSTER)

10.1109/cluster.2018.00023

2018

Cited By ~ 4

Author(s):

Kun Feng

Xian-He Sun

Xi Yang

Shujia Zhou

Keyword(s):

Big Data

Data Processing

Scientific Data

Big Data Applications