Improving I/O Performance with Adaptive Data Compression for Big Data Applications

Author(s):  
Hongbo Zou ◽  
Yongen Yu ◽  
Wei Tang ◽  
Hsuanwei Michelle Chen
Algorithms ◽  
2020 ◽  
Vol 13 (7) ◽  
pp. 159 ◽  
Author(s):  
Shinichi Yamagiwa ◽  
Eisaku Hayakawa ◽  
Koichi Marumo

To meet strong demand for very high-speed processor I/O, hardware I/O speeds have increased drastically over the past decade. However, recent Big Data applications still demand greater I/O bandwidth and lower latency. Because current I/O performance is no longer improving so drastically, it is time to consider another way to increase it. To overcome this challenge, we focus on lossless data compression technology that decreases the amount of data itself on the data communication path. Recent Big Data applications handle data streams that flow continuously at high speed and cannot tolerate processing stalls. Therefore, an elegant hardware-based data compression technology is needed. This paper proposes a novel lossless data compression method called ASE coding. It encodes streaming data by applying an entropy coding approach. ASE coding instantly assigns the fewest bits to the corresponding compressed data according to the number of occupied entries in a look-up table. This paper describes the detailed mechanism of ASE coding. Furthermore, the paper presents performance evaluations showing that ASE coding adaptively shrinks streaming data and works with a small amount of hardware resources without stalling or buffering any part of the data stream.
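The idea of sizing each code word by the number of occupied look-up table entries can be illustrated with a small software toy. The sketch below is not the authors' ASE coding, which is a hardware design whose details are not given in this abstract. The flag bit, the move-to-front replacement policy, the 16-entry table, and the names `index_width`, `encode`, and `decode` are all assumptions made for illustration only.

```python
def index_width(occupied: int) -> int:
    """Bits needed to address `occupied` table entries (0 when only one is occupied)."""
    return max(occupied - 1, 0).bit_length()


def encode(data: bytes, table_size: int = 16) -> str:
    """Encode bytes into a string of '0'/'1' characters, one symbol at a time."""
    table: list[int] = []  # most recently seen symbol sits at index 0
    out: list[str] = []
    for sym in data:
        if sym in table:
            # Hit: flag 1, then the index written with just enough bits
            # for the number of entries currently occupied.
            idx = table.index(sym)
            width = index_width(len(table))
            out.append("1" + (format(idx, f"0{width}b") if width else ""))
            table.pop(idx)
        else:
            # Miss: flag 0, then the raw 8-bit symbol.
            out.append("0" + format(sym, "08b"))
            if len(table) == table_size:
                table.pop()  # evict the least recently used entry
        table.insert(0, sym)
    return "".join(out)


def decode(bits: str, table_size: int = 16) -> bytes:
    """Invert `encode` by rebuilding the same table as the bits are read."""
    table: list[int] = []
    out = bytearray()
    pos = 0
    while pos < len(bits):
        flag = bits[pos]
        pos += 1
        if flag == "1":
            width = index_width(len(table))
            idx = int(bits[pos:pos + width], 2) if width else 0
            pos += width
            sym = table.pop(idx)
        else:
            sym = int(bits[pos:pos + 8], 2)
            pos += 8
            if len(table) == table_size:
                table.pop()
        table.insert(0, sym)
        out.append(sym)
    return bytes(out)
```

In this toy, a run of one repeated byte costs 9 bits for its first occurrence and 1 bit for each repeat, because a table with a single occupied entry needs no index bits. Each symbol is emitted as soon as it is read, so no part of the input is buffered.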


Author(s):  
Yu Zhang ◽  
Yan-Ge Wang ◽  
Yan-Ping Bai ◽  
Yong-Zhen Li ◽  
Zhao-Yong Lv ◽  
...  

Author(s):  
Jonatan Enes ◽  
Guillaume Fieni ◽  
Roberto R. Exposito ◽  
Romain Rouvoy ◽  
Juan Tourino

2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Mahdi Torabzadehkashi ◽  
Siavash Rezaei ◽  
Ali HeydariGorji ◽  
Hosein Bobarshad ◽  
Vladimir Alves ◽  
...  

In the era of big data applications, the demand for more sophisticated data centers and high-performance data processing mechanisms is increasing drastically. Data are originally stored in storage systems. To process data, application servers need to fetch them from storage devices, which imposes the cost of moving data to the system. This cost is directly related to the distance between the processing engines and the data. This is the key motivation for the emergence of distributed processing platforms such as Hadoop, which move processing closer to the data. Computational storage devices (CSDs) push the “move process to data” paradigm to its ultimate boundaries by deploying embedded processing engines inside storage devices to process data. In this paper, we introduce Catalina, an efficient and flexible computational storage platform that provides a seamless environment to process data in place. Catalina is the first CSD equipped with a dedicated application processor running a full-fledged operating system that provides filesystem-level data access for applications. Thus, a vast spectrum of applications can be ported to run on Catalina CSDs. Due to these unique features, to the best of our knowledge, Catalina CSD is the only in-storage processing platform that can be seamlessly deployed in clusters to run distributed applications such as Hadoop MapReduce and HPC applications in place without any modifications to the underlying distributed processing framework. As a proof of concept, we build a fully functional Catalina prototype and a CSD-equipped platform using 16 Catalina CSDs to run Intel HiBench Hadoop and HPC benchmarks and investigate the benefits of deploying Catalina CSDs in distributed processing environments. The experimental results show up to 2.2× improvement in performance and 4.3× reduction in energy consumption for running Hadoop MapReduce benchmarks. Additionally, thanks to the Neon SIMD engines, the performance and energy efficiency of DFT algorithms are improved by up to 5.4× and 8.9×, respectively.


Author(s):  
Seyed Naser Hashemipour ◽  
Jamshid Aghaei ◽  
Abdullah Kavousi-fard ◽  
Taher Niknam ◽  
Ladan Salimi ◽  
...  

Author(s):  
Bernard Tuffour Atuahene ◽  
Sittimont Kanjanabootra ◽  
Thayaparan Gajendran

Big data applications consist of i) data collection using big data sources, ii) storing and processing the data, and iii) analysing data to gain insights for creating organisational benefit. The influx of digital technologies and digitization in the construction process includes big data as one newly emerging digital technology adopted in the construction industry. Big data application is at a nascent stage in construction, and there is a need to understand the tangible benefits that big data can offer the construction industry. This study explores the benefits of big data in the construction industry. Using a qualitative case study design, construction professionals in an Australian construction firm were interviewed. The research highlights that the benefits of big data include reduction of litigation amongst project stakeholders, enablement of near-real-time communication, and facilitation of effective subcontractor selection. By implication, on a broader scale, these benefits can improve contract management, procurement, and management of construction projects. This study contributes to an ongoing discourse on big data application and, more generally, digitization in the construction industry.

