Parallel implementation of large-scale CFD data compression toward aeroacoustic analysis

Block-Split Array Coding Algorithm for Long-Stream Data Compression

Journal of Sensors ◽

10.1155/2020/5726527 ◽

2020 ◽

Vol 2020 ◽

pp. 1-22

Author(s):

Qin Jiancheng ◽

Lu Yiqin ◽

Zhong Yu

Keyword(s):

Big Data ◽

Data Compression ◽

Compression Ratio ◽

Large Scale ◽

Industrial Revolution ◽

Parallel Implementation ◽

Small Data ◽

Stream Data ◽

Network Bandwidth ◽

General Data

With the advent of Industrial Revolution (IR) 4.0, the spread of sensors in the Internet of Things (IoT) may generate massive amounts of data, which will strain limited sensor storage and network bandwidth. Hence, the study of big data compression is valuable in the field of sensors. One problem is how to compress long-stream data efficiently within the finite memory of a sensor. To maintain performance, traditional compression techniques have to process data streams over a small, inadequate scale, which reduces the compression ratio. To solve this problem, this paper proposes a block-split coding algorithm named the “CZ-Array algorithm” and implements it in the shareware “ComZip.” CZ-Array can use a relatively small data window to cover a configurable large scale, which benefits the compression ratio. It runs in O(N) time and suits big data compression. The experimental results indicate that ComZip with CZ-Array obtains a better compression ratio than gzip, lz4, bzip2, and p7zip in multiple-stream data compression, and its speed is competitive with these general-purpose compression tools. In addition, CZ-Array is concise and suits parallel hardware implementation in sensors.
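The trade-off described in this abstract, where a small data window costs compression ratio on long streams, can be illustrated with a short sketch. It does not implement CZ-Array or ComZip; it uses Python's standard `zlib` module as a stand-in, and the helper names `stream_compress` and `repeated_stream` are hypothetical. The synthetic stream repeats one 16 KiB block, so the redundancy is only visible to a window larger than 16 KiB.

```python
import random
import zlib


def stream_compress(chunks, wbits=15):
    """Compress an iterable of byte chunks with a streaming DEFLATE encoder.

    wbits sets the sliding window to 2**wbits bytes (zlib accepts 9..15),
    so memory use stays bounded no matter how long the stream is.
    """
    comp = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=wbits)
    out = bytearray()
    for chunk in chunks:
        out += comp.compress(chunk)
    out += comp.flush()
    return bytes(out)


def repeated_stream(block_size=16 * 1024, repeats=8, seed=0):
    """Hypothetical test stream: one pseudo-random block repeated several times.

    The only redundancy is at a distance of block_size bytes, which mimics
    long-range repetition in a sensor stream.
    """
    rng = random.Random(seed)
    block = bytes(rng.getrandbits(8) for _ in range(block_size))
    return [block] * repeats


chunks = repeated_stream()
small_window = stream_compress(chunks, wbits=9)    # 512-byte window
large_window = stream_compress(chunks, wbits=15)   # 32 KiB window
print(len(small_window), len(large_window))
```

Under these assumptions the 512-byte window cannot reach back to the previous copy of the block, so its output should be markedly larger than the output of the 32 KiB window. This is the kind of gap that a scheme covering a large scale with a small window aims to close.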


A Parallel Unmixing-Based Content Retrieval System for Distributed Hyperspectral Imagery Repository on Cloud Computing Platforms

Remote Sensing ◽

10.3390/rs13020176 ◽

2021 ◽

Vol 13 (2) ◽

pp. 176

Author(s):

Peng Zheng ◽

Zebin Wu ◽

Jin Sun ◽

Yi Zhang ◽

Yaoqin Zhu ◽

...

Keyword(s):

Cloud Computing ◽

Large Scale ◽

Retrieval System ◽

Hyperspectral Image ◽

Parallel Implementation ◽

Remotely Sensed Data ◽

Web Interfaces ◽

Content Retrieval ◽

Service Mode ◽

Computing Platforms

As the volume of remotely sensed data grows, content-based image retrieval (CBIR) becomes increasingly important, especially on cloud computing platforms that support parallel and distributed processing and storage of big data. This paper proposes a novel parallel CBIR system for hyperspectral image (HSI) repositories on cloud computing platforms that retrieves hyperspectral scenes guided by unmixed spectral information, i.e., endmembers and their associated fractional abundances. However, existing unmixing methods suffer from an extremely high computational burden when extracting metadata from large-scale HSI data. To address this limitation, we implement a distributed, parallel unmixing method that runs on cloud computing platforms to accelerate the unmixing processing flow. In addition, we implement a global standard distributed HSI repository equipped with a large spectral library in a software-as-a-service mode, providing users with HSI storage, management, and retrieval services through web interfaces. Furthermore, the parallel implementation of unmixing processing is incorporated into the CBIR system to establish the parallel unmixing-based content retrieval system. The performance of the proposed parallel CBIR system was verified in terms of both unmixing efficiency and accuracy.


Parallel Framework for Dimensionality Reduction of Large-Scale Datasets

Scientific Programming ◽

10.1155/2015/180214 ◽

2015 ◽

Vol 2015 ◽

pp. 1-12 ◽

Cited By ~ 3

Author(s):

Sai Kiranmayee Samudrala ◽

Jaroslaw Zola ◽

Srinivas Aluru ◽

Baskar Ganapathysubramanian

Keyword(s):

Dimensionality Reduction ◽

Organic Solar Cells ◽

Large Scale ◽

Parallel Implementation ◽

High Dimensional Data ◽

Real Life ◽

Processing Parameters ◽

High Dimensional ◽

Morphology Evolution ◽

Reduction Techniques

Dimensionality reduction refers to a set of mathematical techniques used to reduce the complexity of the original high-dimensional data while preserving selected properties of that data. Improvements in simulation strategies and experimental data collection methods are resulting in a deluge of heterogeneous and high-dimensional data, which often makes dimensionality reduction the only viable way to gain a qualitative and quantitative understanding of the data. However, existing dimensionality reduction software often does not scale to datasets arising in real-life applications, which may consist of thousands of points with millions of dimensions. In this paper, we propose a parallel framework for dimensionality reduction of large-scale data. We identify key components underlying the spectral dimensionality reduction techniques and propose their efficient parallel implementation. We show that the resulting framework can be used to process datasets consisting of millions of points when executed on a 16,000-core cluster, which is beyond the reach of currently available methods. To further demonstrate the applicability of our framework, we perform dimensionality reduction of 75,000 images representing morphology evolution during the manufacturing of organic solar cells in order to identify how processing parameters affect morphology evolution.
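For readers unfamiliar with spectral dimensionality reduction, the following single-process sketch shows the two components such methods typically share: building a pairwise (Gram) matrix and extracting its leading eigenvector. It is not the paper's parallel framework. It assumes the simplest linear case (PCA computed from the n × n Gram matrix, which is the cheaper route when points are few and dimensions are many), and the function names are hypothetical.

```python
import math


def centered_gram(points):
    """n x n matrix of inner products between mean-centered points."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in centered]
            for ci in centered]


def top_eigenpair(matrix, iterations=200):
    """Leading eigenvalue and eigenvector by power iteration."""
    n = len(matrix)
    # Non-constant start vector: the constant vector lies in the null
    # space of a centered Gram matrix and would collapse to zero.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all points identical: nothing to embed
            return 0.0, v
        v = [x / norm for x in w]
    mv = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
    eigenvalue = sum(a * b for a, b in zip(v, mv))  # Rayleigh quotient
    return eigenvalue, v


def embed_1d(points):
    """One-dimensional embedding: sqrt(eigenvalue) times the eigenvector."""
    eigenvalue, v = top_eigenpair(centered_gram(points))
    scale = math.sqrt(max(eigenvalue, 0.0))
    return [scale * x for x in v]


# Four collinear points in 2-D reduce to evenly spaced 1-D coordinates.
print(embed_1d([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]))
```

Both steps are dominated by dense matrix work (pairwise products and repeated matrix-vector multiplication), which is the kind of computation that lends itself to distribution across many cores.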


A heterogeneous parallel implementation of the Markov clustering algorithm for large-scale biological networks on distributed CPU–GPU clusters

The Journal of Supercomputing ◽

10.1007/s11227-021-04204-6 ◽

2022 ◽

Author(s):

You Fu ◽

Wei Zhou

Keyword(s):

Biological Networks ◽

Large Scale ◽

Clustering Algorithm ◽

Parallel Implementation ◽

GPU Clusters ◽

Markov Clustering


The large-scale blast score ratio (LS-BSR) pipeline: a method to rapidly compare genetic content between bacterial genomes

10.7287/peerj.preprints.220v1 ◽

2014 ◽

Author(s):

Jason W Sahl ◽

Greg Caporaso ◽

David A Rasko ◽

Paul S Keim

Keyword(s):

Large Scale ◽

Sequence Data ◽

Parallel Implementation ◽

Genetic Relationships ◽

Clinical Diagnostics ◽

Whole Genome Sequence ◽

Bacterial Isolates ◽

Bacterial Genomes ◽

E. coli ◽

Blast Score

Background. As whole genome sequence data from bacterial isolates becomes cheaper to generate, computational methods are needed to correlate sequence data with biological observations. Here we present the large-scale BLAST score ratio (LS-BSR) pipeline, which rapidly compares the genetic content of hundreds to thousands of bacterial genomes, and returns a matrix that describes the relatedness of all coding sequences (CDSs) in all genomes surveyed. This matrix can be easily parsed in order to identify genetic relationships between bacterial genomes. Although pipelines have been published that group peptides by sequence similarity, no other software performs the large-scale, flexible, full-genome comparative analyses carried out by LS-BSR. Results. To demonstrate the utility of the method, the LS-BSR pipeline was tested on 96 Escherichia coli and Shigella genomes; the pipeline ran in 163 minutes using 16 processors, which is a greater than 7-fold speedup compared to using a single processor. The BSR values for each CDS, which indicate a relative level of relatedness, were then mapped to each genome on an independent core genome single nucleotide polymorphism (SNP) based phylogeny. Comparisons were then used to identify clade specific CDS markers and validate the LS-BSR pipeline based on molecular markers that delineate between classical E. coli pathogenic variant (pathovar) designations. Scalability tests demonstrated that the LS-BSR pipeline can process 1,000 E. coli genomes in ~60h using 16 processors. Conclusions. LS-BSR is an open-source, parallel implementation of the BSR algorithm, enabling rapid comparison of the genetic content of large numbers of genomes. The results of the pipeline can be used to identify specific markers between user-defined phylogenetic groups, and to identify the loss and/or acquisition of genetic information between bacterial isolates. Taxa-specific genetic markers can then be translated into clinical diagnostics, or can be used to identify broadly conserved putative therapeutic candidates.
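Two quantities in this abstract lend themselves to a small worked example: the BSR matrix and the reported speedup. The sketch below is not the LS-BSR code. It assumes the usual definition of a BLAST score ratio (a CDS's best alignment bit score in a genome divided by the bit score of the CDS aligned against itself), and the function names and input layout are hypothetical.

```python
def bsr_matrix(self_scores, genome_scores):
    """Build a CDS-by-genome matrix of BLAST score ratios.

    self_scores:   {cds: bit score of the CDS aligned against itself}
    genome_scores: {genome: {cds: best bit score of the CDS in that genome}}
    A CDS with no hit in a genome is given a ratio of 0.0.
    """
    matrix = {}
    for cds, reference in self_scores.items():
        row = {}
        for genome, hits in genome_scores.items():
            score = hits.get(cds, 0.0)
            row[genome] = score / reference if reference > 0 else 0.0
        matrix[cds] = row
    return matrix


def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors


# Hypothetical bit scores, chosen only to show the shape of the output.
scores = bsr_matrix(
    {"cdsA": 200.0},
    {"genome1": {"cdsA": 150.0}, "genome2": {}},
)
print(scores)

# The abstract reports a greater than 7-fold speedup on 16 processors,
# so 7 / 16 is a lower bound on the parallel efficiency.
print(parallel_efficiency(7, 16))
```

Reading the reported figures this way, a 7-fold speedup on 16 processors corresponds to roughly 44% of ideal linear scaling, and the abstract's "greater than" wording makes that a floor instead of an exact value.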


Hybrid MPI and OpenMP parallel implementation of large-scale linear-response time-dependent density functional theory with plane-wave basis set

Electronic Structure ◽

10.1088/2516-1075/abfd1f ◽

2021 ◽

Author(s):

Lingyun Wan ◽

Xiaofeng Liu ◽

Jie Liu ◽

Xinming Qin ◽

Wei Hu ◽

...

Keyword(s):

Density Functional Theory ◽

Plane Wave ◽

Linear Response ◽

Density Functional ◽

Large Scale ◽

Parallel Implementation ◽

Time Dependent ◽

Basis Set ◽

Functional Theory ◽

Dependent Density


An Efficient Large-Scale Volume Data Compression Algorithm

Advances in Neural Networks – ISNN 2009 - Lecture Notes in Computer Science ◽

10.1007/978-3-642-01513-7_62 ◽

2009 ◽

pp. 567-575

Author(s):

Degui Xiao ◽

Liping Zhao ◽

Lei Yang ◽

Zhiyong Li ◽

Kenli Li

Keyword(s):

Data Compression ◽

Large Scale ◽

Compression Algorithm ◽

Volume Data


Parallel Implementation of LQG Balanced Truncation for Large-Scale Systems

Large-Scale Scientific Computing - Lecture Notes in Computer Science ◽

10.1007/978-3-540-78827-0_24 ◽

2008 ◽

pp. 227-234 ◽

Cited By ~ 1

Author(s):

Jose M. Badía ◽

Peter Benner ◽

Rafael Mayo ◽

Enrique S. Quintana-Ortí ◽

Gregorio Quintana-Ortí ◽

...

Keyword(s):

Large Scale ◽

Parallel Implementation ◽

Balanced Truncation ◽

Large Scale Systems


Assembling of Parallel Programs for Large Scale Numerical Modeling

Computer Engineering ◽

10.4018/978-1-61350-456-7.ch301 ◽

2012 ◽

pp. 497-511

Author(s):

V.E. Malyshkin

Keyword(s):

Large Scale ◽

Parallel Implementation ◽

Numerical Models ◽

Dynamic Properties ◽

Parallel Program ◽

Particle In Cell ◽

Modular Programming ◽

Rectangular Mesh ◽

Assembly Technology ◽

Main Ideas

The main ideas of the Assembly Technology (AT), as applied to the parallel implementation of large-scale realistic numerical models on a rectangular mesh, are considered and demonstrated by parallelizing (fragmenting) an application of the Particle-In-Cell (PIC) method to the problem of energy exchange in a plasma cloud. Implementing numerical models with the assembly technology is based on constructing a fragmented parallel program. Assembling a numerical simulation program under AT automatically provides useful dynamic properties of the target program, including dynamic load balancing based on the migration of fragments from overloaded to underloaded processor elements of a multicomputer. The parallel program assembling approach can also be considered a combination and adaptation, for parallel programming, of the well-known modular programming and domain decomposition techniques, supported by system software for assembling fragmented programs.


Deflation of Periodic Orbits in Large-Scale Systems: Algorithm and Parallel Implementation

Communications in Computer and Information Science - Parallel Computational Technologies ◽

10.1007/978-3-030-81691-9_6 ◽

2021 ◽

pp. 76-91

Author(s):

N. M. Evstigneev

Keyword(s):

Periodic Orbits ◽

Large Scale ◽

Parallel Implementation ◽

Large Scale Systems
