Parallel computing for genome sequence processing

Author(s):  
You Zou ◽  
Yuejie Zhu ◽  
Yaohang Li ◽  
Fang-Xiang Wu ◽  
Jianxin Wang

Abstract The rapid increase of genome data brought by gene sequencing technologies poses a massive challenge to data processing. To solve the problems caused by enormous data and complex computing requirements, researchers have proposed many methods and tools, which can be divided into three types: big data storage, efficient algorithm design and parallel computing. The purpose of this review is to investigate popular parallel programming technologies for genome sequence processing. Three common parallel computing models are introduced according to their hardware architectures; each is classified into two or three types and further analyzed in terms of its features. Then, parallel computing for genome sequence processing is discussed through four common applications: genome sequence alignment, single nucleotide polymorphism calling, genome sequence preprocessing, and pattern detection and searching. For each kind of application, its background is first introduced, and then a list of tools or algorithms is summarized in terms of principle, hardware platform and computing efficiency. The programming model of each hardware and application provides a reference for researchers to choose high-performance computing tools. Finally, we discuss the limitations and future trends of parallel computing technologies.
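As a rough illustration of the data-parallel style that applications such as pattern detection and searching rely on, the sketch below splits a sequence into chunks and counts motif occurrences in each chunk concurrently. It is not taken from the review: the function names `count_in_chunk` and `parallel_count` are hypothetical, and a thread pool is used only to keep the example short (in CPython, threads will not speed up this CPU-bound loop; a process pool or GPU kernel would be the realistic choice).

```python
from concurrent.futures import ThreadPoolExecutor


def count_in_chunk(seq, motif, start, end):
    """Count (possibly overlapping) motif occurrences that START in [start, end)."""
    # Let matches run past the chunk edge so boundary-spanning hits are not lost.
    stop = min(len(seq), end + len(motif) - 1)
    count = 0
    i = seq.find(motif, start, stop)
    while i != -1:
        count += 1
        i = seq.find(motif, i + 1, stop)
    return count


def parallel_count(seq, motif, workers=4):
    """Split seq into about `workers` chunks and sum the per-chunk counts."""
    if not motif:
        raise ValueError("motif must be non-empty")
    chunk = max(1, -(-len(seq) // workers))  # ceiling division
    bounds = [(s, min(s + chunk, len(seq))) for s in range(0, len(seq), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda b: count_in_chunk(seq, motif, *b), bounds))
```

Each occurrence is attributed to the chunk in which it starts, so the per-chunk counts can be summed without double counting.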

2019 ◽  
Vol 13 (S1) ◽  
Author(s):  
Rongjie Wang ◽  
Tianyi Zang ◽  
Yadong Wang

Abstract Background In recent years, with the development of high-throughput genome sequencing technologies, a large amount of genome data has been generated, which has caused widespread concern about data storage and transmission costs. However, how to effectively compress genome sequence data remains an unsolved problem. Results In this paper, we propose a compression method using machine learning techniques (DeepDNA) for compressing human mitochondrial genome data. The experimental results show the effectiveness of our proposed method compared with other methods on the human mitochondrial genome data. Conclusions The compression method we propose can be classified as a non-reference-based method, but its compression effect is comparable to that of reference-based methods. Moreover, our method achieves good compression results not only on population genomes with large redundancy, but also on single genomes with small redundancy. The code of DeepDNA is available at https://github.com/rongjiewang/DeepDNA.


2012 ◽  
Vol 220-223 ◽  
pp. 2520-2523
Author(s):  
Wang Shen Hao ◽  
Xin Min Dong ◽  
Jie Han ◽  
Wen Ping Lei

Because it generally works in severe conditions, mechanical equipment is subject to progressive deterioration of its state. Mechanical failures account for more than 60% of system breakdowns. Therefore, identifying impending mechanical faults is very important to prevent the system from running in a faulty state. Traditional parallel computing generally requires high-performance computers, whereas a parallel FFT algorithm based on the Hadoop MapReduce programming model can be realized on low-end machines. Combining cloud computing with equipment fault diagnosis technology makes it possible to realize massive-data parallel computing and distributed storage. The experimental results show that this approach provides a good solution and technical support for on-line monitoring and real-time fault diagnosis of mechanical equipment.


Author(s):  
Konstantin Volovich ◽  
Sergey Denisov

The paper discusses methods of data storage when performing parallel computations in a multicomputer high-performance computing complex in virtual software environments. Approaches to building a data storage system using software systems designed to solve problems of materials science are proposed.


2012 ◽  
Vol 17 (4) ◽  
pp. 207-216 ◽  
Author(s):  
Magdalena Szymczyk ◽  
Piotr Szymczyk

Abstract MATLAB is a technical computing language used in a variety of fields, such as control systems, image and signal processing, visualization, and financial process simulation, in an easy-to-use environment. MATLAB offers "toolboxes", which are specialized libraries for a variety of scientific domains, and a simplified interface to high-performance libraries (LAPACK, BLAS, and FFTW). MATLAB is now enriched by the possibility of parallel computing with the Parallel Computing Toolbox™ and MATLAB Distributed Computing Server™. In this article we present some of the key features of MATLAB parallel applications, focusing on the use of GPU processors for image processing.


Author(s):  
Breno A. de Melo Menezes ◽  
Nina Herrmann ◽  
Herbert Kuchen ◽  
Fernando Buarque de Lima Neto

Abstract Parallel implementations of swarm intelligence algorithms such as the ant colony optimization (ACO) have been widely used to shorten the execution time when solving complex optimization problems. When aiming for a GPU environment, developing efficient parallel versions of such algorithms using CUDA can be a difficult and error-prone task even for experienced programmers. To overcome this issue, the parallel programming model of Algorithmic Skeletons simplifies parallel programs by abstracting from low-level features. This is realized by defining common programming patterns (e.g. map, fold and zip) that later on will be converted to efficient parallel code. In this paper, we show how algorithmic skeletons formulated in the domain specific language Musket can cope with the development of a parallel implementation of ACO and how that compares to a low-level implementation. Our experimental results show that Musket suits the development of ACO. Besides making it easier for the programmer to deal with the parallelization aspects, Musket generates high performance code with similar execution times when compared to low-level implementations.
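The map, zip and fold patterns named above can be sketched in plain Python to show how a program is composed from skeletons rather than from explicit thread management. This is an assumption-laden illustration, not Musket syntax or its generated code: the helper names (`map_skel`, `zip_skel`, `fold_skel`, `update_pheromone`), the evaporation rate `rho`, and the toy pheromone update are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_skel(f, xs, pool):
    """map: apply f to every element independently."""
    return list(pool.map(f, xs))


def zip_skel(f, xs, ys, pool):
    """zip: combine two equally long sequences element by element."""
    return list(pool.map(f, xs, ys))


def fold_skel(f, xs, init):
    """fold: reduce a sequence to one value (sequential here for brevity)."""
    return reduce(f, xs, init)


def update_pheromone(pheromone, deposit, rho=0.5):
    """Toy ACO-style update: evaporate (map), add deposits (zip), total (fold)."""
    with ThreadPoolExecutor() as pool:
        evaporated = map_skel(lambda p: (1.0 - rho) * p, pheromone, pool)
        updated = zip_skel(lambda p, d: p + d, evaporated, deposit, pool)
    total = fold_skel(lambda a, b: a + b, updated, 0.0)
    return updated, total
```

The caller only states what happens to each element; how the work is distributed is left to the skeleton implementation, which is the abstraction the skeleton approach aims for.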


Coatings ◽  
2021 ◽  
Vol 11 (3) ◽  
pp. 318
Author(s):  
Yang Li ◽  
Cheng Zhang ◽  
Zhiming Shi ◽  
Jingni Li ◽  
Qingyun Qian ◽  
...  

The explosive growth of data and information has increasingly motivated scientific and technological endeavors toward ultra-high-density data storage (UHDDS) applications. Herein, a donor–acceptor (D–A) type small conjugated molecule containing benzothiadiazole (BT) is prepared (NIBTCN), which demonstrates multilevel resistive memory behavior and holds considerable promise for implementing the target of UHDDS. The as-prepared device presents distinct current ratios of 10^5.2/10^3.2/1, low threshold voltages of −1.90 V and −3.85 V, and satisfactory reproducibility beyond 60%, which suggests reliable device performance. This work represents a favorable step toward further development of highly efficient D–A molecular systems, which opens more opportunities for achieving high-performance multilevel memory materials and devices.


2013 ◽  
Vol 411-414 ◽  
pp. 585-588
Author(s):  
Liu Yang ◽  
Tie Ying Liu

This paper introduces the parallel features of the GPU and uses GPU parallel computation to parallelize the path search process of Particle Swarm Optimization (PSO), reducing the growing time and space complexity of PSO. The experimental results show that, compared with CPU mode, computation on the GPU platform improves the search rate and shortens the calculation time.


2010 ◽  
Vol 192 (24) ◽  
pp. 6492-6493 ◽  
Author(s):  
Angel Angelov ◽  
Susanne Liebl ◽  
Meike Ballschmiter ◽  
Mechthild Bömeke ◽  
Rüdiger Lehmann ◽  
...  

ABSTRACT Spirochaeta thermophila is a thermophilic, free-living anaerobe that is able to degrade various α- and β-linked sugar polymers, including cellulose. We report here the complete genome sequence of S. thermophila DSM 6192, which is the first genome sequence of a thermophilic, free-living member of the Spirochaetes phylum. The genome data reveal a high density of genes encoding enzymes from more than 30 glycoside hydrolase families, a noncellulosomal enzyme system for (hemi)cellulose degradation, and indicate the presence of a novel carbohydrate-binding module.

