Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments

PeerJ ◽

10.7717/peerj.3486 ◽

2017 ◽

Vol 5 ◽

pp. e3486 ◽

Cited By ~ 3

Author(s):

Won Cheol Yim ◽

John C. Cushman

Keyword(s):

Sequence Analysis ◽

Sequence Similarity ◽

Query Sequence ◽

Divide And Conquer ◽

Local Alignment ◽

Data Sets ◽

Processing Unit ◽

Central Processing ◽

Analysis Tools ◽

Similarity Searches

Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take prohibitively long to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, which is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. While the BLAST suite of programs performs searches very rapidly, it has the potential to be accelerated. In recent years, distributed computing environments have become more widely accessible and used due to the increasing availability of high-performance computing (HPC) systems. Therefore, simple solutions for data parallelization are needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. To accelerate BLAST and other sequence analysis tools, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach. Scaling from 1 to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically and overcomes the speed limitation of single-node BLAST programs. DCBLAST can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. This freely available tool simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.
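
The query sequence distribution idea can be illustrated with a minimal Python sketch. The abstract does not say how DCBLAST splits its input, so the greedy balancing rule and the helper names `parse_fasta` and `split_queries` are assumptions made for illustration, not the tool's actual implementation.

```python
from typing import Iterator, List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def parse_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_queries(records: List[Record], n_chunks: int) -> List[List[Record]]:
    """Assumed strategy: place each record, longest first, into the chunk
    that currently holds the fewest residues, so chunks are roughly balanced."""
    chunks: List[List[Record]] = [[] for _ in range(n_chunks)]
    loads = [0] * n_chunks
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        i = loads.index(min(loads))  # index of the least-loaded chunk
        chunks[i].append(rec)
        loads[i] += len(rec[1])
    return chunks
```

Each chunk could then be written to its own file and submitted as a separate BLAST job, with the per-chunk outputs concatenated afterwards; that orchestration depends on the scheduler and is left out here.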

Using Local Alignments for Relation Recognition

Journal of Artificial Intelligence Research ◽

10.1613/jair.2964 ◽

2010 ◽

Vol 38 ◽

pp. 1-48 ◽

Cited By ~ 2

Author(s):

S. Katrenko ◽

P. W. Adriaans ◽

M. Van Someren

Keyword(s):

Sequence Similarity ◽

General Relation ◽

Similarity Measures ◽

Semantic Relatedness ◽

Structural Similarity ◽

Learning Task ◽

Semantic Knowledge ◽

Local Alignment ◽

Data Sets ◽

Definition Of

This paper discusses the problem of marrying structural similarity with semantic relatedness for Information Extraction from text. Aiming at accurate recognition of relations, we introduce local alignment kernels and explore various possibilities of using them for this task. We give a definition of a local alignment (LA) kernel based on the Smith-Waterman score as a sequence similarity measure and proceed with a range of possibilities for computing similarity between elements of sequences. We show how distributional similarity measures obtained from unlabeled data can be incorporated into the learning task as semantic knowledge. Our experiments suggest that the LA kernel yields promising results on various biomedical corpora outperforming two baselines by a large margin. Additional series of experiments have been conducted on the data sets of seven general relation types, where the performance of the LA kernel is comparable to the current state-of-the-art results.
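
As a rough illustration of the Smith-Waterman score that the LA kernel builds on, the sketch below scores the best local alignment between two token sequences with a pluggable element similarity. The linear gap penalty, the `exact_match` scoring values, and the function names are assumptions; this is the plain alignment score, not the kernel defined in the paper.

```python
from typing import Callable, Sequence


def smith_waterman_score(
    a: Sequence[str],
    b: Sequence[str],
    sim: Callable[[str, str], float],
    gap: float = 1.0,
) -> float:
    """Best local alignment score between a and b with a linear gap penalty."""
    prev = [0.0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0.0
    for x in a:
        curr = [0.0]
        for j, y in enumerate(b, start=1):
            score = max(
                0.0,                     # start a new local alignment
                prev[j - 1] + sim(x, y),  # align x with y
                prev[j] - gap,            # gap in b
                curr[j - 1] - gap,        # gap in a
            )
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def exact_match(x: str, y: str) -> float:
    """Hypothetical element similarity: reward identical tokens only."""
    return 2.0 if x == y else -1.0
```

Replacing `exact_match` with a distributional similarity between words is one way the semantic knowledge mentioned in the abstract could enter the score.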

Detecting High Scoring Local Alignments in Pangenome Graphs

Bioinformatics ◽

10.1093/bioinformatics/btab077 ◽

2021 ◽

Author(s):

Tizian Schulz ◽

Roland Wittler ◽

Sven Rahmann ◽

Faraz Hach ◽

Jens Stoye

Keyword(s):

Sequence Similarity ◽

Query Sequence ◽

Heuristic Method ◽

Supplementary Information ◽

De Bruijn Graph ◽

Local Alignment ◽

Memory Usage ◽

Sequence Comparisons ◽

De Bruijn Graphs ◽

De Bruijn

Motivation: Increasing amounts of individual genomes sequenced per species motivate the usage of pangenomic approaches. Pangenomes may be represented as graphical structures, e.g. compacted colored de Bruijn graphs, which offer a low memory usage and facilitate reference-free sequence comparisons. While sequence-to-graph mapping to graphical pangenomes has been studied for some time, no local alignment search tool in the vein of BLAST has been proposed yet. Results: We present a new heuristic method to find maximum scoring local alignments of a DNA query sequence to a pangenome represented as a compacted colored de Bruijn graph. Our approach additionally allows a comparison of similarity among sequences within the pangenome. We show that local alignment scores follow an exponential-tail distribution similar to BLAST scores, and we discuss how to estimate its parameters to separate local alignments representing sequence homology from spurious findings. An implementation of our method is presented, and its performance and usability are shown. Our approach scales sublinearly in running time and memory usage with respect to the number of genomes under consideration. This is an advantage over classical methods that do not make use of sequence similarity within the pangenome. Availability: Source code and test data are available from https://gitlab.ub.uni-bielefeld.de/gi/plast. Supplementary information: Supplementary data are available at Bioinformatics online.
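
To show how an exponential-tail score distribution separates homology from chance hits, the following sketch applies the standard Karlin-Altschul formulas used for BLAST scores. The abstract does not describe the tool's own estimation procedure, so the parameter values passed for `lam` and `k`, the default threshold, and the function names are placeholders chosen for illustration.

```python
import math


def e_value(raw_score: float, query_len: int, db_len: int, lam: float, k: float) -> float:
    """Expected number of chance alignments scoring at least raw_score:
    E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2, so that E = m * n * 2**(-S')."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def is_significant(raw_score: float, query_len: int, db_len: int,
                   lam: float, k: float, threshold: float = 1e-3) -> bool:
    """Treat an alignment as likely homologous if its E-value is below threshold."""
    return e_value(raw_score, query_len, db_len, lam, k) < threshold
```

In practice `lam` and `k` have to be estimated for the scoring scheme and data at hand; the values used in any call to this sketch are not taken from the paper.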

Numerical Uncertainty in Parallel Processing Using Computational Fluid Dynamics as Example

Athens Journal of Technology & Engineering ◽

10.30958/ajte.8-2-3 ◽

2021 ◽

Vol 8 (2) ◽

pp. 169-180

Author(s):

Mark Lin ◽

Periklis Papadopoulos

Keyword(s):

Fluid Dynamics ◽

Computational Fluid Dynamics ◽

Load Balancing ◽

Message Passing ◽

Message Passing Interface ◽

Mean Value ◽

Data Sets ◽

Processing Unit ◽

Central Processing ◽

Single Output

Computational methods such as Computational Fluid Dynamics (CFD) traditionally yield a single output: a single number, much like the result of a theoretical hand calculation. However, this paper shows that computational methods have inherent uncertainty, which can also be reported statistically. In numerical computation, because many factors affect the data collected, the data can be quoted as a mean value with standard deviations (error bars) to make data comparison meaningful. In cases where the difference between two data sets is obscured by uncertainty, the two data sets are said to be indistinguishable. A sample CFD problem pertaining to external aerodynamics is copied and run on 29 identical computers in a university computer lab. The expectation is that all 29 runs should return exactly the same result; unfortunately, in a few cases the result turns out to be different. This is attributed to the parallelization scheme, which partitions the mesh to run in parallel on multiple cores of the computer. The distribution of the computational load is hardware-driven and depends on the resources available on each computer at the time. Details such as load balancing among multiple Central Processing Unit (CPU) cores using the Message Passing Interface (MPI) are transparent to the user. A software algorithm such as METIS or JOSTLE is used to automatically divide the load between processors. As such, the user has no control over the outcome of the CFD calculation even when the same problem is computed. Because of this, numerical uncertainty arises from parallel (multicore) computing. One way to resolve this issue is to compute problems using a single core, without mesh repartitioning. However, as this paper demonstrates, even this is not straightforward. Keywords: numerical uncertainty, parallelization, load-balancing, automotive aerodynamics
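
The statistical reporting described in this abstract can be sketched in a few lines of Python. The overlap rule used to call two data sets indistinguishable (means closer than `k` times the sum of their standard deviations) is an assumption made for illustration, as are the function names; the paper may use a different criterion.

```python
import statistics
from typing import Sequence, Tuple


def summarize(runs: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of repeated runs."""
    return statistics.mean(runs), statistics.stdev(runs)


def indistinguishable(a: Sequence[float], b: Sequence[float], k: float = 1.0) -> bool:
    """Assumed criterion: the error bars (mean +/- k * std) of a and b overlap."""
    mean_a, std_a = summarize(a)
    mean_b, std_b = summarize(b)
    return abs(mean_a - mean_b) <= k * (std_a + std_b)
```

For example, `summarize` applied to the results of repeated runs of the same case gives the mean and error bar that would be quoted in place of a single number.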

VECTOR SPACE INDEXING FOR BIOSEQUENCE SIMILARITY SEARCHES

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213005002405 ◽

2005 ◽

Vol 14 (05) ◽

pp. 811-826 ◽

Cited By ~ 1

Author(s):

OZGUR OZTURK ◽

HAKAN FERHATOSMANOGLU

Keyword(s):

Nearest Neighbor ◽

Sequence Similarity ◽

Distance Functions ◽

Data Sets ◽

Index Structures ◽

K Nearest Neighbor ◽

Protein Databases ◽

Approximation Quality ◽

Similarity Searches ◽

Better Than

We present a multi-dimensional indexing approach for fast sequence similarity search in DNA and protein databases. In particular, we propose effective transformations of subsequences into numerical vector domains and build efficient index structures on the transformed vectors. We then define distance functions in the transformed domain and examine properties of these functions. We experimentally compared their (a) approximation quality for k-Nearest Neighbor (k-NN) queries and both (b) pruning ability and (c) approximation quality for ε-range queries. Results for k-NN queries, which we present here, show that our proposed distances FD2 and WD2 (i.e. Frequency and Wavelet Distance functions for 2-grams) perform significantly better than the others. We then develop effective index structures, based on R-trees and scalar quantization, on top of transformed vectors and distance functions. Promising results from the experiments on real biosequence data sets are presented.
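
The transformation of subsequences into numerical vectors can be illustrated with 2-gram counts over the DNA alphabet. The L1 distance below is a simple stand-in chosen for illustration; it is not the FD2 or WD2 distance defined in the paper, and the names `bigram_vector` and `l1_distance` are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import List

ALPHABET = "ACGT"
# All 16 possible 2-grams in a fixed order, one vector dimension per 2-gram.
BIGRAMS = ["".join(p) for p in product(ALPHABET, repeat=2)]


def bigram_vector(seq: str) -> List[int]:
    """Map a DNA string to its 16-dimensional vector of 2-gram counts."""
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    return [counts[g] for g in BIGRAMS]


def l1_distance(u: List[int], v: List[int]) -> int:
    """Sum of absolute coordinate differences between two count vectors."""
    return sum(abs(x - y) for x, y in zip(u, v))
```

Vectors of this kind have a fixed dimension regardless of sequence length, which is what allows them to be stored in a multi-dimensional index such as an R-tree.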

Detecting High Scoring Local Alignments in Pangenome Graphs

10.1101/2020.09.03.280958 ◽

2020 ◽

Author(s):

Tizian Schulz ◽

Roland Wittler ◽

Sven Rahmann ◽

Faraz Hach ◽

Jens Stoye

Keyword(s):

Sequence Similarity ◽

Query Sequence ◽

Heuristic Method ◽

De Bruijn Graph ◽

Local Alignment ◽

Memory Usage ◽

Sequence Comparisons ◽

De Bruijn Graphs ◽

De Bruijn ◽

Colored De Bruijn Graph

Motivation: Increasing amounts of individual genomes sequenced per species motivate the usage of pangenomic approaches. Pangenomes may be represented as graphical structures, e.g. compacted colored de Bruijn graphs, which offer a low memory usage and facilitate reference-free sequence comparisons. While sequence-to-graph mapping to graphical pangenomes has been studied for some time, no local alignment search tool in the vein of BLAST has been proposed yet. Results: We present a new heuristic method to find maximum scoring local alignments of a DNA query sequence to a pangenome represented as a compacted colored de Bruijn graph. Our approach additionally allows a comparison of similarity among sequences within the pangenome. We show that local alignment scores follow an exponential-tail distribution similar to BLAST scores, and we discuss how to estimate its parameters to separate local alignments representing sequence homology from spurious findings. An implementation of our method is presented, and its performance and usability are shown. Our approach scales sublinearly in running time and memory usage with respect to the number of genomes under consideration. This is an advantage over classical methods that do not make use of sequence similarity within the pangenome.

The safety of tools in manual disinfection – security of medical personnel of the central processing unit and operating theatre

Forum Zakażeń ◽

10.15374/fz2014031 ◽

2014 ◽

Vol 5 (3) ◽

pp. 177-183

Author(s):

Elżbieta Kutrowska

Keyword(s):

Central Processing Unit ◽

Operating Theatre ◽

Processing Unit ◽

Central Processing

Perangkat keras komputer

10.31219/osf.io/27n4w ◽

2020 ◽

Author(s):

Roudati jannah

Keyword(s):

Data Storage ◽

Central Processing Unit ◽

External Memory ◽

Storage Device ◽

Input Device ◽

Processing Unit ◽

Central Processing ◽

Output Device

Computer hardware is the part of a computer system that can be touched and seen physically and that acts to carry out instructions from software. Computer hardware (Indonesian: perangkat keras komputer) is also referred to simply as hardware. Hardware plays an overall role in the performance of a computer system. In principle, a computer system always has input hardware (input/input device system), processing hardware (processing/central processing unit), output hardware (output/output device system), optional additional devices (peripherals), and data storage (storage device system/external memory).

CENTRAL PROCESSING UNIT (CPU)

10.31219/osf.io/bxnjf ◽

2020 ◽

Author(s):

Ika Milia wahyunu Siregar

Keyword(s):

Processing System ◽

Central Processing Unit ◽

Processing Unit ◽

Central Processing

IT is developing very rapidly worldwide, from software to hardware. Technology now dominates most of the world. Because technology is developing ever faster, we as users can fall behind on information about new technology if we do not keep our knowledge up to date. This can leave us easily tempted and deceived by technology advertisements without considering their downsides. As computer users, we should know about the components of a computer. A computer is a set of electronic machines consisting of millions of components that can work together and form a neat and precise working system. This system is then used to carry out work automatically, based on the instructions (program) given to it. The term computer hardware refers to objects that can physically be held, moved, and seen. The Central Processing System/Central Processing Unit (CPU) is a type of hardware that serves as the place where data is processed; it can also be described as the brain of all processing activities such as calculating, sorting, searching, writing, reading, and so on.

Perangkat lunak komputer

10.31219/osf.io/cwgs4 ◽

2020 ◽

Author(s):

Intan khadijah simatupang

Keyword(s):

Data Storage ◽

Central Processing Unit ◽

External Memory ◽

Storage Device ◽

Input Device ◽

Processing Unit ◽

Central Processing ◽

Output Device

A computer is a set of electronic machines consisting of millions of components that can work together and form a neat and precise working system. This system is then used to carry out work automatically, based on the instructions (program) given to it. The term computer hardware refers to objects that can physically be held, moved, and seen. Computer software is a collection of instructions (programs/procedures) for carrying out work automatically by processing the collection of instructions (data) given to it. In principle, a computer system always has input hardware (input/input device system), processing hardware (processing/central processing unit), output hardware (output/output device system), optional additional devices (peripherals), and data storage (storage device system/external memory).

PERANGKAT KERAS ( HARDWARE ) PADA KOMPUTER

10.31219/osf.io/t3d6p ◽

2020 ◽

Author(s):

Siti Kumala Dewi

Keyword(s):

Processing System ◽

Central Processing Unit ◽

Input Device ◽

Processing Unit ◽

Central Processing ◽

Output Device ◽

System P ◽

System 2

Computer hardware is the part of a computer system that can be touched and seen physically and that acts to carry out instructions from software. Computer hardware (Indonesian: perangkat keras komputer) is also referred to simply as hardware. Hardware plays an overall role in the performance of a computer system. Based on function, hardware is divided into: 1. Input Hardware System (Input Device System); 2. Processing System (Central Processing System/Central Processing Unit (CPU)); 3. Output Hardware System (Output Device System); 4. Additional Hardware System (Peripheral/Accessories Device System).
