Massively Parallel Implementation of Sequence Alignment with Basic Local Alignment Search Tool Using Parallel Computing in Java Library

Marek Nowicki; Davit Bzhalava; Piotr Bała

doi:10.1089/cmb.2018.0079

High speed BLASTN: an accelerated MegaBLAST search tool

Nucleic Acids Research ◽

10.1093/nar/gkv784 ◽

2015 ◽

Vol 43 (16) ◽

pp. 7762-7768 ◽

Cited By ~ 67

Author(s):

Ying Chen ◽

Weicai Ye ◽

Yongdong Zhang ◽

Yuesheng Xu

Keyword(s):

Sequence Alignment ◽

High Speed ◽

Lookup Table ◽

Local Alignment ◽

Biological Sequences ◽

Parallel Performance ◽

Seeding Method ◽

Computational Speed ◽

Search Tool ◽

Nucleotide Database

Abstract: Sequence alignment is a long-standing problem in bioinformatics. The Basic Local Alignment Search Tool (BLAST) is one of the most popular and fundamental alignment tools. The explosive growth of biological sequences calls for faster sequence alignment tools such as BLAST. To this end, we develop high speed BLASTN (HS-BLASTN), a parallel and fast nucleotide database search tool that accelerates MegaBLAST, the default module of NCBI-BLASTN. HS-BLASTN builds a new lookup table using the FMD-index of the database and employs an accurate and effective seeding method to find short stretches of identities (called seeds) between the query and the database. HS-BLASTN produces the same alignment results as MegaBLAST while running much faster. Specifically, our experiments conducted on a 12-core server show that HS-BLASTN can be 22 times faster than MegaBLAST and exhibits better parallel performance than MegaBLAST. HS-BLASTN is written in C++ and the related source code is available at https://github.com/chenying2016/queries under the GPLv3 license.
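
The seeding idea in the abstract above (look up short exact matches between the query and the database) can be illustrated with a minimal sketch. It is not the HS-BLASTN implementation: a plain Python dictionary stands in for the FMD-index-based lookup table, and the function names `build_lookup_table` and `find_seeds`, as well as the word size of 4, are assumptions chosen only for illustration.

```python
from collections import defaultdict


def build_lookup_table(database: str, k: int) -> dict:
    """Map every k-mer of the database to the list of positions where it starts.

    Assumption: a hash table replaces the FMD-index used by HS-BLASTN.
    """
    table = defaultdict(list)
    for pos in range(len(database) - k + 1):
        table[database[pos:pos + k]].append(pos)
    return table


def find_seeds(query: str, table: dict, k: int) -> list:
    """Return (query_pos, db_pos) pairs where a k-mer of the query matches exactly."""
    seeds = []
    for qpos in range(len(query) - k + 1):
        # .get() avoids inserting empty entries into the defaultdict
        for dpos in table.get(query[qpos:qpos + k], []):
            seeds.append((qpos, dpos))
    return seeds


# Hypothetical toy data; a real word size would be much larger than 4.
table = build_lookup_table("ACGTACGTTT", k=4)
seeds = find_seeds("TACGT", table, k=4)
print(seeds)
```

In a BLAST-like pipeline each seed would then be extended into a longer alignment; that extension step is omitted here.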

Study of Basic Local Alignment Search Tool (BLAST) and Multiple Sequence Alignment (Clustal- X) of Monoclonal mice/human antibodies

10.1101/2021.07.09.451785 ◽

2021 ◽

Author(s):

IVAN VITO FERRARI ◽

Paolo PATRIZIO

Keyword(s):

Amino Acids ◽

Monoclonal Antibodies ◽

Sequence Alignment ◽

Heavy Chain ◽

Light Chain ◽

Light Chains ◽

Local Alignment ◽

Multiple Sequence ◽

Heavy Chains ◽

Search Tool

In this work, we studied the Basic Local Alignment Search Tool (BLAST) and Multiple Sequence Alignment (Clustal-X) of different monoclonal mouse antibodies to better understand multiple sequence alignment. Our strategy was to compare the light chains of multiple monoclonal antibodies with each other, calculating their percentage identity and the amino acid region in which it occurs (see figure 2 below). Subsequently, the same survey was carried out on the heavy chains with the same methodology (see figure 3 below). Finally, sequence alignment between the light chain of one antibody and the heavy chain of another antibody was studied to understand what happens if chains are exchanged between antibodies (see figure 4 below). From our BLAST alignment results, the light chains (Ls) of the compared monoclonal antibodies have a sequence homology of about 60-80% and share an identical region in the range of amino acid residues 100-210, except PDB ID 4ISV, which turns out to have 40% lower homology than the other antibodies. The heavy chains (Hs) of the monoclonal antibodies, however, tend to have lower sequence homology than the light chains, equal to 60-70%, and share an identical region between amino acid residues 150-210, with the exception of the PDB ID 3I9G-3W9D antibodies, which have a homology of 50% (see supporting part). Summing up: about 70-80% identity between the light chains of 2 antibodies, 60-70% identity between the heavy chains of 2 antibodies, 30% identity between the two chains of one antibody, and 30% when the light chain of one antibody is compared with the heavy chain of another antibody.

Genetic Diversity of Cryptosporidium Spp. in Njoro Sub County, Nakuru, Kenya

10.21203/rs.3.rs-621237/v1 ◽

2021 ◽

Author(s):

Walter Miding'a Essendi ◽

Charles Inyagwa Muleke ◽

Miheso Manfred ◽

Elick Onyango Otachi

Keyword(s):

Genetic Diversity ◽

Phylogenetic Analysis ◽

Sequence Alignment ◽

Evolutionary Genetics ◽

Domestic Animals ◽

Local Alignment ◽

Cryptosporidium Spp ◽

Potential Source ◽

Search Tool ◽

Great Genetic Diversity

Abstract: Cryptosporidium spp. cause cryptosporidiosis in humans through zoonotic and anthroponotic transmission. Previous studies have illustrated the significance of domestic animals as reservoirs of this parasite. However, there is no information on the Cryptosporidium spp. and genotypes circulating in Njoro Sub County. A total of 2174 samples from humans, cattle, chickens, sheep and goats were assessed for the presence of Cryptosporidium spp. Thirty-three positive samples were successfully sequenced. The sequences obtained were compared to Cryptosporidium sequences in GenBank using the online BLAST (Basic Local Alignment Search Tool) program of NCBI (National Center for Biotechnology Information). Sequence alignment was done using the Clustal W program and phylogenetic analysis was executed in MEGA 6 (Molecular Evolutionary Genetics Analysis version 6.0). The Cryptosporidium spp. present in the watershed showed great genetic diversity, with nine (9) Cryptosporidium spp., namely: C. parvum, C. hominis, C. ubiquitum, C. meleagridis, C. andersoni, C. baileyi, C. muris, C. xiaoi and C. viatorum. Cattle were the biggest reservoirs of zoonotic Cryptosporidium spp., hence a potential source of zoonosis in humans, while goats had the fewest species. This is the first study to report the presence of C. viatorum in Kenya.

Algoritma Needleman-Wunsch dalam Menentukan Tingkat Kemiripan Urutan DNA Rusa Timor (Cervus timorensis) dan Rusa Merah (Cervus elaphus)

EIGEN MATHEMATICS JOURNAL ◽

10.29303/emj.v3i2.65 ◽

2020 ◽

Vol 3 (2) ◽

pp. 125

Author(s):

Hibban Kholiq ◽

Mamika Ujianita Romdhini ◽

Marliadi Susanto

Keyword(s):

Dna Sequence ◽

Sequence Alignment ◽

Cervus Elaphus ◽

Dna Sequences ◽

Sequence Data ◽

Local Alignment ◽

Base Pairs ◽

European Continent ◽

Dna Sequence Data ◽

Search Tool

Sequence alignment is a basic method in sequence analysis. This method is used to determine the similarity level of DNA sequences. The Needleman-Wunsch algorithm is an algorithm that can be used to solve the problem of sequence alignment. This research shows that the relation T(i, j) used in the Needleman-Wunsch algorithm is a function T: ℕ₀ × ℕ₀ → ℤ. The function T(i, j) is a recursive function. The DNA sequence data used are DNA sequences from the Timor deer, which is the identity animal of the province of West Nusa Tenggara, and from the red deer, a typical deer of the European continent, as a comparison. The DNA sequence data were obtained from BLAST (Basic Local Alignment Search Tool). The optimal alignment obtained consists of 666 base-pair positions with 322 matches, 230 mismatches and 114 gaps, meaning that the two DNA sequences have a 48% similarity (322/666).
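
The recursive function T(i, j) and the similarity ratio mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code: the scoring values (match +1, mismatch -1, gap -1) are assumptions, since the abstract does not state them, and the names `needleman_wunsch` and `similarity` are hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment by dynamic programming.

    T[i][j] is the best score for aligning a[:i] with b[:j]:
    T[i][j] = max(T[i-1][j-1] + s(a[i-1], b[j-1]), T[i-1][j] + gap, T[i][j-1] + gap)
    The scoring values are assumptions, not taken from the paper.
    """
    n, m = len(a), len(b)
    T = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        T[i][0] = i * gap
    for j in range(1, m + 1):
        T[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            T[i][j] = max(T[i - 1][j - 1] + s, T[i - 1][j] + gap, T[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and T[i][j] == T[i - 1][j - 1] + s:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and T[i][j] == T[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return T[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def similarity(top: str, bottom: str) -> float:
    """Fraction of alignment columns that are exact matches (gaps never count)."""
    matches = sum(x == y and x != "-" for x, y in zip(top, bottom))
    return matches / len(top)


score, top, bottom = needleman_wunsch("ACGT", "AGT")
print(score, top, bottom, similarity(top, bottom))
```

The figure reported in the abstract follows the same ratio: 322 matches over 322 + 230 + 114 = 666 alignment columns gives 322/666 ≈ 0.48.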

MPI-blastn and NCBI-TaxCollector: Improving metagenomic analysis with high performance classification and wide taxonomic attachment

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720014500139 ◽

2014 ◽

Vol 12 (03) ◽

pp. 1450013 ◽

Cited By ~ 6

Author(s):

R. Dias ◽

M. G. Xavier ◽

F. D. Rossi ◽

M. V. Neves ◽

T. A. P. Lange ◽

...

Keyword(s):

High Performance ◽

Parallel Implementation ◽

Metagenomic Analysis ◽

Local Alignment ◽

Metagenomic Sequencing ◽

Multiple Sequence ◽

Genetic Sequencing ◽

Sequencing Technologies ◽

Sequence Search ◽

Search Tool

Metagenomic sequencing technologies are advancing rapidly and the size of output data from high-throughput genetic sequencing has increased substantially over the years. This brings us to a scenario where advanced computational optimizations are required to perform a metagenomic analysis. In this paper, we describe a new parallel implementation of nucleotide BLAST (MPI-blastn) and a new tool for taxonomic attachment of Basic Local Alignment Search Tool (BLAST) results that supports the NCBI taxonomy (NCBI-TaxCollector). MPI-blastn achieved high performance when compared to mpiBLAST and ScalaBLAST. In our best case, MPI-blastn was able to run 408 times faster on 384 cores. Our evaluations demonstrated that NCBI-TaxCollector is able to perform taxonomic attachments 125 times faster and needs 120 times less RAM than the previous TaxCollector. Through our optimizations, a multiple sequence search that currently takes 37 hours can be performed in less than 6 min, and post-processing with NCBI taxonomic data attachment, which takes 48 hours, can now run in 23 min.
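
The timings quoted above can be cross-checked with a short calculation. This is only a sketch of the arithmetic; the helper names `speedup` and `parallel_efficiency` are hypothetical, and "less than 6 min" is treated as exactly 6 minutes, so the first result is a lower bound.

```python
def speedup(baseline_minutes: float, optimized_minutes: float) -> float:
    """Ratio of the original running time to the optimized running time."""
    return baseline_minutes / optimized_minutes


def parallel_efficiency(speedup_factor: float, cores: int) -> float:
    """Speedup per core; values above 1.0 indicate superlinear speedup."""
    return speedup_factor / cores


search = speedup(37 * 60, 6)           # 37 hours versus (at most) 6 minutes
post = speedup(48 * 60, 23)            # 48 hours versus 23 minutes
eff = parallel_efficiency(408, 384)    # 408x on 384 cores, as reported

print(search, round(post, 1), eff)
```

The post-processing ratio (2880/23 ≈ 125) agrees with the reported 125-fold gain for NCBI-TaxCollector, and 408/384 ≈ 1.06 means the best-case MPI-blastn run was slightly superlinear.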

Massively Parallel Implementation of Steered Molecular Dynamics in Tinker-HP: Comparisons of Polarizable and Non-Polarizable Simulations of Realistic Systems

10.26434/chemrxiv.7771112.v2 ◽

2019 ◽

Author(s):

Frédéric Célerse ◽

Louis Lagardere ◽

Étienne Derat ◽

Jean-Philip Piquemal

Keyword(s):

Molecular Dynamics ◽

Parallel Implementation ◽

Steered Molecular Dynamics ◽

Massively Parallel

This paper is dedicated to the massively parallel implementation of Steered Molecular Dynamics in the Tinker-HP software. It allows for direct comparisons of polarizable and non-polarizable simulations of realistic systems.

Massively Parallel Implementation of Steered Molecular Dynamics in Tinker-HP: Comparisons of Polarizable and Non-Polarizable Simulations of Realistic Systems

10.26434/chemrxiv.7771112 ◽

2019 ◽

Author(s):

Frédéric Célerse ◽

Louis Lagardere ◽

Étienne Derat ◽

Jean-Philip Piquemal

Keyword(s):

Molecular Dynamics ◽

Parallel Implementation ◽

Steered Molecular Dynamics ◽

Massively Parallel

This paper is dedicated to the massively parallel implementation of Steered Molecular Dynamics in the Tinker-HP software. It allows for direct comparisons of polarizable and non-polarizable simulations of realistic systems.

The Influence of Memory-Aware Computation on Distributed BLAST

Current Bioinformatics ◽

10.2174/1574893613666180601080811 ◽

2019 ◽

Vol 14 (2) ◽

pp. 157-163

Author(s):

Majid Hajibaba ◽

Mohsen Sharifi ◽

Saeid Gorgin

Keyword(s):

Search Time ◽

Genomic Research ◽

Local Alignment ◽

Negative Effects ◽

Sequencing Technologies ◽

Percent Improvement ◽

Fast Processing ◽

Search Tool ◽

Memory Awareness ◽

Generation Sequencing

Background: One of the pivotal challenges in today's genomic research is the fast processing of voluminous data such as that generated by high-throughput Next-Generation Sequencing technologies. However, BLAST (Basic Local Alignment Search Tool), a long-established and renowned tool in bioinformatics, has been shown to be very slow in this regard. Objective: To improve the performance of BLAST on voluminous data, we have applied a novel memory-aware technique to BLAST for faster parallel processing. Method: We have used a master-worker model alongside a memory-aware technique in which the master partitions the whole data into equal chunks, one chunk for each worker, and each worker then further splits and formats its allocated data chunk according to the size of its memory. Each worker searches each split one by one against a list of queries. Results: We have chosen a list of queries with different lengths to run insensitive searches in a huge database called UniProtKB/TrEMBL. Our experiments show a 20 percent improvement in performance when workers used our proposed memory-aware technique compared to when they were not memory aware. Experiments show an even higher performance improvement, approximately 50 percent, when we applied our memory-aware technique to mpiBLAST. Conclusion: We have shown that memory awareness in formatting bulky databases, when running BLAST, can improve performance significantly while preventing unexpected crashes in low-memory environments. Whereas distributed computing attempts to reduce search time by partitioning and distributing database portions, our memory-aware technique additionally alleviates the negative effects of page faults on performance.
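
The two-level partitioning described in the Method part above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names `partition_equal` and `split_by_memory` are hypothetical, "equal chunks" is interpreted as an equal number of records, and the memory budget is measured simply as the total length of the sequences in a split.

```python
def partition_equal(records: list, n_workers: int) -> list:
    """Master step: cut the record list into n_workers contiguous chunks
    whose sizes differ by at most one record."""
    base, extra = divmod(len(records), n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        end = start + base + (1 if w < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


def split_by_memory(chunk: list, memory_budget: int) -> list:
    """Worker step: greedily pack records into splits whose total length
    stays within the worker's memory budget.

    A single record larger than the budget still gets a split of its own.
    """
    splits, current, used = [], [], 0
    for rec in chunk:
        if current and used + len(rec) > memory_budget:
            splits.append(current)
            current, used = [], 0
        current.append(rec)
        used += len(rec)
    if current:
        splits.append(current)
    return splits


# Hypothetical toy database of six short sequences and two workers.
records = ["AAAA", "CC", "GGG", "T", "ACGT", "AC"]
chunks = partition_equal(records, n_workers=2)
splits_w0 = split_by_memory(chunks[0], memory_budget=6)
print(chunks, splits_w0)
```

Each worker would then format and search its splits one after another, so that only one split needs to reside in memory at a time.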

Predicting bacteriophage hosts based on sequences of annotated receptor-binding proteins

Scientific Reports ◽

10.1038/s41598-021-81063-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Dimitri Boeckaerts ◽

Michiel Stock ◽

Bjorn Criel ◽

Hans Gerstmans ◽

Bernard De Baets ◽

...

Keyword(s):

Machine Learning ◽

Predictive Model ◽

Receptor Binding ◽

Bacterial Infections ◽

Sequence Data ◽

Sequence Similarity ◽

Area Under The Curve ◽

Local Alignment ◽

Search Tool ◽

Different Levels

Abstract: Nowadays, bacteriophages are increasingly considered as an alternative treatment for a variety of bacterial infections in cases where classical antibiotics have become ineffective. However, characterizing the host specificity of phages remains a labor- and time-intensive process. In order to alleviate this burden, we have developed a new machine-learning-based pipeline to predict bacteriophage hosts based on annotated receptor-binding protein (RBP) sequence data. We focus on predicting bacterial hosts from the ESKAPE group, Escherichia coli, Salmonella enterica and Clostridium difficile. We compare the performance of our predictive model with that of the widely used Basic Local Alignment Search Tool (BLAST). Our best-performing predictive model reaches Precision-Recall Area Under the Curve (PR-AUC) scores between 73.6 and 93.8% for different levels of sequence similarity in the collected data. Our model reaches a performance comparable to that of BLASTp when sequence similarity in the data is high and starts outperforming BLASTp when sequence similarity drops below 75%. Therefore, our machine learning methods can be especially useful in settings in which sequence similarity to other known sequences is low. Predicting the hosts of novel metagenomic RBP sequences could extend our toolbox to tune the host spectrum of phages or phage tail-like bacteriocins by swapping RBPs.

Kelch 13-propeller polymorphisms in Plasmodium falciparum from Jazan region, southwest Saudi Arabia

Malaria Journal ◽

10.1186/s12936-020-03467-3 ◽

2020 ◽

Vol 19 (1) ◽

Author(s):

Ommer Mohammed Dafalla ◽

Mohammed Alzahrani ◽

Ahmed Sahli ◽

Mohammed Abdulla Al Helal ◽

Mohammad Mohammad Alhazmi ◽

...

Keyword(s):

Plasmodium Falciparum ◽

Saudi Arabia ◽

Parasite Clearance ◽

Sub Saharan Africa ◽

Local Alignment ◽

Nucleotide Polymorphisms ◽

Gene Encoding ◽

Synonymous Mutations ◽

Search Tool ◽

Sub Saharan

Abstract: Background: Artemisinin-based combination therapy (ACT) is recommended at the initial phase for treatment of Plasmodium falciparum, to reduce morbidity and mortality in all countries where malaria is endemic. Polymorphism in portions of the P. falciparum gene encoding kelch (K13)-propeller domains is associated with delayed parasite clearance after ACT. Of about 124 different non-synonymous mutations, 46 have been identified in Southeast Asia (SEA), 62 in sub-Saharan Africa (SSA) and 16 in both regions. This is the first study designed to analyse the prevalence of polymorphism in the P. falciparum k13-propeller domain in the Jazan region of southwest Saudi Arabia, where malaria is endemic. Methods: One hundred and forty P. falciparum samples were collected from the Jazan region of southwest Saudi Arabia at three different times: 20 samples in 2011, 40 samples in 2016 and 80 samples in 2020 after the implementation of ACT. Plasmodium falciparum kelch13 (k13) gene DNA was extracted, amplified, sequenced, and analysed using the Basic Local Alignment Search Tool (BLAST). Results: This study obtained 51 non-synonymous (NS) mutations in three time groups, divided as follows: 6 single nucleotide polymorphisms (SNPs) (11.8%) in samples collected in 2011 only, 3 (5.9%) in 2011 and 2016, 5 (9.8%) in 2011 and 2020, 5 (9.8%) in 2016 only, 8 (15.7%) in 2016 and 2020, 14 (27.5%) in 2020 and 10 (19.6%) in all the groups. BLAST analysis revealed that the 2011 isolates were genetically closer to African isolates (53.3%) than to Asian ones (46.7%). Interestingly, this proportion changed completely in 2020, when the isolates were closer to Asian isolates (81.6%) than to African ones (18.4%). Conclusions: Despite the diversity of the identified mutations in the k13-propeller gene, these data did not report widespread artemisinin-resistant polymorphisms in the Jazan region where these samples were collected.
Such a process would be expected to increase frequencies of mutations associated with the resistance of ACT.
