Simple adjustment of the sequence weight algorithm remarkably enhances PSI-BLAST performance

2016 ◽  
Author(s):  
Toshiyuki Oda ◽  
Kyungtaek Lim ◽  
Kentaro Tomii

Abstract PSI-BLAST, an extremely popular tool for sequence similarity search, uses a position-specific scoring matrix (PSSM) constructed from a multiple sequence alignment (MSA). A PSSM allows the detection of more distant homologs than a general amino acid substitution matrix does. An accurate estimation of the weights of sequences in an MSA is crucially important for PSSM construction. PSI-BLAST divides a given MSA into multiple blocks, for which sequence weights are calculated. When the block width becomes very narrow, the sequence weight calculation can be difficult. We demonstrate that PSI-BLAST indeed generates a significant fraction of blocks having widths less than 5, thereby degrading the PSI-BLAST performance. We revised the code of PSI-BLAST to prevent the blocks from being narrower than a given minimum block width (MBW). We designate the modified application of PSI-BLAST as PSI-BLASTexB. When MBW is 25, PSI-BLASTexB consistently and notably outperforms PSI-BLAST on three independent benchmark sets. The performance boost is even more drastic when an MSA, instead of a sequence, is used as a query. Our results demonstrate that the generation of narrow-width blocks during the sequence weight calculation is a critically important factor that restricts the PSI-BLAST search performance. By preventing narrow blocks, PSI-BLASTexB remarkably upgrades the PSI-BLAST performance.
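The idea of enforcing a minimum block width before computing sequence weights can be illustrated with a small Python sketch. This is not the PSI-BLASTexB code: the function names (`widen_block`, `henikoff_weights`, `weights_with_min_width`), the symmetric widening rule, and the use of plain Henikoff-style position-based weights with gaps treated as ordinary symbols are all assumptions made for illustration. Only the default of 25 columns is taken from the abstract.

```python
from collections import Counter


def widen_block(start, end, aln_len, min_width):
    """Expand the half-open column range [start, end) to at least min_width
    columns, clipped to the alignment. Hypothetical rule: widen symmetrically."""
    width = end - start
    if width >= min_width:
        return start, end
    target = min(min_width, aln_len)      # cannot exceed the alignment length
    deficit = target - width
    new_start = max(0, start - deficit // 2)
    new_end = min(aln_len, new_start + target)
    new_start = max(0, new_end - target)  # shift left if we hit the right edge
    return new_start, new_end


def henikoff_weights(msa, start, end):
    """Position-based weights over columns [start, end), assuming start < end.
    Each column adds 1 / (k * n) to a sequence, where k is the number of
    distinct symbols in the column and n is the count of that sequence's symbol.
    Gaps are treated as ordinary symbols here (a simplification)."""
    weights = [0.0] * len(msa)
    for col in range(start, end):
        column = [seq[col] for seq in msa]
        counts = Counter(column)
        k = len(counts)
        for i, symbol in enumerate(column):
            weights[i] += 1.0 / (k * counts[symbol])
    total = sum(weights)
    return [w / total for w in weights]


def weights_with_min_width(msa, start, end, min_width=25):
    """Widen the block first, then compute weights on the widened block."""
    new_start, new_end = widen_block(start, end, len(msa[0]), min_width)
    return henikoff_weights(msa, new_start, new_end)


if __name__ == "__main__":
    toy_msa = ["AAAA", "AAAA", "CCCC"]
    print(henikoff_weights(toy_msa, 0, 4))
    print(widen_block(10, 12, 100, 6))
```

In the toy alignment the two identical sequences share the weight that the single divergent sequence receives alone. A block of one or two columns offers few observations for this estimate, which is one way to read the abstract's concern about narrow blocks.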

2012 ◽  
Vol 13 (Suppl 4) ◽  
pp. S2 ◽  
Author(s):  
Emanuele Bramucci ◽  
Alessandro Paiardini ◽  
Francesco Bossa ◽  
Stefano Pascarella

Author(s):  
Sona. S Dev ◽  
P. Poornima ◽  
Akhil Venu

Eggplant, or brinjal (Solanum melongena L.), is highly susceptible to various soil-borne diseases. The extensive use of chemical fungicides to combat these diseases can be minimized by identification of resistance gene analogs (RGAs) in wild species of cultivated plants. In the present study, degenerate PCR primers for the conserved regions of nucleotide binding site-leucine rich repeat (NBS-LRR) genes were used to amplify RGAs from wild relatives of eggplant (black nightshade (Solanum nigrum), Indian nightshade (Solanum violaceum) and Solanum incanum) which showed resistance to the bacterial wilt pathogen, Ralstonia solanacearum, in the preliminary investigation. The amino acid sequences of the amplicons, when compared to each other and to the amino acid sequences of known RGAs deposited in GenBank, revealed significant sequence similarity. The phylogenetic analysis indicated that they belonged to the toll interleukin-1 receptor (TIR)-NBS-LRR type of R genes. Multiple sequence alignment with other known R genes showed significant homology with the P-loop, Kinase 2 and GLPL domains of NBS-LRR class genes. There has been no report on R genes from these wild eggplants; hence the diversity analysis of these novel RGAs can lead to the identification of other novel R genes within the germplasm of different brinjal plants as well as other species of Solanum.


2020 ◽  
Vol 6 (10) ◽  
Author(s):  
Ao Li ◽  
Elisabeth Laville ◽  
Laurence Tarquis ◽  
Vincent Lombard ◽  
David Ropartz ◽  
...  

Mannoside phosphorylases are involved in the intracellular metabolization of mannooligosaccharides, and are also useful enzymes for the in vitro synthesis of oligosaccharides. They are found in glycoside hydrolase family GH130. Here we report on an analysis of 6308 GH130 sequences, including 4714 from the human, bovine, porcine and murine microbiomes. Using sequence similarity networks, we divided the diversity of sequences into 15 mostly isofunctional meta-nodes; of these, 9 contained no experimentally characterized member. By examining the multiple sequence alignments in each meta-node, we predicted the determinants of the phosphorolytic mechanism and linkage specificity. We thus hypothesized that eight uncharacterized meta-nodes would be phosphorylases. These sequences are characterized by the absence of signal peptides and of the catalytic base. Those sequences with the conserved E/K, E/R and Y/R pairs of residues involved in substrate binding would target β-1,2-, β-1,3- and β-1,4-linked mannosyl residues, respectively. These predictions were tested by characterizing members of three of the uncharacterized meta-nodes from gut bacteria. We discovered the first known β-1,4-mannosyl-glucuronic acid phosphorylase, which targets a motif of the Shigella lipopolysaccharide O-antigen. This work uncovers a reliable strategy for the discovery of novel mannoside-phosphorylases, reveals possible interactions between gut bacteria, and identifies a biotechnological tool for the synthesis of antigenic oligosaccharides.


2020 ◽  
Vol 49 (D1) ◽  
pp. D192-D200 ◽  
Author(s):  
Ioanna Kalvari ◽  
Eric P Nawrocki ◽  
Nancy Ontiveros-Palacios ◽  
Joanna Argasinska ◽  
Kevin Lamkiewicz ◽  
...  

Abstract Rfam is a database of RNA families where each of the 3444 families is represented by a multiple sequence alignment of known RNA sequences and a covariance model that can be used to search for additional members of the family. Recent developments have involved expert collaborations to improve the quality and coverage of Rfam data, focusing on microRNAs, viral and bacterial RNAs. We have completed the first phase of synchronising microRNA families in Rfam and miRBase, creating 356 new Rfam families and updating 40. We established a procedure for comprehensive annotation of viral RNA families starting with Flavivirus and Coronaviridae RNAs. We have also increased the coverage of bacterial and metagenome-based RNA families from the ZWD database. These developments have enabled a significant growth of the database, with the addition of 759 new families in Rfam 14. To facilitate further community contribution to Rfam, expert users are now able to build and submit new families using the newly developed Rfam Cloud family curation system. New Rfam website features include a new sequence similarity search powered by RNAcentral, as well as search and visualisation of families with pseudoknots. Rfam is freely available at https://rfam.org.


Author(s):  
RA Begum ◽  
MT Alam ◽  
H Jahan ◽  
MS Alam

Labeo calbasu (Family Cyprinidae) was studied at the DNA level to assess genetic diversity within and between species. The mitochondrial cytochrome b (cyt-b) gene of L. calbasu was sequenced and compared to the corresponding sequences of other Labeo species. DNA was isolated from a tissue sample of L. calbasu using the phenol:chloroform extraction method. Forward and reverse primers were designed to amplify the target region of the cytochrome b gene. A standard PCR protocol was used for the amplification of the desired region. The forward and reverse sequences obtained were then aligned and edited to a final length of 510 nucleotides, which was submitted to the NCBI GenBank database. A nucleotide BLAST search of this sequence at NCBI returned 100% sequence similarity with an L. calbasu sequence of the same region of the cyt-b gene. Multiple sequence alignment of the sequence with sequences from seven more Labeo species revealed 120 polymorphic sites, which mark the diversity among the species and might be used in molecular identification of the Labeo species. A constructed phylogenetic tree showed the relationships among the Labeo species. This research demonstrated the usefulness of a mitochondrial DNA-based approach in species identification. Further, the data will provide an appropriate background for studying within-species genetic diversity of the Labeo species in general and of L. calbasu in particular. J. Biodivers. Conserv. Bioresour. Manag. 2019, 5(1): 25-30
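Counting polymorphic sites in a multiple sequence alignment, as described above, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' procedure: the function name `polymorphic_sites` is hypothetical, a column is counted as polymorphic when it holds more than one distinct nucleotide, and gaps and ambiguous bases (`-`, `N`) are ignored by default.

```python
def polymorphic_sites(msa, ignore=("-", "N")):
    """Return the 0-based column indices where aligned sequences differ.

    msa is a list of equal-length aligned sequences. Symbols listed in
    `ignore` (gaps, ambiguous bases) are not counted as variation.
    """
    if len({len(seq) for seq in msa}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    skipped = {symbol.upper() for symbol in ignore}
    sites = []
    for col in range(len(msa[0])):
        observed = {seq[col].upper() for seq in msa} - skipped
        if len(observed) > 1:
            sites.append(col)
    return sites


if __name__ == "__main__":
    toy_alignment = ["ACGT", "ACGA", "ATGT"]  # toy data, not the cyt-b sequences
    print(polymorphic_sites(toy_alignment))
```

For the toy alignment, the second and fourth columns differ between sequences, so the function reports indices 1 and 3.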


2020 ◽  
Vol 21 (S6) ◽  
Author(s):  
Sriram P. Chockalingam ◽  
Jodh Pannu ◽  
Sahar Hooshmand ◽  
Sharma V. Thankachan ◽  
Srinivas Aluru

Abstract Background: Alignment-free methods for sequence comparisons have become popular in many bioinformatics applications, specifically in the estimation of sequence similarity measures to construct phylogenetic trees. Recently, the average common substring measure, ACS, and its k-mismatch counterpart, ACSk, have been shown to produce results as effective as multiple-sequence-alignment-based methods for the reconstruction of phylogenetic trees. Since computing ACSk takes O(n log^k n) time and is hence impractical for large datasets, multiple heuristics that can approximate ACSk have been introduced. Results: In this paper, we present a novel linear-time heuristic to approximate ACSk, which is faster than computing the exact ACSk while being closer to the exact ACSk values compared to previously published linear-time greedy heuristics. Using four real datasets, containing both DNA and protein sequences, we evaluate our algorithm in terms of accuracy and runtime, and demonstrate its applicability for phylogeny reconstruction. Our algorithm provides better accuracy than previously published heuristic methods, while being comparable in its applications to phylogeny reconstruction. Conclusions: Our method produces a better approximation for ACSk and is applicable for the alignment-free comparison of biological sequences at highly competitive speed. The algorithm is implemented in the Rust programming language and the source code is available at https://github.com/srirampc/adyar-rs.
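To make the quantity being approximated concrete, the following Python sketch computes ACSk by brute force. It is a reference definition only, not the paper's linear-time heuristic and not the O(n log^k n) exact algorithm. The function names are hypothetical, and the definition assumed is the usual one: for each position in x, take the longest substring starting there that matches some substring of y with at most k mismatches, then average those lengths over x. With k = 0 this reduces to plain ACS.

```python
def longest_match_with_mismatches(x, i, y, j, k):
    """Length of the longest common extension of x[i:] and y[j:] that
    contains at most k mismatching positions."""
    length = 0
    mismatches = 0
    while i + length < len(x) and j + length < len(y):
        if x[i + length] != y[j + length]:
            if mismatches == k:
                break
            mismatches += 1
        length += 1
    return length


def acs_k(x, y, k=0):
    """Brute-force ACSk of x against y (both must be non-empty).

    Runs in roughly O(len(x) * len(y) * match length) time, so it is only
    suitable for short toy inputs.
    """
    total = 0
    for i in range(len(x)):
        total += max(
            longest_match_with_mismatches(x, i, y, j, k) for j in range(len(y))
        )
    return total / len(x)


if __name__ == "__main__":
    print(acs_k("abc", "abd", k=0))
    print(acs_k("abc", "abd", k=1))
```

The measure is not symmetric in x and y, so distance definitions built on it typically combine both directions. That step is omitted here.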


Plant Disease ◽  
2007 ◽  
Vol 91 (11) ◽  
pp. 1413-1418 ◽  
Author(s):  
Kanchan Nasare ◽  
Amit Yadav ◽  
Anil K. Singh ◽  
K. B. Shivasharanappa ◽  
Y. S. Nerkar ◽  
...  

A total of 240 sugarcane (Saccharum officinarum) plants showing phenotypic symptoms of sugarcane grassy shoot (SCGS) disease were collected from three states of India, Maharashtra, Karnataka, and Uttar Pradesh. Phytoplasmas were detected in all symptomatic samples by the polymerase chain reaction (PCR) amplification of phytoplasma-specific 16S rRNA gene and 16S-23S rRNA spacer region (SR) sequences. No amplification was observed when DNA from asymptomatic plant samples was used as a template. Sixteen samples were selected on the basis of phenotypic symptoms and geographic location, and cloning and sequencing of the 16S rRNA and spacer regions were performed. Multiple sequence alignments of the 16S rRNA sequences revealed that they share very high sequence similarity with phytoplasmas of rice yellow dwarf, 16SrXI. However, the 16S-23S rRNA SR sequence analysis revealed that while the majority of phytoplasmas shared very high (>99%) sequence similarity with previously reported sugarcane phytoplasmas, two of them, namely BV2 (DQ380342) and VD7 (DQ380343), shared relatively low sequence similarity (79 and 84%, respectively). Therefore, these two phytoplasmas may be previously unreported ones that cause significant yield losses in sugarcane in India.


2021 ◽  
Vol 11 ◽  
Author(s):  
Haipeng Shi ◽  
Haihe Shi ◽  
Shenghua Xu

As a key algorithm in bioinformatics, the sequence alignment algorithm is widely used in sequence similarity analysis and genome sequence database search. Existing research focuses mainly on specific steps of the algorithm or on specific problems, and lacks a high-level, abstract algorithm framework for the domain. Multiple sequence alignment algorithms are more complex, redundant, and difficult to understand; it is not easy for users to select the appropriate algorithm, and computing errors may occur. Based on our pairwise sequence alignment algorithm component library and the software platform PAR, a few extension components are developed for the multiple sequence alignment domain, so that a specific multiple sequence alignment algorithm can be designed and its corresponding program (in C++, Java, or Python) generated efficiently, improving both the development efficiency of complex algorithms and the accuracy of sequence alignment calculation. A star alignment algorithm is designed and generated to demonstrate the development process.
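The way a star alignment builds on a pairwise alignment component can be sketched as follows. This is not code generated by the PAR platform. It is a hand-written Python illustration of the first step of the classic star method, choosing the center sequence as the one with the highest total pairwise score. The function names and the scoring values (match +1, mismatch -1, gap -1) are assumptions.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment score of a and b, using a
    linear gap penalty and two rolling rows of the dynamic-programming table."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def choose_center(seqs):
    """Index of the sequence whose summed pairwise score against all other
    sequences is highest; ties go to the earliest sequence."""
    totals = [
        sum(nw_score(s, t) for j, t in enumerate(seqs) if j != i)
        for i, s in enumerate(seqs)
    ]
    return max(range(len(seqs)), key=totals.__getitem__)


if __name__ == "__main__":
    toy_seqs = ["ACGT", "ACGA", "TCGT"]  # toy data for illustration
    print(choose_center(toy_seqs))
```

In a full star alignment, every other sequence would then be aligned pairwise to the chosen center and the pairwise alignments merged, with gaps propagated across all rows. Only the reusable pairwise scoring component and the center choice are shown here.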


2019 ◽  
Vol 12 (1) ◽  
pp. 205979911984098
Author(s):  
Nathan J Bahr ◽  
S Herzberg ◽  
W Lambert ◽  
M Hansen ◽  
JJ McNulty ◽  
...  

Our objective was to model process variation of Emergency Medical Service teams responding to simulated pediatric emergencies and determine if sequence alignment distinguishes performance quality. We performed a retrospective process analysis by watching and coding activities in videos from standardized simulations of 42 Emergency Medical Service teams. Teams were classified into high- or low-performing groups based on the Clinical Teamwork Scale™. Activities were coded according to resuscitation tasks, performer, and times. We used ClustalG to align task sequences within and between groups, and measured similarity. Teams within and between performance levels had an average sequence similarity of 52 ± 7% and 50 ± 7%, respectively. Teams performed clinically appropriate tasks that varied in prioritization, for example, performing compressions or connecting the EKG monitor early. There was no statistical difference in gross similarity between groups, but specific differences in prioritization may have had clinically meaningful implications. Alignment could improve by accounting for task duration and concurrency.


2022 ◽  
Author(s):  
Shoichi Sakaguchi ◽  
Syun-ichi Urayama ◽  
Yoshihiro Takaki ◽  
Hong Wu ◽  
Youichi Suzuki ◽  
...  

RNA viruses are distributed in various environments, and most RNA viruses have been recently identified by metatranscriptome sequencing. However, due to the high nucleotide diversity of RNA viruses, it is still challenging to identify their sequences. Therefore, this study generated a dataset of RNA-dependent RNA polymerase (RdRp) domains essential for all RNA viruses belonging to Orthornavirae. Also, the collected genes with RdRp domains from various RNA viruses were clustered by amino acid sequence similarity. For each cluster, a multiple sequence alignment was generated, and a hidden Markov model (HMM) profile was created if the number of sequences was greater than five. Using the 1,467 HMM profiles, we detected RdRp domains in the RefSeq RNA virus sequences, combined the hit sequences with the RdRp domains, and reconstructed the HMM profiles. As a result, 2,234 HMM profiles were generated from 12,316 RdRp domain sequences, and the dataset was named NeoRdRp. Additionally, using the UniProt dataset, we confirmed that almost all NeoRdRp HMM profiles could specifically detect RdRps in Orthornavirae. Furthermore, we compared the NeoRdRp dataset with two previously reported RNA virus detection methods to detect RNA virus sequences from metatranscriptome sequencing data. Our methods can identify most of the RNA viruses in the datasets; however, some RNA viruses were not detected, similar to the other two methods. The NeoRdRp can be improved by repeatedly adding new RdRp sequences and can be expected to be widely applied as a system for detecting various RNA viruses from metatranscriptome data.

