protein sequence database
Recently Published Documents


TOTAL DOCUMENTS

43
(FIVE YEARS 1)

H-INDEX

17
(FIVE YEARS 0)

2021 ◽  
Author(s):  
Qinze Yu ◽  
Zhihang Dong ◽  
Xingyu Fan ◽  
Licheng Zong ◽  
Yu Li

Identifying the targets of an antimicrobial peptide is a fundamental step in studying the innate immune response and combating antibiotic resistance and, more broadly, in precision medicine and public health. There have been extensive studies on statistical and computational approaches to identify (i) whether a peptide is an antimicrobial peptide (AMP) or a non-AMP and (ii) which targets these sequences are effective against (Gram-positive, Gram-negative, etc.). Despite the existing deep learning methods for this problem, most of them are unable to handle the small AMP classes (anti-insect, anti-parasite, etc.). More importantly, some AMPs can have multiple targets, which the previous methods fail to consider. In this study, we build a diverse and comprehensive multi-label protein sequence database by collecting and cleaning amino acid sequences from various AMP databases. To generate efficient representations and features for the small-class dataset, we take advantage of a protein language model trained on 250 million protein sequences. Based on that, we develop an end-to-end hierarchical multi-label deep forest framework, HMD-AMP, to annotate AMPs comprehensively. After identifying an AMP, it further predicts which targets the AMP can effectively kill from eleven available classes. Extensive experiments suggest that our framework outperforms state-of-the-art models in both the binary classification task and the multi-label classification task, especially on the minor classes. Compared with the previous deep learning methods, our method improves macro-AUROC by 11%. The model is robust against reduced features and small perturbations and produces promising results. We believe HMD-AMP contributes to future wet-lab investigations of the innate structural properties of different antimicrobial peptides and builds promising empirical underpinnings for precision medicine with antibiotics.
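
The abstract above reports its multi-label result as macro-AUROC. As a rough illustration of that metric only (not the authors' code), the sketch below computes a per-class AUROC and averages it with equal weight per class, which is why the score is sensitive to performance on minor classes. The function names `auroc` and `macro_auroc` and the toy data are assumptions made for this example.

```python
from typing import List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auroc(label_matrix: List[List[int]],
                score_matrix: List[List[float]]) -> float:
    """Unweighted mean of per-class AUROC (rows = peptides, columns = classes)."""
    n_classes = len(label_matrix[0])
    per_class = [
        auroc([row[c] for row in label_matrix],
              [row[c] for row in score_matrix])
        for c in range(n_classes)
    ]
    return sum(per_class) / n_classes


# Toy example: 4 peptides, 2 hypothetical target classes.
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_scores = [[0.9, 0.2], [0.1, 0.8], [0.8, 0.3], [0.3, 0.4]]
print(macro_auroc(toy_labels, toy_scores))  # 0.875 (class AUROCs 1.0 and 0.75)
```

Because every class contributes equally to the mean, a class with only a handful of positives moves the score as much as a large class does.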


2019 ◽  
Vol 19 (1) ◽  
pp. 198-208 ◽  
Author(s):  
Richard S. Johnson ◽  
Brian C. Searle ◽  
Brook L. Nunn ◽  
Jason M. Gilmore ◽  
Molly Phillips ◽  
...  

2019 ◽  
Author(s):  
Praveen Kumar ◽  
James E. Johnson ◽  
Caleb Easterly ◽  
Subina Mehta ◽  
Ray Sajulga ◽  
...  

Multi-omics approaches focused on mass-spectrometry (MS)-based data, such as metaproteomics, utilize genomic and/or transcriptomic sequencing data to generate a comprehensive protein sequence database. These databases can be very large, containing millions of sequences, which reduces the sensitivity of matching tandem mass spectrometry (MS/MS) data to sequences to generate peptide spectrum matches (PSMs). Here, we describe a sectioning method for generating a database enriched for those protein sequences that are most likely present in the sample. Our evaluation demonstrates how this method helps to increase the sensitivity of PSMs while maintaining acceptable false discovery rate (FDR) statistics. The sectioning method yields more true positive PSM identifications than the traditional large-database search, and fewer false PSM identifications than a previously described two-step method for reducing database size. The sectioning method for large sequence databases enables generation of an enriched protein sequence database and promotes increased sensitivity in identifying PSMs, while maintaining an acceptable and manageable FDR. Furthermore, implementation in the Galaxy platform provides access to a usable and automated workflow for carrying out the method. Our results show the utility of this methodology for a wide range of applications where genome-guided, large sequence databases are required for MS-based proteomics data analysis.
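
The abstract above does not spell out the steps of the sectioning method, so the following is only a minimal sketch under an assumed reading: the large database is split into sections, each section is searched separately, and the proteins that receive matches are merged into one enriched database. The `toy_search` function is a hypothetical stand-in (a plain substring match) for a real MS/MS search engine, and all function names are invented for this illustration.

```python
from typing import Callable, Dict, List, Sequence, Set


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}, joining wrapped lines."""
    records: Dict[str, str] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]  # identifier = first token of header
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records


def split_into_sections(ids: List[str], n_sections: int) -> List[List[str]]:
    """Distribute identifiers round-robin over n_sections sections."""
    if n_sections < 1:
        raise ValueError("n_sections must be at least 1")
    return [ids[i::n_sections] for i in range(n_sections)]


def toy_search(section: Dict[str, str], peptides: Sequence[str]) -> Set[str]:
    """Hypothetical stand-in for a search engine: exact substring matching."""
    return {pid for pid, seq in section.items()
            if any(pep in seq for pep in peptides)}


def enriched_database(
    db: Dict[str, str],
    peptides: Sequence[str],
    n_sections: int,
    search: Callable[[Dict[str, str], Sequence[str]], Set[str]] = toy_search,
) -> Dict[str, str]:
    """Search each section separately and keep only proteins with a match."""
    keep: Set[str] = set()
    for section_ids in split_into_sections(list(db), n_sections):
        section = {pid: db[pid] for pid in section_ids}
        keep |= search(section, peptides)
    return {pid: seq for pid, seq in db.items() if pid in keep}


fasta_text = ">P1 demo\nMKPEPTIDEK\n>P2\nAAAAAA\n>P3\nGGPEPT\nIDEGG\n>P4\nCCCC\n"
database = parse_fasta(fasta_text)
small_db = enriched_database(database, ["PEPTIDE"], n_sections=2)
print(sorted(small_db))  # ['P1', 'P3']
```

In a real workflow the search step would be a full database search with FDR control per section; the sketch only shows the split, search, and merge structure.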


2019 ◽  
Vol 41 (3) ◽  
Author(s):  
Nguyen Van Tung ◽  
Nguyen Huy Hoang ◽  
Nguyen Kim Thoa

Cytochrome P450s (CYPs) are one of the largest and most widely distributed groups of enzymes and catalyze more than 20 different reactions. There is increasing recognition of the power of P450 biocatalysts for the industrial synthesis of pharmaceuticals, agrochemicals, bulk chemicals, food ingredients, etc. However, industrial processes that run at high temperature, at high pressure or in chemical solvents require the enzymes that catalyze the bioconversion to have specific properties such as thermostability, chemical tolerance or barophilicity. To date, the number of known thermostable P450s is limited. Metagenomic DNA techniques now offer a chance to capture novel genes and unique enzymes from the microbial community of a given ecological niche. In this paper, metagenomic DNA extracted from water samples from the Binh Chau hot spring was sequenced using Illumina's HiSeq platform and analysed to mine putative genes encoding cytochrome P450. The sequencing generated 9.4 Gb of reads containing 156,093 putative ORFs, of which 106,903 genes were annotated against the NCBI non-redundant protein sequence database. Among the annotated ORFs, 68 putative ORFs encoding cytochrome P450 were found, belonging to 36 specific groups of the cytochrome P450 protein family. Of these, the melting temperature (Tm) of thirty-six complete ORFs was predicted for a better understanding of thermodynamic stability.


2019 ◽  
Vol 47 (W1) ◽  
pp. W402-W407 ◽  
Author(s):  
Daniel W A Buchan ◽  
David T Jones

The PSIPRED Workbench is a web server that has offered a range of predictive methods to the bioscience community for 20 years. Here, we present the work we have completed to update the PSIPRED Protein Analysis Workbench and make it ready for the next 20 years. The main focus of our recent website upgrade work has been the acceleration of analyses in the face of increasing protein sequence database size. We additionally discuss new software, the new hardware infrastructure, our web services and website. Lastly, we survey updates to some of the key predictive algorithms available through our website.


2017 ◽  
Vol 163 ◽  
pp. 67-75 ◽  
Author(s):  
Sophie Bromilow ◽  
Lee A. Gethings ◽  
Mike Buckley ◽  
Mike Bromley ◽  
Peter R. Shewry ◽  
...  
