Predicting secondary structures, contact numbers, and residue-wise contact orders of native protein structures from amino acid sequences using critical random networks

Amino Acid Network for the Discrimination of Native Protein Structures from Decoys

Current Protein and Peptide Science, 2014, Vol 15 (6), pp. 522-528

DOI: 10.2174/1389203715666140724084709

Cited By ~ 10

Author(s): Jianhong Zhou, Wenying Yan, Guang Hu, Bairong Shen

Keyword(s): Amino Acid, Protein Structures, Native Protein


In-silico prediction and modeling of the Entamoeba histolytica proteins: Serine-rich Entamoeba histolytica protein and 29 kDa Cysteine-rich protease

PeerJ, 2017, Vol 5, pp. e3160

DOI: 10.7717/peerj.3160

Cited By ~ 5

Author(s): Kumar Manochitra, Subhash Chandra Parija

Keyword(s): Amino Acid, Structure Prediction, Tertiary Structure, Protein Structures, Amino Acid Sequences, Treatment Modalities, Bioinformatic Tools, Complex Protein, A Cell, Quaternary Structures

Background: Amoebiasis is the third most common parasitic cause of morbidity and mortality, particularly in countries with poor hygiene. Because the diagnosis of amoebiasis remains ambiguous, a better diagnostic approach is needed. Serine-rich Entamoeba histolytica protein (SREHP), peroxiredoxin and Gal/GalNAc lectin are pivotal in E. histolytica virulence and are extensively studied as diagnostic and vaccine targets. Elucidating the cellular functions of these proteins requires details of their quaternary structures, yet studies of this aspect are scarce. This study therefore predicted the structures of these target proteins and characterized them structurally and functionally using appropriate in-silico methods.

Methods: The amino acid sequences of the proteins were retrieved from the National Center for Biotechnology Information (NCBI) database and aligned using ClustalW. Bioinformatic tools were used for secondary and tertiary structure prediction. The predicted structures were validated and then refined.

Results: Based on validation with the SAVES server, the structures predicted by I-TASSER were more accurate than those predicted by Phyre2. The predictions suggest that SREHP is an extracellular protein, peroxiredoxin a peripheral membrane protein, and Gal/GalNAc lectin a cell-wall protein. Signal peptides were found in the amino acid sequences of SREHP and Gal/GalNAc lectin but not in the peroxiredoxin sequence. Gal/GalNAc lectin showed higher antigenicity than the other two proteins. All three proteins had similar structures, composed mostly of loops.

Discussion: The structures of SREHP and peroxiredoxin were predicted successfully, whereas the structure of Gal/GalNAc lectin could not be predicted because it is a complex protein composed of subunits. This protein also showed lower similarity to the available structural homologs. The quaternary structures of SREHP and peroxiredoxin predicted in this study would provide better structural and functional insight into these proteins and may aid the development of new diagnostic assays or the improvement of available treatment modalities.


Study on the Influence of mRNA, the Genetic Language, on Protein Folding Rates

Frontiers in Genetics, 2021, Vol 12

DOI: 10.3389/fgene.2021.635250

Author(s): Ruifang Li, Hong Li, Xue Feng, Ruifeng Zhao, Yongxia Cheng

Keyword(s): Protein Folding, Amino Acid, Protein Structures, GC Content, Amino Acid Sequences, Information Redundancy, Folding Rate, Folding Rates, Related Information, Linear Regressions

Many studies have reported that protein folding rates are influenced by the characteristics of amino acid sequences and protein structures. However, few have examined whether the corresponding mRNA sequences are also related to protein folding rates. An mRNA sequence can be regarded as a kind of genetic language, and its vocabulary and phraseology must carry information that influences the protein folding rate. In the present work, linear regressions between parameters describing the vocabulary and phraseology of mRNA sequences and the corresponding protein folding rates were analyzed. The results indicate that D2 (the adjacent-base information redundancy) and the GC content of the corresponding mRNA sequences are significantly negatively correlated with protein folding rates, whereas D1 (the single-base information redundancy) is significantly positively correlated with them. In addition, the relationships between these genetic-language parameters and protein folding rates differ markedly between protein groups. Several useful parameters related to protein folding rates were identified. The results indicate that, for predicting protein folding rates, information from protein structures and amino acid sequences alone is insufficient, and some of the information regulating folding rates must be derived from the mRNA sequences.
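
The two information-redundancy parameters and GC content can be made concrete with a short Python sketch. The abstract gives no formulas, so this assumes Gatlin's classic definitions: D1 = log2(4) - H1, where H1 is the Shannon entropy of single-base frequencies, and D2 = H1 - H_M, where H_M is the conditional entropy of a base given the base before it. The function names `gc_content` and `information_redundancy` are hypothetical. Whether the study estimates these quantities exactly this way, for example over full mRNAs or coding regions only, is not stated here.

```python
import math
from collections import Counter


def _normalize(seq: str) -> str:
    """Upper-case an mRNA/cDNA string and map T to U (assumes only A, C, G, U/T)."""
    return seq.upper().replace("T", "U")


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = _normalize(seq)
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def information_redundancy(seq: str) -> tuple[float, float]:
    """Return (D1, D2) in bits, using Gatlin-style definitions (an assumption).

    D1 = log2(4) - H1   divergence from equiprobable single bases
    D2 = H1 - H_M       divergence from independence of adjacent bases
    """
    seq = _normalize(seq)
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two bases")

    # H1: Shannon entropy of single-base frequencies.
    h1 = -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())
    d1 = math.log2(4) - h1

    # H_M: conditional entropy of a base given its 5' neighbour,
    # estimated from overlapping adjacent-base (dinucleotide) counts.
    pairs = Counter(seq[i:i + 2] for i in range(n - 1))
    left = Counter()
    for pair, c in pairs.items():
        left[pair[0]] += c
    total = n - 1
    h_m = -sum((c / total) * math.log2(c / left[pair[0]])
               for pair, c in pairs.items())
    d2 = h1 - h_m
    return d1, d2
```

As a sanity check, the perfectly periodic sequence `"ACGU" * 25` gives D1 = 0, because all four bases are equally frequent, and D2 = 2 bits, because each base fully determines the next.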


Intergenic ORFs as elementary structural modules of de novo gene birth and protein evolution

2021

DOI: 10.1101/2021.04.13.439703

Author(s): Chris Papadopoulos, Isabelle Callebaut, Jean-Christophe Gelly, Isabelle Hatin, Olivier Namy, ...

Keyword(s): Protein Structure, Amino Acid, De Novo, Protein Structures, Structural Diversity, Building Blocks, Amino Acid Sequences, Novel Genes, Noncoding Sequences, De Novo Gene

The noncoding genome plays an important role in de novo gene birth and in the emergence of genetic novelty. Nevertheless, it remains unclear how the properties of noncoding sequences could promote the birth of novel genes and shape the evolution and structural diversity of proteins. By combining different bioinformatic approaches, we therefore characterized the diversity of fold potential of the amino acid sequences encoded by all intergenic ORFs (open reading frames) of S. cerevisiae. Our aims were (i) to explore whether the large structural diversity observed in proteomes is already present in noncoding sequences, and (ii) to estimate the potential of the noncoding genome to produce novel protein bricks that can either give rise to novel genes or be integrated into pre-existing proteins, thereby contributing to protein structural diversity and evolution. We showed that the amino acid sequences encoded by most yeast intergenic ORFs contain the elementary building blocks of protein structures. Moreover, they span the large structural diversity of canonical proteins, and, strikingly, the majority are predicted to be foldable. We then investigated the early stages of de novo gene birth by identifying intergenic ORFs with a strong translation signal in ribosome profiling experiments and by reconstructing the ancestral sequences of 70 yeast de novo genes. This enabled us to highlight sequence and structural factors that determine de novo gene emergence. Finally, we showed a strong correlation between the fold potential of de novo proteins and that of their ancestral amino acid sequences, reflecting the relationship between the noncoding genome and the protein structure universe.
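
The open reading frame (ORF) concept at the core of this study can be illustrated with a minimal Python sketch that enumerates ATG-to-stop ORFs on both strands of a given DNA segment. This is an illustration under stated assumptions. `find_orfs` and its 20-codon default minimum are hypothetical choices. The study's own definition of intergenic ORFs, including its start and stop criteria and minimum length, may differ. Extracting intergenic regions from a genome annotation is not shown.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 20) -> list[tuple[str, int, str]]:
    """List ATG-to-stop ORFs on both strands as (strand, frame, nucleotides).

    min_codons (hypothetical threshold) counts codons from the ATG up to,
    but not including, the stop codon. For the "-" strand, frame refers to
    the reading frame on the reverse-complemented string.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # index of the currently open ATG, if any
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, s[start:i + 3]))
                    start = None  # close this ORF and look for the next ATG
    return orfs
```

For example, `"ATG" + "GCT" * 20 + "TAA"` contains a single 21-codon ORF on the forward strand in frame 0. It is dropped if `min_codons` is raised to 22.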


IntFOLD: an integrated server for modelling protein structures and functions from amino acid sequences

Nucleic Acids Research, 2015, Vol 43 (W1), pp. W169-W173

DOI: 10.1093/nar/gkv236

Cited By ~ 72

Author(s): Liam J. McGuffin, Jennifer D. Atkins, Bajuna R. Salehe, Ahmad N. Shuid, Daniel B. Roche

Keyword(s): Amino Acid, Protein Structures, Amino Acid Sequences


A structural homology approach for computational protein design with flexible backbone

Bioinformatics, 2018, Vol 35 (14), pp. 2418-2426

DOI: 10.1093/bioinformatics/bty975

Cited By ~ 2

Author(s): David Simoncini, Kam Y J Zhang, Thomas Schiex, Sophie Barbe

Keyword(s): Amino Acid, Protein Design, Protein Sequence, Critical Role, Protein Structures, Amino Acid Sequences, Computational Protein Design, Supplementary Information, Structural Homology, Homologous Proteins

Motivation: Structure-based computational protein design (CPD) plays a critical role in advancing the field of protein engineering. Using an all-atom energy function, CPD tries to identify amino acid sequences that fold into a target structure and ultimately perform a desired function. However, energy functions remain imperfect, and injecting relevant information from known structures into the design process should lead to improved designs.

Results: We introduce Shades, a data-driven CPD method that exploits local structural environments in known protein structures together with energy to guide sequence design, while sampling side-chain and backbone conformations to accommodate mutations. Shades (Structural Homology Algorithm for protein DESign) is based on customized libraries of non-contiguous, in-contact amino acid residue motifs. We tested Shades on a public benchmark of 40 proteins selected from different protein families. When homologous proteins were excluded, Shades achieved on average a protein sequence recovery of 30% and a protein sequence similarity of 46%, compared with the Pfam protein family of the target protein. When homologous structures were added, the wild-type sequence recovery rate reached 93%.

Availability and implementation: Shades source code is available at https://bitbucket.org/satsumaimo/shades as a patch for Rosetta 3.8, together with a curated protein structure database and ITEM library creation software.

Supplementary information: Supplementary data are available at Bioinformatics online.
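
Sequence recovery, the headline metric above, can be sketched in a few lines of Python. The sketch uses the simplest per-position definition: the fraction of aligned positions where the designed residue equals the wild-type residue, averaged over a benchmark. This is an assumption for illustration. Shades reports recovery and similarity relative to the Pfam family of the target, which this sketch does not reproduce. Sequence similarity would additionally need a substitution matrix such as BLOSUM62. The names `sequence_recovery` and `mean_recovery` are hypothetical.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of aligned positions where the designed residue equals the native one."""
    if len(designed) != len(native):
        raise ValueError("sequences must be aligned to the same length")
    if not native:
        raise ValueError("empty sequences")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


def mean_recovery(pairs: list[tuple[str, str]]) -> float:
    """Average per-protein recovery over a benchmark of (designed, native) pairs."""
    if not pairs:
        raise ValueError("no sequence pairs")
    return sum(sequence_recovery(d, n) for d, n in pairs) / len(pairs)
```

For example, `sequence_recovery("ACDE", "ACDF")` is 0.75, since three of the four positions match. Averaging per protein, rather than pooling all residues, weights short and long proteins equally. This is one common convention, not necessarily the one used in the benchmark.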


NEAT-FLEX: Predicting the conformational flexibility of amino acids using neuroevolution of augmenting topologies

Journal of Bioinformatics and Computational Biology, 2017, Vol 15 (03), pp. 1750009

DOI: 10.1142/s0219720017500093

Cited By ~ 3

Author(s): Bruno Grisci, Márcio Dorn

Keyword(s): Amino Acid, Amino Acid Sequence, Tertiary Structure, Structural Information, Protein Structures, Three Dimensional, Conformational Flexibility, Amino Acid Sequences, Conformational Search, Back Propagation Algorithm

The development of computational methods to accurately model three-dimensional protein structures from sequences of amino acid residues is becoming increasingly important in structural biology. This paper addresses the challenge of predicting the tertiary structure of a given amino acid sequence, a problem that has been reported to be NP-complete. We present a new method, NEAT-FLEX, based on NeuroEvolution of Augmenting Topologies (NEAT), to extract structural features from (ABS) proteins that are determined experimentally. The proposed method uses structural information from the Protein Data Bank (PDB) to predict the conformational flexibility (FLEX) of the residues of a target amino acid sequence. This information may be used in three-dimensional structure prediction approaches to reduce the conformational search space. The proposed method was tested on 24 different amino acid sequences, and the evolved neural networks were compared against a traditional error back-propagation algorithm. The results show that the proposed method is a powerful way to extract and represent structural information from experimentally determined protein molecules.


Local repulsion in protein structures as revealed by a charge distribution analysis of all amino acid sequences from the Saccharomyces cerevisiae genome

Journal of Physics: Condensed Matter, 2005, Vol 17 (31), pp. S2825-S2831

DOI: 10.1088/0953-8984/17/31/007

Cited By ~ 3

Author(s): Runcong Ke, Shigeki Mitaku

Keyword(s): Amino Acid, Charge Distribution, Protein Structures, Amino Acid Sequences, Distribution Analysis, A Charge


Expression and characterisation of the ryegrass mottle virus non-structural proteins

Proceedings of the Latvian Academy of Sciences, Section B: Natural, Exact and Applied Sciences, 2010, Vol 64 (5-6), pp. 215-222

DOI: 10.2478/v10046-010-0035-4

Author(s): Ina Baļķe, Gunta Resēviča, Dace Skrastiņa, Andris Zeltiņš

Keyword(s): Amino Acid, Expression System, Protein Structures, Amino Acid Sequences, Host Cells, Structural Proteins, Mottle Virus, Expression Vectors, Structural Protein, E Coli

The Ryegrass mottle virus (RGMoV) single-stranded RNA genome is organised into four open reading frames (ORFs) that encode several proteins. ORF1 encodes protein P1. ORF2a encodes the membrane-associated 3C-like serine protease, the genome-linked protein VPg and the P16 protein. ORF2b encodes the replicase RdRP. The only structural protein, the coat protein, is synthesised from ORF3. To obtain the non-structural proteins in preparative quantities and characterise them, the corresponding RGMoV gene cDNAs were cloned into pET- and pColdI-derived expression vectors and overexpressed in several E. coli host cells. For the protease and RdRP, the best expression system was the pColdI vector in E. coli WK6 cells. VPg and P16 were obtained using pET or pACYC vectors in E. coli BL21 (DE3) host cells and purified by Ni-Sepharose affinity chromatography. Attempts to crystallize VPg and P16 were unsuccessful, possibly because both proteins contain non-structured amino acid sequences. Bioinformatic analysis indicated that the entire VPg domain and the C-terminal part of P16 contain unstructured amino acid stretches, which possibly prevented crystal formation.


Integrated displays of aligned amino acid sequences and protein structures

Bioinformatics, 1991, Vol 7 (3), pp. 341-346

DOI: 10.1093/bioinformatics/7.3.341

Author(s): Raimund Schnobel

Keyword(s): Amino Acid, Protein Structures, Amino Acid Sequences
