Application of genetic semihomology algorithm to theoretical studies on various protein families.

2001 ◽  
Vol 48 (1) ◽  
pp. 21-33 ◽  
Author(s):  
J Leluk ◽  
B Hanus-Lorenz ◽  
A F Sikorski

Several protein families of different nature were studied for genetic relationship, correct alignment at non-homologous fragments, optimal sequence consensus construction, and confirmation of their actual relevance. A comparison of the genetic semihomology approach with statistical approaches indicates the high accuracy and cognitive value of the former. This is particularly pronounced in the study of related proteins that show a low degree of homology. The multiple sequence alignments were verified and corrected with respect to the questionable, non-homologous fragments. The verified alignments were the basis for consensus sequence formation. The frequency of occurrence of six-codon amino acids was studied in relation to position variability, and their possible role in amino acid mutational exchange at variable positions is discussed.
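The abstract does not spell out the algorithm itself. As a minimal sketch of the underlying idea, assuming that "genetic semihomology" can be approximated by asking whether two amino acids have codons that differ by a single nucleotide in the standard genetic code, the relationship and the set of six-codon amino acids can be computed as follows. The function names are hypothetical and are not taken from the paper.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with
# bases T, C, A, G and the third position varying fastest.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_for(aa):
    """Return every codon that encodes the one-letter amino acid `aa`."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def single_step_related(aa1, aa2):
    """True if some codon of aa1 differs from some codon of aa2 at exactly
    one nucleotide position (an assumed proxy for semihomology)."""
    for c1 in codons_for(aa1):
        for c2 in codons_for(aa2):
            if sum(x != y for x, y in zip(c1, c2)) == 1:
                return True
    return False


def six_codon_amino_acids():
    """Amino acids encoded by six codons in the standard code."""
    return sorted(
        aa for aa in set(AMINO) if aa != "*" and len(codons_for(aa)) == 6
    )
```

Under this assumption, methionine and isoleucine are related (ATG versus ATA), whereas tryptophan and aspartate are not, because every pairing of their codons differs at more than one position.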

2017 ◽  
Vol 28 (19) ◽  
pp. 2461-2469 ◽  
Author(s):  
Patrick R. Stoddard ◽  
Tom A. Williams ◽  
Ethan Garner ◽  
Buzz Baum

While many are familiar with actin as a well-conserved component of the eukaryotic cytoskeleton, it is less often appreciated that actin is a member of a large superfamily of structurally related protein families found throughout the tree of life. Actin-related proteins include chaperones, carbohydrate kinases, and other enzymes, as well as a staggeringly diverse set of proteins that use the energy from ATP hydrolysis to form dynamic, linear polymers. Despite differing widely from one another in filament structure and dynamics, these polymers play important roles in ordering cell space in bacteria, archaea, and eukaryotes. It is not known whether these polymers descended from a single ancestral polymer or arose multiple times by convergent evolution from monomeric actin-like proteins. In this work, we provide an overview of the structures, dynamics, and functions of this diverse set. Then, using a phylogenetic analysis to examine actin evolution, we show that the actin-related protein families that form polymers are more closely related to one another than they are to other nonpolymerizing members of the actin superfamily. Thus all the known actin-like polymers are likely to be the descendants of a single, ancestral, polymer-forming actin-like protein.


2019 ◽  
Vol 2019 ◽  
pp. 1-22 ◽  
Author(s):  
Sorana D. Bolboacă

Diagnostic tests are approaches used in clinical practice to identify with high accuracy the disease of a particular patient and thus to provide early and proper treatment. Reporting high-quality results of diagnostic tests, for both basic and advanced methods, is solely the responsibility of the authors. Despite the existence of recommendations and standards regarding the content or format of statistical aspects, the quality of what statistics are reported, and how, when a diagnostic test is assessed varies from excellent to very poor. This article briefly reviews the steps in the evaluation of a diagnostic test, from its anatomy, to its role in clinical practice, to the statistical methods used to show its performance. The statistical approaches are linked with the phase, clinical question, and objective and are accompanied by examples. More details are provided for phase I and II studies, while the statistical treatment of phases III and IV is only briefly presented. Several free online resources useful in the calculation of some statistics are also given.
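The abstract does not name specific statistics. As a hedged illustration of the kind of performance measures commonly reported for a diagnostic test, the sketch below computes sensitivity, specificity, predictive values, and likelihood ratios from a 2×2 table. The function name and the example counts are assumptions made for illustration and do not come from the article.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Common performance measures from a 2x2 diagnostic table.

    tp, fp, fn, tn are counts of true positives, false positives,
    false negatives and true negatives. A ratio whose denominator
    is zero is returned as NaN.
    """
    def ratio(num, den):
        return num / den if den else float("nan")

    sensitivity = ratio(tp, tp + fn)   # P(test positive | diseased)
    specificity = ratio(tn, tn + fp)   # P(test negative | healthy)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ratio(tp, tp + fp),     # positive predictive value
        "npv": ratio(tn, tn + fn),     # negative predictive value
        "lr_positive": ratio(sensitivity, 1 - specificity),
        "lr_negative": ratio(1 - sensitivity, specificity),
    }


# Hypothetical counts: 100 diseased and 100 healthy subjects.
example = diagnostic_metrics(tp=90, fp=20, fn=10, tn=80)
```

Note that the predictive values depend on disease prevalence in the studied sample, so they do not transfer directly to populations with a different prevalence.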


2019 ◽  
Vol 47 (W1) ◽  
pp. W308-W314 ◽  
Author(s):  
Dmitry Suplatov ◽  
Daria Timonina ◽  
Yana Sharapova ◽  
Vytas Švedas

Disulfide bonds play a significant role in protein stability, function or regulation but are poorly conserved among evolutionarily related proteins. Yosshi can help to understand the role of S–S bonds by comparing sequences and structures of homologs with diverse properties and different disulfide connectivity patterns within a common structural fold of a superfamily, and can help to select the most promising hot-spots to improve the stability of proteins/enzymes or modulate their functions by introducing naturally occurring crosslinks. The bioinformatic analysis is supported by the integrated Mustguseal web-server, which constructs large structure-guided sequence alignments of functionally diverse protein families that can include thousands of proteins based on all available information in public databases. Yosshi+Mustguseal is a new integrated web-tool for a systematic homology-driven analysis and engineering of S–S bonds that facilitates a broader interpretation of disulfides not just as a factor of structural stability, but rather as a mechanism to implement functional diversity within a superfamily. The results can be downloaded as a content-rich PyMol session file or further studied online using the HTML5-based interactive analysis tools. Both web-servers are free and open to all users at https://biokinet.belozersky.msu.ru/yosshi and there is no login requirement.


2004 ◽  
Vol 02 (04) ◽  
pp. 719-745 ◽  
Author(s):  
ARUN SIDDHARTH KONAGURTHU ◽  
JAMES WHISSTOCK ◽  
PETER J. STUCKEY

In this paper we demonstrate a practical approach to constructing progressive multiple alignments using sequence triplet optimizations rather than a conventional pairwise approach. Using the sequence triplet alignments progressively provides scope for the synthesis of a three-residue exchange amino acid substitution matrix. We develop such a 20×20×20 matrix for the first time and demonstrate how its use in optimal sequence triplet alignments increases the sensitivity of building multiple alignments. Various comparisons were made between alignments generated using the progressive triplet methods and the conventional progressive pairwise procedure. The assessment of these data reveals that, in general, the triplet-based approaches generate more accurate sequence alignments than the traditional pairwise-based procedures, especially between more divergent sets of sequences.


1993 ◽  
Vol 21 (3) ◽  
pp. 597-604 ◽  
Author(s):  
John P. Overington ◽  
Zhan-Yang Zhu ◽  
Andrej Šali ◽  
Mark S. Johnson ◽  
Ramanathan Sowdhamini ◽  
...  

2021 ◽  
Vol 17 (4) ◽  
pp. e1008798
Author(s):  
Claudio Bassot ◽  
Arne Elofsson

Repeat proteins are abundant in eukaryotic proteomes. They are involved in many eukaryotic-specific functions, including signalling. For many of these proteins, the structure is not known, as they are difficult to crystallise. Today, using direct coupling analysis and deep learning, it is often possible to predict a protein’s structure. However, the unique sequence features present in repeat proteins have made it challenging to use direct coupling analysis to predict contacts. Here, we show that deep learning-based methods (trRosetta, DeepMetaPsicov (DMP) and PconsC4) overcome this problem and can predict intra- and inter-unit contacts in repeat proteins. In a benchmark dataset of 815 repeat proteins, about 90% can be correctly modelled. Further, among 48 PFAM families lacking a protein structure, we produce models of forty-one families with estimated high accuracy.


2020 ◽  
Author(s):  
Sebastian Keller ◽  
Pauli Miettinen ◽  
Olga V. Kalinina

Identification of biologically relevant motifs in proteins is a long-standing problem in bioinformatics, especially when considering distantly related proteins where sequence analysis alone becomes increasingly difficult. Here we present a novel approach to identify such motifs in protein three-dimensional structures without depending on sequence alignment by representing structures as graphs in the form of residue interaction networks and employing a modified frequent subgraph mining algorithm. These networks represent residues as vertices while contacts between residues are denoted by edges labeled with Euclidean distances. We use frequent subgraph mining to determine all subgraphs that are subgraph isomorphic to, i.e. are contained in, at least a given number of such networks generated from structures in the same protein family. For this we introduce two extensions of the classical frequent subgraph mining: approximate matching of distance-based labels to account for small variations between protein structures, and scoring as well as score-based filtering of subgraphs in order to identify structurally conserved motifs and to counteract the expanding size of the search space. This approach was then validated by demonstrating that it can rediscover previously characterized functionally important structural motifs in selected protein families. For further validation we show that it is also able to identify motifs that correspond to patterns in the PROSITE database. We then applied our approach to all superfamilies in the SCOP database and found an enrichment of residues in the ligand binding site in the discovered motifs, evidencing their functional importance. Finally we use the approach to discover a novel structural motif in jelly-roll capsid proteins found in members of the picornavirus-like superfamily. This is presented together with an efficient open source implementation of the algorithm called RINminer.

Author summary

As the evolutionary distance between proteins increases, their sequence identity drops rapidly, whereas functionally important sequence motifs and the three-dimensional (3D) structural scaffold in which they are embedded are more conserved. We developed an approach that automatically identifies such motifs by converting protein 3D structures into a set of graphs and then employing the frequent subgraph mining framework. In these graphs, residues are represented as vertices, and if two residues interact in the corresponding protein 3D structure, they are connected by an edge labeled with the Euclidean distance between the residues. In the classical setting of frequent subgraph mining, all subgraphs from a database of graphs are enumerated and the ones that are exactly found, i.e. are subgraph isomorphic, in more than a certain number of graphs are listed as supported. Our approach introduces two new concepts: approximately isomorphic subgraphs and an efficient scoring scheme that makes it possible to retain only biologically relevant subgraphs in the enumeration step. Approximate isomorphism allows edge labels not to match exactly, and thus accounts for natural deviations between 3D structures of related proteins. With our approach, we were able to automatically rediscover known motifs from PROSITE, as well as in three well-studied extremely diverse protein families. We predicted functionally important residues in SCOP superfamilies and demonstrated that they tend to lie in structurally meaningful regions: ligand-binding sites and protein core. Additionally, we present a previously unreported structural motif in jelly-roll viral capsids.
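The graph representation and the approximate label matching described above can be illustrated with a small sketch. This is not RINminer's implementation or API. The function names, the 8 Å contact cutoff, the 1 Å tolerance, and the toy coordinates are all assumptions, and only the simplest case (the support of a single-edge pattern) is shown rather than full subgraph enumeration.

```python
import math
from itertools import combinations


def build_rin(residues, cutoff=8.0):
    """Build a toy residue interaction network.

    residues: dict mapping residue id -> (residue type, (x, y, z)).
    Returns a list of edges ((type_a, type_b), distance) for every pair
    of residues no farther apart than `cutoff` (an assumed threshold).
    """
    edges = []
    for (_, (type_i, pos_i)), (_, (type_j, pos_j)) in combinations(
        sorted(residues.items()), 2
    ):
        distance = math.dist(pos_i, pos_j)
        if distance <= cutoff:
            edges.append((tuple(sorted((type_i, type_j))), distance))
    return edges


def edge_support(pattern, networks, tol=1.0):
    """Count the networks containing an edge with the same residue types
    whose distance label lies within `tol` of the pattern's distance."""
    types, dist = pattern
    return sum(
        any(t == types and abs(d - dist) <= tol for t, d in net)
        for net in networks
    )


# Three hypothetical structures from one family (coordinates invented).
s1 = {1: ("CYS", (0, 0, 0)), 2: ("HIS", (3, 4, 0)), 3: ("ASP", (20, 0, 0))}
s2 = {1: ("HIS", (0, 0, 0)), 2: ("CYS", (0, 0, 5.6))}
s3 = {1: ("CYS", (0, 0, 0)), 2: ("HIS", (0, 0, 7.5))}
networks = [build_rin(s) for s in (s1, s2, s3)]
pattern = (("CYS", "HIS"), 5.0)
```

With exact label matching (`tol=0`) the pattern is supported only by the first structure, while a tolerance of 1 Å also admits the second, which is the effect that approximate matching is meant to achieve.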


Author(s):  
Michael G Tassia ◽  
Kyle T David ◽  
James P Townsend ◽  
Kenneth M Halanych

Sequence annotation is fundamental for studying the evolution of protein families, particularly when working with non-model species. Given the rapidly increasing number of species receiving high-quality genome sequencing, accurate domain modeling that is representative of species diversity is crucial for understanding protein family sequence evolution and inferred function(s). Here, we describe a bioinformatic tool called TIAMMAt (Taxon-Informed Adjustment of Markov Model Attributes) which revises domain profile hidden Markov models (HMMs) by incorporating homologous domain sequences from underrepresented and non-model species. Using innate immunity pathways as a case study, we show that revising profile HMM parameters to directly account for variation in homologs among underrepresented species provides valuable insight into the evolution of protein families. Following adjustment by TIAMMAt, domain profile HMMs exhibit changes in their per-site amino acid state emission probabilities and insertion/deletion probabilities while maintaining the overall structure of the consensus sequence. Our results show that domain revision can heavily impact evolutionary interpretations for some families (i.e., NLR’s NACHT domain), whereas impact on other domains (e.g., rel homology domain and interferon regulatory factor domains) is minimal due to high levels of sequence conservation across the sampled phylogenetic depth (i.e., Metazoa). Importantly, TIAMMAt revises target domain models to reflect homologous sequence variation using the taxonomic distribution under consideration by the user. TIAMMAt’s flexibility to revise any subset of the Pfam database using a user-defined taxonomic pool will make it a valuable tool for future protein evolution studies, particularly when incorporating (or focusing on) non-model species.


Development ◽  
1994 ◽  
Vol 1994 (Supplement) ◽  
pp. 27-33 ◽  
Author(s):  
Cyrus Chothia

The evolution of development involves the development of new proteins. Estimates based on the initial results of the genome projects, and on the data banks of protein sequences and structures, suggest that the large majority of proteins come from no more than one thousand families. Members of a family are descended from a common ancestor. Protein families evolve by gene duplication and mutation. Mutations change the conformation of the peripheral regions of proteins; i.e. the regions that are involved, at least in part, in their function. If mutations proceed until only 20% of the residues in related proteins are identical, it is common for the conformational changes to affect half the structure. Most of the proteins involved in the interactions of cells, and in their assembly to form multicellular organisms, are mosaic proteins. These are large and have a modular structure, in that they are built of sets of homologous domains that are drawn from a relatively small number of protein families. Patthy's model for the evolution of mosaic proteins describes how they arose through the insertion of introns into genes, gene duplications and intronic recombination. The rates of progress in the genome sequencing projects, and in protein structure analyses, mean that in a few years we will have a fairly complete outline description of the molecules responsible for the structure and function of organisms at several different levels of developmental complexity. This should make a major contribution to our understanding of the evolution of development.

