Application of genetic semihomology algorithm to theoretical studies on various protein families.

J Leluk; B Hanus-Lorenz; A F Sikorski

doi:10.18388/abp.2001_5109

Acta Biochimica Polonica ◽

2001 ◽

Vol 48 (1) ◽

pp. 21-33 ◽

Cited By ~ 5

Author(s):

J Leluk ◽

B Hanus-Lorenz ◽

A F Sikorski

Keyword(s):

Consensus Sequence ◽

High Accuracy ◽

Theoretical Studies ◽

Protein Families ◽

Optimal Sequence ◽

Multiple Alignments ◽

Low Degree ◽

Correct Alignment ◽

Statistical Approaches ◽

Related Proteins

Several protein families of different nature were studied for genetic relationship, correct alignment at non-homologous fragments, optimal sequence consensus construction, and confirmation of their actual relevance. A comparison of the genetic semihomology approach with statistical approaches indicates the high accuracy and cognitive significance of the former. This is particularly pronounced in the study of related proteins that show a low degree of homology. The multiple sequence alignments were verified and corrected with respect to the questionable, non-homologous fragments. The verified alignments were the basis for consensus sequence formation. The frequency of occurrence of six-codon amino acids versus position variability was studied, and their possible role in amino acid mutational exchange at variable positions is discussed.
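
To make the term "six-codon amino acids" concrete, the following minimal Python sketch counts codons per amino acid in the standard genetic code and lists the amino acids reachable from a given residue by one nucleotide substitution. It is an illustration only, not the authors' semihomology algorithm; the function names are hypothetical, and treating single-nucleotide exchange as the relevant relationship is an assumption made for this example.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_counts():
    """Number of codons encoding each amino acid (stop codons excluded)."""
    return Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def six_codon_amino_acids():
    """Amino acids encoded by exactly six codons, in alphabetical order."""
    return sorted(aa for aa, n in codon_counts().items() if n == 6)


def single_substitution_neighbors(aa):
    """Amino acids reachable from any codon of `aa` by one nucleotide change."""
    neighbors = set()
    for codon, encoded in CODON_TABLE.items():
        if encoded != aa:
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                neighbors.add(CODON_TABLE[mutant])
    # Drop the amino acid itself (silent changes) and stop codons.
    return neighbors - {aa, "*"}


print(six_codon_amino_acids())  # ['L', 'R', 'S']
```

Leucine, arginine, and serine are the only amino acids with six codons in the standard code, so they offer more single-substitution routes to other residues than amino acids with fewer codons.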

Evolution of polymer formation within the actin superfamily

Molecular Biology of the Cell ◽

10.1091/mbc.e15-11-0778 ◽

2017 ◽

Vol 28 (19) ◽

pp. 2461-2469 ◽

Cited By ~ 10

Author(s):

Patrick R. Stoddard ◽

Tom A. Williams ◽

Ethan Garner ◽

Buzz Baum

Keyword(s):

Convergent Evolution ◽

ATP Hydrolysis ◽

Linear Polymers ◽

Related Protein ◽

Protein Families ◽

Structure And Dynamics ◽

Polymer Formation ◽

Filament Structure ◽

Related Proteins ◽

Actin Superfamily

While many are familiar with actin as a well-conserved component of the eukaryotic cytoskeleton, it is less often appreciated that actin is a member of a large superfamily of structurally related protein families found throughout the tree of life. Actin-related proteins include chaperones, carbohydrate kinases, and other enzymes, as well as a staggeringly diverse set of proteins that use the energy from ATP hydrolysis to form dynamic, linear polymers. Despite differing widely from one another in filament structure and dynamics, these polymers play important roles in ordering cell space in bacteria, archaea, and eukaryotes. It is not known whether these polymers descended from a single ancestral polymer or arose multiple times by convergent evolution from monomeric actin-like proteins. In this work, we provide an overview of the structures, dynamics, and functions of this diverse set. Then, using a phylogenetic analysis to examine actin evolution, we show that the actin-related protein families that form polymers are more closely related to one another than they are to other nonpolymerizing members of the actin superfamily. Thus all the known actin-like polymers are likely to be the descendants of a single, ancestral, polymer-forming actin-like protein.

Medical Diagnostic Tests: A Review of Test Anatomy, Phases, and Statistical Treatment of Data

Computational and Mathematical Methods in Medicine ◽

10.1155/2019/1891569 ◽

2019 ◽

Vol 2019 ◽

pp. 1-22 ◽

Cited By ~ 16

Author(s):

Sorana D. Bolboacă

Keyword(s):

Clinical Practice ◽

Diagnostic Test ◽

Diagnostic Tests ◽

Statistical Treatment ◽

High Accuracy ◽

Phase III ◽

Medical Diagnostic ◽

Statistical Approaches ◽

Phase I and II

Diagnostic tests are approaches used in clinical practice to identify with high accuracy the disease of a particular patient and thus to provide early and proper treatment. Reporting high-quality results of diagnostic tests, for both basic and advanced methods, is solely the responsibility of the authors. Despite the existence of recommendations and standards regarding the content or format of statistical aspects, the quality of what statistics are reported, and how, when a diagnostic test is assessed varies from excellent to very poor. This article briefly reviews the steps in the evaluation of a diagnostic test, from its anatomy, to its role in clinical practice, to the statistical methods used to show its performance. The statistical approaches are linked with the phase, clinical question, and objective and are accompanied by examples. More details are provided for phase I and II studies, while the statistical treatment of phases III and IV is only briefly presented. Several free online resources useful for calculating some statistics are also given.

Yosshi: a web-server for disulfide engineering by bioinformatic analysis of diverse protein families

Nucleic Acids Research ◽

10.1093/nar/gkz385 ◽

2019 ◽

Vol 47 (W1) ◽

pp. W308-W314 ◽

Cited By ~ 6

Author(s):

Dmitry Suplatov ◽

Daria Timonina ◽

Yana Sharapova ◽

Vytas Švedas

Keyword(s):

Disulfide Bonds ◽

Web Server ◽

Bioinformatic Analysis ◽

Protein Families ◽

Sequence Alignments ◽

Interactive Analysis ◽

Functionally Diverse ◽

Available Information ◽

Related Proteins

Disulfide bonds play a significant role in protein stability, function or regulation but are poorly conserved among evolutionarily related proteins. Yosshi can help to understand the role of S–S bonds by comparing sequences and structures of homologs with diverse properties and different disulfide connectivity patterns within a common structural fold of a superfamily, and can assist in selecting the most promising hot-spots to improve the stability of proteins/enzymes or modulate their functions by introducing naturally occurring crosslinks. The bioinformatic analysis is supported by the integrated Mustguseal web-server, which constructs large structure-guided sequence alignments of functionally diverse protein families that can include thousands of proteins based on all available information in public databases. Yosshi+Mustguseal is a new integrated web-tool for systematic homology-driven analysis and engineering of S–S bonds that facilitates a broader interpretation of disulfides not just as a factor of structural stability, but rather as a mechanism to implement functional diversity within a superfamily. The results can be downloaded as a content-rich PyMol session file or further studied online using the HTML5-based interactive analysis tools. Both web-servers are free and open to all users at https://biokinet.belozersky.msu.ru/yosshi and there is no login requirement.

PROGRESSIVE MULTIPLE ALIGNMENT USING SEQUENCE TRIPLET OPTIMIZATIONS AND THREE-RESIDUE EXCHANGE COSTS

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720004000831 ◽

2004 ◽

Vol 02 (04) ◽

pp. 719-745 ◽

Cited By ~ 8

Author(s):

ARUN SIDDHARTH KONAGURTHU ◽

JAMES WHISSTOCK ◽

PETER J. STUCKEY

Keyword(s):

Amino Acid ◽

Amino Acid Substitution ◽

Multiple Alignment ◽

Substitution Matrix ◽

Practical Approach ◽

Sequence Alignments ◽

Optimal Sequence ◽

Amino Acid Substitution Matrix ◽

Multiple Alignments ◽

First Time

In this paper we demonstrate a practical approach to constructing progressive multiple alignments using sequence triplet optimizations rather than a conventional pairwise approach. Using the sequence triplet alignments progressively provides scope for the synthesis of a three-residue exchange amino acid substitution matrix. We develop such a 20×20×20 matrix for the first time and demonstrate how its use in optimal sequence triplet alignments increases the sensitivity of building multiple alignments. Various comparisons were made between alignments generated using the progressive triplet methods and the conventional progressive pairwise procedure. The assessment of these data reveals that, in general, the triplet-based approaches generate more accurate sequence alignments than the traditional pairwise-based procedures, especially between more divergent sets of sequences.

Molecular recognition in protein families: A database of aligned three-dimensional structures of related proteins

Biochemical Society Transactions ◽

10.1042/bst0210597 ◽

1993 ◽

Vol 21 (3) ◽

pp. 597-604 ◽

Cited By ~ 37

Author(s):

John P. Overington ◽

Zhan-Yang Zhu ◽

Andrej Šali ◽

Mark S. Johnson ◽

Ramanathan Sowdhamini ◽

...

Keyword(s):

Molecular Recognition ◽

Three Dimensional ◽

Protein Families ◽

Related Proteins

Accurate contact-based modelling of repeat proteins predicts the structure of new repeats protein families

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008798 ◽

2021 ◽

Vol 17 (4) ◽

pp. e1008798

Author(s):

Claudio Bassot ◽

Arne Elofsson

Keyword(s):

Deep Learning ◽

Protein Structure ◽

High Accuracy ◽

Unique Sequence ◽

Direct Coupling ◽

Protein Families ◽

Coupling Analysis ◽

Repeat Proteins ◽

Eukaryotic Proteomes ◽

Direct Coupling Analysis

Repeat proteins are abundant in eukaryotic proteomes. They are involved in many eukaryote-specific functions, including signalling. For many of these proteins, the structure is not known, as they are difficult to crystallise. Today, using direct coupling analysis and deep learning, it is often possible to predict a protein’s structure. However, the unique sequence features present in repeat proteins have made it challenging to use direct coupling analysis for predicting contacts. Here, we show that deep learning-based methods (trRosetta, DeepMetaPsicov (DMP) and PconsC4) overcome this problem and can predict intra- and inter-unit contacts in repeat proteins. In a benchmark dataset of 815 repeat proteins, about 90% can be correctly modelled. Further, among 48 PFAM families lacking a protein structure, we produce models of 41 families with estimated high accuracy.

Frequent subgraph mining for biologically meaningful structural motifs

10.1101/2020.05.14.095695 ◽

2020 ◽

Author(s):

Sebastian Keller ◽

Pauli Miettinen ◽

Olga V. Kalinina

Keyword(s):

Ligand Binding ◽

Three Dimensional ◽

Structural Motif ◽

Frequent Subgraph Mining ◽

Protein Families ◽

Biologically Relevant ◽

Subgraph Mining ◽

Frequent Subgraph ◽

Related Proteins ◽

Jelly Roll

Identification of biologically relevant motifs in proteins is a long-standing problem in bioinformatics, especially when considering distantly related proteins where sequence analysis alone becomes increasingly difficult. Here we present a novel approach to identify such motifs in protein three-dimensional structures without depending on sequence alignment by representing structures as graphs in the form of residue interaction networks and employing a modified frequent subgraph mining algorithm. These networks represent residues as vertices, while contacts between residues are denoted by edges labeled with Euclidean distances. We use frequent subgraph mining to determine all subgraphs that are subgraph isomorphic to, i.e. are contained in, at least a given number of such networks generated from structures in the same protein family. For this we introduce two extensions of classical frequent subgraph mining: approximate matching of distance-based labels to account for small variations between protein structures, and scoring together with score-based filtering of subgraphs in order to identify structurally conserved motifs and to counteract the expanding size of the search space. This approach was then validated by demonstrating that it can rediscover previously characterized, functionally important structural motifs in selected protein families. For further validation we show that it is also able to identify motifs that correspond to patterns in the PROSITE database. We then applied our approach to all superfamilies in the SCOP database and found an enrichment of residues in the ligand binding site in the discovered motifs, evidencing their functional importance. Finally, we use the approach to discover a novel structural motif in jelly-roll capsid proteins found in members of the picornavirus-like superfamily. This is presented together with an efficient open source implementation of the algorithm called RINminer.

Author summary:

As the evolutionary distance between proteins increases, their sequence identity drops rapidly, whereas functionally important sequence motifs and the three-dimensional (3D) structural scaffold in which they are embedded are more conserved. We developed an approach that automatically identifies such motifs by converting protein 3D structures into a set of graphs and then employing the frequent subgraph mining framework. In these graphs, residues are represented as vertices, and if two residues interact in the corresponding protein 3D structure, they are connected by an edge labeled with the Euclidean distance between the residues. In the classical setting of frequent subgraph mining, all subgraphs from a database of graphs are enumerated, and the ones that are found exactly, i.e. are subgraph isomorphic, in more than a certain number of graphs are listed as supported. Our approach introduces two new concepts: approximately isomorphic subgraphs and an efficient scoring scheme that allows only biologically relevant subgraphs to be retained in the enumeration step. Approximate isomorphism allows edge labels not to match exactly, and thus accounts for natural deviations between 3D structures of related proteins. With our approach, we were able to automatically rediscover known motifs from PROSITE, as well as motifs in three well-studied, extremely diverse protein families. We predicted functionally important residues in SCOP superfamilies and demonstrated that they tend to lie in structurally meaningful regions: ligand-binding sites and the protein core. Additionally, we present a previously unreported structural motif in jelly-roll viral capsids.
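
The graph representation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the RINminer implementation: the function names, the single representative coordinate per residue, the 8.0 distance cutoff, and the 0.5 label tolerance are all hypothetical choices made for the example.

```python
import math
from itertools import combinations


def build_residue_interaction_network(coords, cutoff=8.0):
    """Return {(res_a, res_b): distance} for residue pairs within `cutoff`.

    `coords` maps a residue identifier to one representative 3D point.
    Residues are the vertices; each returned pair is an edge labeled with
    the Euclidean distance between the two residues.
    """
    edges = {}
    for (a, pa), (b, pb) in combinations(sorted(coords.items()), 2):
        distance = math.dist(pa, pb)
        if distance <= cutoff:
            edges[(a, b)] = distance
    return edges


def labels_match(d1, d2, tolerance=0.5):
    """Approximate label matching: two distance labels agree within `tolerance`."""
    return abs(d1 - d2) <= tolerance


# Toy coordinates (hypothetical), not taken from any real structure.
toy = {"A1": (0.0, 0.0, 0.0), "A2": (3.0, 4.0, 0.0), "A3": (20.0, 0.0, 0.0)}
network = build_residue_interaction_network(toy)
```

Exact label matching would treat edges of length 5.0 and 5.3 as different; the tolerance lets them count as the same edge, which is the kind of small structural variation that approximate matching is meant to absorb.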

TIAMMAt: Leveraging biodiversity to revise protein domain models, evidence from innate immunity

Molecular Biology and Evolution ◽

10.1093/molbev/msab258 ◽

2021 ◽

Author(s):

Michael G Tassia ◽

Kyle T David ◽

James P Townsend ◽

Kenneth M Halanych

Keyword(s):

Innate Immunity ◽

Consensus Sequence ◽

Protein Domain ◽

Homologous Sequence ◽

Valuable Insight ◽

Sequence Evolution ◽

Protein Families ◽

Model Species ◽

Domain Profile ◽

Domain Models

Sequence annotation is fundamental for studying the evolution of protein families, particularly when working with non-model species. Given the rapid, ever-increasing number of species receiving high-quality genome sequencing, accurate domain modeling that is representative of species diversity is crucial for understanding protein family sequence evolution and inferred function(s). Here, we describe a bioinformatic tool called TIAMMAt (Taxon-Informed Adjustment of Markov Model Attributes), which revises domain profile hidden Markov models (HMMs) by incorporating homologous domain sequences from underrepresented and non-model species. Using innate immunity pathways as a case study, we show that revising profile HMM parameters to directly account for variation in homologs among underrepresented species provides valuable insight into the evolution of protein families. Following adjustment by TIAMMAt, domain profile HMMs exhibit changes in their per-site amino acid state emission probabilities and insertion/deletion probabilities while maintaining the overall structure of the consensus sequence. Our results show that domain revision can heavily impact evolutionary interpretations for some families (i.e., NLR’s NACHT domain), whereas impact on other domains (e.g., rel homology domain and interferon regulatory factor domains) is minimal due to high levels of sequence conservation across the sampled phylogenetic depth (i.e., Metazoa). Importantly, TIAMMAt revises target domain models to reflect homologous sequence variation using the taxonomic distribution under consideration by the user. TIAMMAt’s flexibility to revise any subset of the Pfam database using a user-defined taxonomic pool will make it a valuable tool for future protein evolution studies, particularly when incorporating (or focusing on) non-model species.
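
As a rough illustration of why adding homologs changes per-site emission probabilities, the sketch below estimates emission probabilities for a single alignment column from residue counts plus a pseudocount. It is not TIAMMAt's procedure and not how any particular HMM package weights sequences; the function name, the uniform pseudocount, and the toy columns are assumptions for the example.

```python
from collections import Counter


def emission_probabilities(column, alphabet, pseudocount=1.0):
    """Estimate per-residue emission probabilities for one alignment column.

    `column` is a string of residues observed at one match position.
    Characters outside `alphabet` (for example gap symbols) are ignored.
    A uniform pseudocount keeps unobserved residues at nonzero probability.
    """
    counts = Counter(residue for residue in column if residue in alphabet)
    total = sum(counts.values()) + pseudocount * len(alphabet)
    return {aa: (counts[aa] + pseudocount) / total for aa in alphabet}


# Toy two-letter alphabet to keep the arithmetic visible.
before = emission_probabilities("AAAG", "AG")    # column built from the original sequences
after = emission_probabilities("AAAGGG", "AG")   # same column after adding two homologs
```

With the toy input, the probability of `A` falls from 4/6 to 4/8 once two additional `G`-bearing sequences are included, while the column itself keeps its position in the model.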

Protein families in the metazoan genome

Development ◽

10.1242/dev.1994.supplement.27 ◽

1994 ◽

Vol 1994 (Supplement) ◽

pp. 27-33 ◽

Cited By ~ 3

Author(s):

Cyrus Chothia

Keyword(s):

Conformational Changes ◽

Evolution Of Development ◽

Gene Duplications ◽

Protein Families ◽

And Function ◽

Multicellular Organisms ◽

Related Proteins ◽

Initial Results ◽

Genome Projects ◽

Data Banks

The evolution of development involves the development of new proteins. Estimates based on the initial results of the genome projects, and on the data banks of protein sequences and structures, suggest that the large majority of proteins come from no more than one thousand families. Members of a family are descended from a common ancestor. Protein families evolve by gene duplication and mutation. Mutations change the conformation of the peripheral regions of proteins, i.e. the regions that are involved, at least in part, in their function. If mutations proceed until only 20% of the residues in related proteins are identical, it is common for the conformational changes to affect half the structure. Most of the proteins involved in the interactions of cells, and in their assembly to form multicellular organisms, are mosaic proteins. These are large and have a modular structure, in that they are built of sets of homologous domains that are drawn from a relatively small number of protein families. Patthy's model for the evolution of mosaic proteins describes how they arose through the insertion of introns into genes, gene duplications and intronic recombination. The rates of progress in the genome sequencing projects, and in protein structure analyses, mean that in a few years we will have a fairly complete outline description of the molecules responsible for the structure and function of organisms at several different levels of developmental complexity. This should make a major contribution to our understanding of the evolution of development.

High accuracy prediction of β-turns and their types using propensities and multiple alignments

Proteins Structure Function and Bioinformatics ◽

10.1002/prot.20461 ◽

2005 ◽

Vol 59 (4) ◽

pp. 828-839 ◽

Cited By ~ 78

Author(s):

Patrick F.J. Fuchs ◽

Alain J.P. Alix

Keyword(s):

High Accuracy ◽

Multiple Alignments
