GISA: using Gauss Integrals to identify rare conformations in protein structures

PeerJ ◽

10.7717/peerj.9159 ◽

2020 ◽

Vol 8 ◽

pp. e9159 ◽

Cited By ~ 1

Author(s):

Christian Grønbæk ◽

Thomas Hamelryck ◽

Peter Røgen

Keyword(s):

Structural Alignment ◽

Protein Structures ◽

Native Structure ◽

Command Line Tool ◽

General Benefit ◽

And Function ◽

In Cis ◽

Protein Models ◽

Bioinformatics Community ◽

General Method

The native structure of a protein is important for its function, and therefore methods for exploring protein structures have attracted much research. However, rather few methods are sensitive to topological-geometric features, examples being knots, slipknots, lassos, links, and pokes, and each method is aimed only at a specific set of such configurations. We here propose a general method which transforms a structure into a “fingerprint of topological-geometric values” consisting of a series of real-valued descriptors from mathematical knot theory. The extent to which a structure contains unusual configurations can then be judged from this fingerprint. The method is not confined to a particular pre-defined topology or geometry (like a knot or a poke), and so, unlike existing methods, it is general. To achieve this, our new algorithm, GISA, as a key novelty produces the descriptors, so-called Gauss integrals, not only for the full chains of a protein but for all its sub-chains. This allows fingerprinting on any scale from local to global. The Gauss integrals are known to be effective descriptors of global protein folds. Applying GISA to sets of several thousand high-resolution structures, we first show how the most basic Gauss integral, the writhe, enables swift identification of pre-defined geometries such as pokes and links. We then apply GISA with no restrictions on geometry, to show how it allows identifying rare conformations by finding rare invariant values only. In this unrestricted search, pokes and links are still found, but also knotted conformations, as well as more highly entangled configurations not previously described. Thus, an application of the basic scan method in GISA’s tool-box revealed 10 known cases of knots as the top positive writhe cases, while placing at the top of the negative writhe cases 14 cases in cis-trans isomerases sharing a spatial motif of little secondary structure content, which has possibly gone unnoticed.
Possible general applications of GISA are fold classification and structural alignment based on local Gauss integrals. Others include finding errors in protein models and identifying unusual conformations that might be important for protein folding and function. By its broad potential, we believe that GISA will be of general benefit to the structural bioinformatics community. GISA is coded in C and comes as a command line tool. Source and compiled code for GISA plus read-me and examples are publicly available at GitHub (https://github.com).
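The idea of computing the writhe for every sub-chain can be illustrated with a minimal Python sketch. It is not GISA's implementation (GISA is written in C), and the function names `pair_writhe` and `subchain_writhes` are hypothetical. The sketch assumes a chain given as a list of 3D points (for example C-alpha coordinates), evaluates the pairwise Gauss integral of two straight segments with a standard closed-form solid-angle expression, and fills a table of sub-chain writhes by inclusion-exclusion. Degenerate segment pairs (shared endpoints, collinear points) are simply assigned zero, which is a simplification.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _unit(v):
    norm = math.sqrt(_dot(v, v))
    return None if norm < 1e-12 else (v[0] / norm, v[1] / norm, v[2] / norm)

def pair_writhe(p1, p2, p3, p4):
    """Signed writhe contribution of segments p1->p2 and p3->p4, in [-1, 1]."""
    r13, r14 = _sub(p3, p1), _sub(p4, p1)
    r23, r24 = _sub(p3, p2), _sub(p4, p2)
    normals = [_unit(_cross(r13, r14)), _unit(_cross(r14, r24)),
               _unit(_cross(r24, r23)), _unit(_cross(r23, r13))]
    if any(n is None for n in normals):
        return 0.0  # degenerate geometry (assumption: treated as zero)
    # Unsigned solid angle from which the two segments appear to cross.
    omega = 0.0
    for k in range(4):
        c = _dot(normals[k], normals[(k + 1) % 4])
        omega += math.asin(max(-1.0, min(1.0, c)))
    # The sign of the crossing comes from a triple product.
    triple = _dot(_cross(_sub(p4, p3), _sub(p2, p1)), r13)
    sign = (triple > 0) - (triple < 0)
    return sign * omega / (2.0 * math.pi)

def subchain_writhes(points):
    """table[a][b] = writhe of the sub-chain made of segments a..b (a <= b)."""
    m = len(points) - 1  # number of segments
    table = [[0.0] * m for _ in range(m)]
    for span in range(1, m):
        for a in range(m - span):
            b = a + span
            w_ab = pair_writhe(points[a], points[a + 1],
                               points[b], points[b + 1])
            inner = table[a + 1][b - 1] if span > 1 else 0.0
            # Inclusion-exclusion over the two shorter sub-chains.
            table[a][b] = table[a + 1][b] + table[a][b - 1] - inner + w_ab
    return table
```

With such a table, a scan for unusual local or global conformations amounts to ranking the entries `table[a][b]` by value; how GISA itself thresholds or ranks these values is not specified in the abstract.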


GISA: Using Gauss Integrals to identify rare conformations in protein structures

10.1101/758029 ◽

2019 ◽

Cited By ~ 1

Author(s):

Christian Grønbæk ◽

Thomas Hamelryck ◽

Peter Røgen

Keyword(s):

Protein Structures ◽

Native Structure ◽

Command Line Tool ◽

Potential Applications ◽

General Benefit ◽

And Function ◽

In Cis ◽

Protein Models ◽

Bioinformatics Community ◽

General Method

Abstract The native structure of a protein is important for its function, and therefore methods for exploring protein structures have attracted much research. However, rather few methods are sensitive to topological-geometric features, examples being knots, slipknots, lassos, links, and pokes, and each method is aimed only at a specific set of such configurations. We here propose a general method which transforms a structure into a “fingerprint of topological-geometric values” consisting of a series of real-valued descriptors from mathematical knot theory. The extent to which a structure contains unusual configurations can then be judged from this fingerprint. The method is therefore not confined to a particular pre-defined topology or geometry (like a knot or a poke), and so, unlike existing methods, it is general. To achieve this, our new algorithm, GISA, as a key novelty produces the descriptors, so-called Gauss integrals, not only for the full chains of a protein but for all its sub-chains, thereby allowing fingerprinting on any scale from local to global. The Gauss integrals are known to be effective descriptors of global protein folds. Applying GISA to a set of about 8000 high-resolution structures (top8000), we first show how it enables swift identification of predefined geometries such as pokes and links. We then apply GISA with no restrictions on geometry, to show how it allows identifying rare conformations by finding rare invariant values only. In this unrestricted search, pokes and links are still found, but also knotted conformations, as well as more highly entangled configurations not previously described. Thus, applying the basic scan method in GISA’s tool-box to the top8000 set ranks 10 known cases of knots as the top positive Gauss number cases, while placing at the top of the negative Gauss numbers 14 cases in cis-trans isomerases sharing a spatial motif of little secondary structure content, which has possibly gone unnoticed.
Potential applications of the GISA tools include finding errors in protein models and identifying unusual conformations that might be important for protein folding and function. By its broad potential, we believe that GISA will be of general benefit to the structural bioinformatics community. GISA is coded in C and comes as a command line tool. Source and compiled code for GISA plus read-me and examples are publicly available at GitHub (https://github.com).


Learning structural motif representations for efficient protein structure search

10.1101/137828 ◽

2017 ◽

Cited By ~ 2

Author(s):

Yang Liu ◽

Qing Ye ◽

Liwei Wang ◽

Jian Peng

Keyword(s):

Protein Structure ◽

Fundamental Problem ◽

Sequence Similarity ◽

Structural Alignment ◽

Protein Structures ◽

Computational Cost ◽

Data Bank ◽

Hierarchical Organization ◽

Structural Motif ◽

And Function

Abstract Motivation: Understanding the relationship between protein structure and function is a fundamental problem in protein science. Given a protein of unknown function, fast identification of similar protein structures from the Protein Data Bank (PDB) is a critical step for inferring its biological function. Such structural neighbors can provide evolutionary insights into protein conformation, interfaces and binding sites that are not detectable from sequence similarity. However, the computational cost of performing pairwise structural alignment against all structures in the PDB is prohibitive. Alignment-free approaches have been introduced to enable fast but coarse comparisons by representing each protein as a vector of structure features or fingerprints and only computing similarity between vectors. As a notable example, FragBag represents each protein by a “bag of fragments”, which is a vector of frequencies of contiguous short backbone fragments from a predetermined library. Results: Here we present a new approach to learning effective structural motif representations using deep learning. We develop DeepFold, a deep convolutional neural network model to extract structural motif features of a protein structure. Similar to FragBag, DeepFold represents each protein structure or fold using a vector of learned structural motif features. We demonstrate that DeepFold substantially outperforms FragBag on protein structural search on a non-redundant protein structure database and a set of newly released structures. Remarkably, DeepFold not only extracts meaningful backbone segments but also finds important long-range interacting motifs for structural comparison. We expect that DeepFold will provide new insights into the evolution and hierarchical organization of protein structural motifs. Availability: https://github.com/largelymfs/[email protected]
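The "bag of fragments" representation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not FragBag's or DeepFold's code. The function names are hypothetical, the fragment library is a toy stand-in, and fragments are compared by the RMSD of their internal distances (an assumption chosen here because it needs no superposition step), whereas the actual tools may compare fragments differently.

```python
import math
from collections import Counter

def internal_distances(frag):
    """All pairwise distances within a fragment (rotation/translation invariant)."""
    return [math.dist(frag[i], frag[j])
            for i in range(len(frag)) for j in range(i + 1, len(frag))]

def drmsd(f, g):
    """Root-mean-square difference of internal distances of two equal-length fragments."""
    df, dg = internal_distances(f), internal_distances(g)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(df, dg)) / len(df))

def bag_of_fragments(coords, library, k):
    """Frequency vector over library fragments for all contiguous k-point windows."""
    counts = Counter()
    for start in range(len(coords) - k + 1):
        window = coords[start:start + k]
        # Assign the window to its closest library fragment.
        best = min(range(len(library)), key=lambda i: drmsd(window, library[i]))
        counts[best] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(library)
    return [counts[i] / total for i in range(len(library))]
```

Two structures can then be compared by any vector similarity (for instance cosine similarity) between their frequency vectors, which is what makes this kind of search alignment-free.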


Preparation and Characterization of β-glucosidase Films for Stabilization and Handling in Dry Configurations

Current Pharmaceutical Biotechnology ◽

10.2174/1389201020666191202145351 ◽

2020 ◽

Vol 21 (8) ◽

pp. 741-747

Author(s):

Liguang Zhang ◽

Yanan Shen ◽

Wenjing Lu ◽

Lengqiu Guo ◽

Min Xiang ◽

...

Keyword(s):

Tensile Strength ◽

Protein Function ◽

Protein Stabilization ◽

Functional Protein ◽

Elongation At Break ◽

Gelatin Films ◽

The Stability ◽

And Function ◽

General Method

Background: Although protein stability is important for maintaining protein function in therapeutic applications, achieving it remains a challenge. Herein, a general method of preserving protein stability and function was developed using gelatin films. Method: Enzymes immobilized onto films composed of gelatin and Ethylene Glycol (EG) were developed to study their ability to stabilize proteins. As a model functional protein, β-glucosidase was selected. The tensile properties, microstructure, and crystallization behavior of the gelatin films were assessed. Result: Our results indicated that film configurations can preserve the activity of β-glucosidase under rigorous conditions (75% relative humidity and 37°C for 47 days). In both control films and films containing 1.8% β-glucosidase, tensile strength increased with increased EG content, whilst the elongation at break increased initially, then decreased over time. The presence of β-glucosidase had a negligible influence on tensile strength and elongation at break. Scanning electron microscopy (SEM) revealed that with increasing EG content or decreasing enzyme concentrations, a denser microstructure was observed. Conclusion: The dry film is a promising candidate for protein stabilization and handling. The configuration is convenient and cheap, and thus applicable to protein storage and transportation processes in the future.


Hide and seek: interplay between influenza viruses and B cells

International Immunology ◽

10.1093/intimm/dxaa028 ◽

2020 ◽

Vol 32 (9) ◽

pp. 605-611 ◽

Cited By ~ 2

Author(s):

Masayuki Kuraoka ◽

Yu Adachi ◽

Yoshimasa Takahashi

Keyword(s):

B Cells ◽

B Cell ◽

Surface Protein ◽

Influenza Viruses ◽

Influenza Vaccines ◽

Native Structure ◽

Humoral Immune ◽

Protective Antibodies ◽

Major Surface ◽

And Function

Abstract Influenza virus constantly acquires genetic mutations/reassortment in the major surface protein, hemagglutinin (HA), resulting in the generation of strains with antigenic variations. There are, however, HA epitopes that are conserved across influenza viruses and are targeted by broadly protective antibodies. A goal for the next-generation influenza vaccines is to stimulate B-cell responses against such conserved epitopes in order to provide broad protection against divergent influenza viruses. Broadly protective B cells, however, are not easily activated by HA antigens with native structure, because the virus has multiple strategies to escape from the humoral immune responses directed to the conserved epitopes. One such strategy is to hide the conserved epitopes from the B-cell surveillance by steric hindrance. Technical advancement in the analysis of the human B-cell antigen receptor (BCR) repertoire has dissected the BCRs to HA epitopes that are hidden in the native structure but are targeted by broadly protective antibodies. We describe here the characterization and function of broadly protective antibodies and strategies that enable B cells to seek these hidden epitopes, with potential implications for the development of universal influenza vaccines.


Commuting to Work: Nucleolar Long Non-Coding RNA Control Ribosome Biogenesis from Near and Far

Non-Coding RNA ◽

10.3390/ncrna7030042 ◽

2021 ◽

Vol 7 (3) ◽

pp. 42

Author(s):

Victoria Mamontova ◽

Barbara Trifault ◽

Lea Boten ◽

Kaspar Burger

Keyword(s):

Gene Expression ◽

Rna Polymerase Ii ◽

Ribosome Biogenesis ◽

Intergenic Spacer ◽

Cellular Growth ◽

Rrna Synthesis ◽

Protein Coding ◽

Non Coding Rna ◽

And Function ◽

In Cis

Gene expression is an essential process for cellular growth, proliferation, and differentiation. The transcription of protein-coding genes and non-coding loci depends on RNA polymerases. Interestingly, numerous loci encode long non-coding (lnc)RNA transcripts that are transcribed by RNA polymerase II (RNAPII) and fine-tune the RNA metabolism. The nucleolus is a prime example of how different lncRNA species concomitantly regulate gene expression by facilitating the production and processing of ribosomal (r)RNA for ribosome biogenesis. Here, we summarise the current findings on how RNAPII influences nucleolar structure and function. We describe how RNAPII-dependent lncRNA can both promote nucleolar integrity and inhibit ribosomal (r)RNA synthesis by modulating the availability of rRNA synthesis factors in trans. Surprisingly, some lncRNA transcripts can directly originate from nucleolar loci and function in cis. The nucleolar intergenic spacer (IGS), for example, encodes nucleolar transcripts that counteract spurious rRNA synthesis in unperturbed cells. In response to DNA damage, RNAPII-dependent lncRNA originates directly at broken ribosomal (r)DNA loci and is processed into small ncRNA, possibly to modulate DNA repair. Thus, lncRNA-mediated regulation of nucleolar biology occurs by several modes of action and is more direct than anticipated, pointing to an intimate crosstalk of RNA metabolic events.


Unlocking the Reversal Potential of Solid Supported Membrane Electrophysiology to Determine Transport Stoichiometry

10.1101/2020.05.07.082438 ◽

2020 ◽

Cited By ~ 1

Author(s):

Nathan E. Thomas ◽

Katherine A. Henzler-Wildman

Keyword(s):

Reversal Potential ◽

New Technique ◽

Supported Membrane ◽

Functional Studies ◽

A New Technique ◽

And Function ◽

Improved Methods ◽

Insight Into ◽

General Method

Abstract Transport stoichiometry provides insight into the mechanism and function of ion-coupled transporters, but measuring transport stoichiometry is time-consuming and technically difficult. With the increasing evidence that many ion-coupled transporters employ multiple transport stoichiometries under different conditions, improved methods to determine transport stoichiometry are required to accurately characterize transporter activity. Reversal potential was previously shown to be a reliable, general method for determining the transport stoichiometry of ion-coupled transporters (Fitzgerald & Mindell, 2017). Here, we develop a new technique for measuring transport stoichiometry with greatly improved throughput using solid supported membrane electrophysiology (SSME). Using this technique, we are able to verify the recent report of a fixed 2:1 stoichiometry for the proton:guanidinium antiporter Gdx. Our SSME method requires only small amounts of transporter and provides a fast, easy, general method for measuring transport stoichiometry, which will facilitate future mechanistic and functional studies of ion-coupled transporters.


AlphaFill: enriching the AlphaFold models with ligands and co-factors

10.1101/2021.11.26.470110 ◽

2021 ◽

Author(s):

Maarten L Hekkelman ◽

Ida de Vries ◽

Robbie P Joosten ◽

Anastassis Perrakis

Keyword(s):

Small Molecules ◽

Structural Integrity ◽

Biological Function ◽

Protein Structures ◽

Molecular Function ◽

Structure Database ◽

Zinc Finger Motifs ◽

Structure Similarity ◽

Protein Models ◽

Design Experiments

Artificial intelligence (AI) methods for constructing structural models of proteins on the basis of their sequence are having a transformative effect in biomolecular sciences. The AlphaFold protein structure database makes available hundreds of thousands of protein structures. However, all these structures lack cofactors essential for their structural integrity and molecular function (e.g. hemoglobin lacks a bound heme), key ions essential for structural integrity (e.g. zinc-finger motifs) or catalysis (e.g. Ca2+ or Zn2+ in metalloproteases), and ligands that are important for biological function (e.g. kinase structures lack ADP or ATP). Here, we present AlphaFill, an algorithm based on sequence and structure similarity, to "transplant" such "missing" small molecules and ions from experimentally determined structures to predicted protein models. These publicly available structural annotations are mapped to predicted protein models, to help scientists interpret biological function and design experiments.


3. Proteins

Biochemistry: A Very Short Introduction ◽

10.1093/actrade/9780198833871.003.0003 ◽

2021 ◽

pp. 34-51

Author(s):

Mark Lorch

Keyword(s):

Amino Acids ◽

Protein Folding ◽

Protein Structure ◽

Protein Structures ◽

Structure And Function ◽

Vast Array ◽

A Cell ◽

Cellular Machinery ◽

And Function ◽

The Relationship

This chapter examines proteins, the dominant proportion of cellular machinery, and the relationship between protein structure and function. The multitude of biological processes needed to keep cells functioning are managed in the organism or cell by a massive cohort of proteins, together known as the proteome. The twenty amino acids that make up the bulk of proteins produce the vast array of protein structures. However, amino acids alone do not provide quite enough chemical variety to complete all of the biochemical activity of a cell, so the chapter also explores post-translational modifications. It finishes by looking at some dynamic aspects of proteins, including enzyme kinetics and the protein folding problem.


Protein designer David Baker: I like doing things that seem like magic

National Science Review ◽

10.1093/nsr/nwaa071 ◽

2020 ◽

Vol 7 (8) ◽

pp. 1410-1412

Author(s):

Weijie Zhao ◽

Chu Wang

Keyword(s):

Protein Design ◽

De Novo ◽

Protein Structures ◽

Computational Prediction ◽

Biological Functions ◽

Personal Experiences ◽

De Novo Protein Design ◽

And Function ◽

The University ◽

Opening Up

Abstract Search ‘de novo protein design’ on Google and you will find the name David Baker in all results of the first page. Professor David Baker at the University of Washington and other scientists are opening up a new world of fantastic proteins. Protein is the direct executor of most biological functions and its structure and function are fully determined by its primary sequence. Baker's group developed the Rosetta software suite that enabled the computational prediction and design of protein structures. Being able to design proteins from scratch means being able to design executors for diverse purposes and benefit society in multiple ways. Recently, NSR interviewed Prof. Baker on this fast-developing field and his personal experiences.


Sequence alignment using machine learning for accurate template-based protein structure prediction

Bioinformatics ◽

10.1093/bioinformatics/btz483 ◽

2019 ◽

Vol 36 (1) ◽

pp. 104-111

Author(s):

Shuichiro Makigaki ◽

Takashi Ishida

Keyword(s):

Machine Learning ◽

Structure Prediction ◽

Tertiary Structure ◽

Structural Alignment ◽

Protein Structures ◽

Substitution Matrix ◽

Detection Methods ◽

Supplementary Information ◽

Homology Detection ◽

Sequence Alignments

Abstract Motivation: Template-based modeling, the process of predicting the tertiary structure of a protein by using homologous protein structures, is useful if good templates can be found. Although modern homology detection methods can find remote homologs with high sensitivity, the accuracy of template-based models generated from homology-detection-based alignments is often lower than that from ideal alignments. Results: In this study, we propose a new method that generates pairwise sequence alignments for more accurate template-based modeling. The proposed method trains a machine learning model using the structural alignment of known homologs. It is difficult to directly predict sequence alignments using machine learning. Thus, when calculating sequence alignments, instead of a fixed substitution matrix, this method dynamically predicts a substitution score from the trained model. We evaluate our method by carefully splitting the training and test datasets and comparing the predicted structure’s accuracy with that of state-of-the-art methods. Our method generates more accurate tertiary structure models than those produced from alignments obtained by other methods. Availability and implementation: https://github.com/shuichiro-makigaki/exmachina. Supplementary information: Supplementary data are available at Bioinformatics online.
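The step of replacing a fixed substitution matrix with a predicted, position-specific score can be sketched as a dynamic-programming alignment whose scoring is a pluggable function. The Python below is a minimal illustration under stated assumptions, not the authors' implementation: the name `align` is hypothetical, a global (Needleman-Wunsch style) recursion with a linear gap penalty is used for simplicity since the abstract does not say which alignment scheme is employed, and the toy `score` function in the usage note stands in for a trained model.

```python
def align(a, b, score, gap=-1.0):
    """Global alignment where score(i, j) rates pairing a[i] with b[j].

    In the setting described above, score(i, j) would be supplied by a
    trained model; here it is any callable (an assumption of this sketch).
    """
    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score(i - 1, j - 1),
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(i - 1, j - 1):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, with a toy scorer `score = lambda i, j: 1.0 if a[i] == b[j] else -1.0`, calling `align(a, b, score)` behaves like a fixed match/mismatch scheme; swapping in a model-backed callable changes the scores without touching the recursion.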
