Improving Subcellular Protein Location Classification by Incorporating Three-Dimensional Structure Information

Biomolecules ◽  
2021 ◽  
Vol 11 (11) ◽  
pp. 1607
Author(s):  
Ge Wang ◽  
Yu-Jia Zhai ◽  
Zhen-Zhen Xue ◽  
Ying-Ying Xu

The subcellular locations of proteins are closely related to their functions. In the past few decades, the application of machine learning algorithms to predict subcellular protein locations has been an important topic in proteomics. However, most studies in this field have used only amino acid sequences as the data source, and only a few have focused on other protein data types. For example, three-dimensional structures, which contain far more functional information than sequences, remain largely unexplored. In this work, we extracted various handcrafted features describing protein structures from physical, chemical, and topological aspects, as well as learned features obtained from deep neural networks. We then used these features to classify subcellular protein locations. Our experimental results demonstrated that some of these structural features are informative for protein location classification and can help improve the performance of sequence-based location predictors. Our method provides a new perspective for the analysis of protein spatial distribution and is expected to help reveal the relationships between protein structures and functions.
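To make "topological" structure features concrete, here is a minimal sketch, assuming only C-alpha coordinates are available. The function name `structure_features`, the 8 Å contact cutoff, and the three descriptors (radius of gyration, mean contact degree, contact-map density) are illustrative choices, not the feature set used in the paper.

```python
import numpy as np

def structure_features(ca_coords, cutoff=8.0):
    """Hypothetical fixed-length descriptor of a structure from C-alpha coordinates.

    Returns [radius of gyration, mean contact degree, contact-map density].
    Requires at least two residues. Not the authors' feature set.
    """
    coords = np.asarray(ca_coords, dtype=float)
    n = coords.shape[0]
    # Radius of gyration: RMS distance of residues from their centroid.
    centered = coords - coords.mean(axis=0)
    rg = np.sqrt((centered ** 2).sum(axis=1).mean())
    # Pairwise distances -> binary contact map, excluding self-contacts.
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contacts = (dists < cutoff) & ~np.eye(n, dtype=bool)
    degree = contacts.sum(axis=1)
    density = contacts.sum() / (n * (n - 1))
    return np.array([rg, degree.mean(), density])

# Toy example: five residues on a straight line, 3.8 Å apart.
line = [[3.8 * i, 0.0, 0.0] for i in range(5)]
features = structure_features(line)
```

In a combined predictor, such a vector would typically be concatenated with sequence-derived features (for example with `np.concatenate`) before training a classifier; how the paper actually fuses the two sources is not described here.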

Author(s):  
Kenneth H. Downing ◽  
Hu Meisheng ◽  
Hans-Rudolf Went ◽  
Michael A. O'Keefe

With current advances in electron microscope design, high-resolution electron microscopy has become routine, and point resolutions of better than 2 Å have been obtained in images of many inorganic crystals. Although this resolution is sufficient to resolve interatomic spacings, interpretation generally requires comparison of experimental images with calculations. Since the images are two-dimensional projections of the full three-dimensional structure, information is invariably lost where the images of atoms at different heights overlap. The technique of electron crystallography, in which information from several views of a crystal is combined, has been developed to obtain three-dimensional information on proteins. The resolution in images of proteins is severely limited by radiation damage. In principle, atomic-resolution 3D reconstructions should be obtainable from specimens that are resistant to damage. The most serious problem appears to be obtaining high-resolution images from areas thin enough that dynamical scattering effects can be ignored.
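The loss of height information in a single projection, and why combining views helps, can be illustrated with a toy sketch. It assumes an idealized projection that simply sums density along the beam direction, ignoring dynamical scattering and the microscope transfer function; the two "specimens" are invented point-atom arrays.

```python
import numpy as np

# Two hypothetical 4x4x4 specimens, indexed (x, y, z). Both have atoms in the
# same (x, y) columns, but at different heights z.
a = np.zeros((4, 4, 4))
b = np.zeros((4, 4, 4))
a[1, 2, 0] = a[3, 0, 3] = 1.0
b[1, 2, 3] = b[3, 0, 1] = 1.0

# Idealized projection along the beam (z) axis: sum the density over z.
top_a = a.sum(axis=2)
top_b = b.sum(axis=2)
print(np.array_equal(top_a, top_b))  # the single view cannot tell a from b

# A second view along x separates them, which is the idea behind combining
# several views of a crystal.
side_a = a.sum(axis=0)
side_b = b.sum(axis=0)
print(np.array_equal(side_a, side_b))
```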


2020 ◽  
Vol 13 (636) ◽  
pp. eaaz5599 ◽  
Author(s):  
Kelan Chen ◽  
Richard W. Birkinshaw ◽  
Alexandra D. Gurzau ◽  
Iromi Wanigasuriya ◽  
Ruoyun Wang ◽  
...  

Structural maintenance of chromosomes flexible hinge domain containing 1 (SMCHD1) is an epigenetic regulator in which polymorphisms cause the human developmental disorder Bosma arhinia microphthalmia syndrome and the degenerative disease facioscapulohumeral muscular dystrophy. SMCHD1 is considered a noncanonical SMC family member because its hinge domain is C-terminal, because it homodimerizes rather than heterodimerizes, and because it contains a GHKL-type, rather than an ABC-type, ATPase domain at its N terminus. The hinge domain has previously been implicated in chromatin association; however, the underlying mechanism and the basis for SMCHD1 homodimerization are unclear. Here, we used X-ray crystallography to solve the three-dimensional structure of the Smchd1 hinge domain. Together with structure-guided mutagenesis, we defined structural features of the hinge domain that participate in homodimerization and nucleic acid binding, and we identified a functional hotspot required for chromatin localization in cells. This structure provides a template for interpreting the mechanism by which patient polymorphisms within the SMCHD1 hinge domain could compromise function and lead to facioscapulohumeral muscular dystrophy.


2021 ◽  
Vol 7 ◽  
Author(s):  
Castrense Savojardo ◽  
Matteo Manfredi ◽  
Pier Luigi Martelli ◽  
Rita Casadio

Solvent accessibility, measured as the solvent accessible surface area (SASA), is a key feature of proteins in determining their folding and stability. SASA is computed from protein structures with different algorithms, and from protein sequences with machine-learning-based approaches trained on solved structures. Here we ask to what extent the solvent exposure of residues can be associated with the pathogenicity of variations. In this way, the SASA of the wild-type residue acquires a role in the functional annotation of protein single-residue variations (SRVs). By mapping variations onto a curated database of human protein structures, we found that residues targeted by disease-related SRVs are less accessible to solvent than residues involved in polymorphisms. The disease association is not evenly distributed among residue types: SRVs targeting glycine, tryptophan, tyrosine, and cysteine are more frequently disease associated than others. For all residues, the proportion of disease-related SRVs increases substantially when the wild-type residue is buried and decreases when it is exposed. The extent of the increase depends on the residue type. With the aid of an in-house predictor, based on deep learning and performing at the state of the art, we confirmed this tendency by analyzing a large data set of residues subject to variation in some 12,494 human protein sequences that still lack a three-dimensional structure (derived from HUMSAVAR). Our data support the notion that solvent accessible surface area is a distinguishing property of residues that undergo variation, and that pathogenicity is more frequently associated with burial than with exposure.
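The buried-versus-exposed comparison can be sketched as a simple tally. This assumes each variant is already annotated with the relative solvent accessibility (RSA, the residue's SASA divided by the maximum SASA for its residue type) of the wild-type residue; the 0.2 burial threshold, the function name, and the toy records are hypothetical and are not data or parameters from the study.

```python
from collections import defaultdict

def disease_fraction_by_burial(variants, threshold=0.2):
    """Fraction of disease-associated variants among buried vs exposed residues.

    variants: iterable of (wild_type_residue, relative_sasa, is_disease) tuples.
    threshold: RSA below which a residue counts as buried (assumed value).
    """
    counts = defaultdict(lambda: [0, 0])  # label -> [disease count, total]
    for _residue, rsa, is_disease in variants:
        label = "buried" if rsa < threshold else "exposed"
        counts[label][0] += int(is_disease)
        counts[label][1] += 1
    return {label: d / t for label, (d, t) in counts.items()}

# Invented toy records, for illustration only.
toy = [
    ("G", 0.05, True), ("W", 0.10, True), ("L", 0.15, False),
    ("K", 0.60, False), ("E", 0.45, True), ("S", 0.80, False),
]
fractions = disease_fraction_by_burial(toy)
```

Grouping by the first tuple element as well would give the per-residue-type breakdown the abstract mentions.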


2015 ◽  
Vol 13 ◽  
pp. 34
Author(s):  
J. K. S. Nascimento et al.

Teaching biochemistry in higher education is becoming increasingly challenging. Students notoriously find the topic difficult to assimilate; in addition, there are many complaints about the complexity of the subjects and their lack of connection to everyday life. A recurrent problem in undergraduate courses is the absence of teaching practice in specific disciplines. This work aimed to encourage biological sciences students enrolled in the Molecular Diversity (MD) discipline to create hypothetical classes for basic education focused on proteins. The methodology was applied in a class of 35 students. Seven groups were formed, and each group chose a protein to serve as the subject of study for elementary school classes. Each group created a lesson plan describing the methodology it would use to manage a class, which was to be presented orally. Students were encouraged to be creative, to take on the role of a teacher, and to propose teaching methodologies for research using the CTS (Science, Technology and Society) approach. Each group presented a three-dimensional structure of its chosen protein, explained its structural features and functions, and described how they would develop the theme for a basic education class and which methodology they would use. At the end of the presentations, students completed a questionnaire to evaluate the effectiveness of the methodology in the teaching-learning process. The activity improved teacher training and developed skills and abilities such as creativity, didactic planning, teaching ability, development of educational models, and use of new technologies. The methodology used in this work was extremely important to the training of future teachers, who were able to better understand the content covered in the discipline and relate it to everyday life.


2019 ◽  
Author(s):  
Kai Shimagaki ◽  
Martin Weigt

Statistical models for families of evolutionarily related proteins have recently gained interest: in particular, pairwise Potts models, such as those inferred by Direct-Coupling Analysis, have been able to extract information about the three-dimensional structure of folded proteins and about the effect of amino-acid substitutions in proteins. These models are typically required to reproduce the one- and two-point statistics of amino-acid usage in a protein family, i.e., to capture the so-called residue conservation and covariation statistics of proteins of common evolutionary origin. Pairwise Potts models are the maximum-entropy models achieving this. While successful, these models depend on huge numbers of ad hoc parameters, which have to be estimated from finite amounts of data and whose biophysical interpretation remains unclear. Here we propose an approach to parameter reduction based on selecting collective sequence motifs. It naturally leads to the formulation of statistical sequence models in terms of Hopfield-Potts models. These models can be accurately inferred using a mapping to restricted Boltzmann machines and persistent contrastive divergence. We show that, when applied to protein data, even 20-40 patterns are sufficient to obtain statistically close-to-generative models. The Hopfield patterns form interpretable sequence motifs and may be used to cluster amino-acid sequences into functional sub-families. However, the distributed, collective nature of these motifs intrinsically limits the ability of Hopfield-Potts models to predict contact maps, showing the need for models that go beyond the Hopfield-Potts models discussed here.
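The "one- and two-point statistics" a pairwise Potts model is fitted to are the single-column frequencies f_i(a) and pairwise frequencies f_ij(a, b) of a multiple sequence alignment, with covariation measured by the connected correlation C_ij(a, b) = f_ij(a, b) - f_i(a) f_j(b). The sketch below computes only these target statistics; it assumes a 21-symbol alphabet (20 amino acids plus gap) and omits the sequence reweighting and pseudocounts that practical DCA pipelines apply. Model inference itself (DCA or the Hopfield-Potts/RBM training described above) is not shown.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus gap (assumed alphabet)

def one_two_point_stats(msa):
    """Empirical f1[i, a] and f2[i, j, a, b] from equal-length aligned sequences.

    No reweighting or pseudocounts are applied in this sketch.
    """
    index = {c: k for k, c in enumerate(ALPHABET)}
    q = len(ALPHABET)
    X = np.array([[index[c] for c in seq] for seq in msa])  # shape (M, L)
    M, L = X.shape
    onehot = np.zeros((M, L, q))
    onehot[np.arange(M)[:, None], np.arange(L)[None, :], X] = 1.0
    f1 = onehot.mean(axis=0)                                  # (L, q)
    f2 = np.einsum("mia,mjb->ijab", onehot, onehot) / M       # (L, L, q, q)
    return f1, f2

def connected_correlations(f1, f2):
    """C_ij(a, b) = f_ij(a, b) - f_i(a) f_j(b): the covariation signal."""
    return f2 - np.einsum("ia,jb->ijab", f1, f1)

# Toy alignment in which columns 0 and 1 covary perfectly.
f1, f2 = one_two_point_stats(["AC", "AC", "GD", "GD"])
C = connected_correlations(f1, f2)
A, Cys, D = ALPHABET.index("A"), ALPHABET.index("C"), ALPHABET.index("D")
```

Here C[0, 1, A, Cys] is positive and C[0, 1, A, D] negative, reflecting that A at column 0 always co-occurs with C at column 1.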


2018 ◽  
Author(s):  
David J Winter ◽  
Austen RD Ganley ◽  
Carolyn A Young ◽  
Ivan Liachko ◽  
Christopher L Schardl ◽  
...  

Structural features of genomes, including the three-dimensional arrangement of DNA in the nucleus, are increasingly seen as key contributors to the regulation of gene expression. However, studies on how genome structure and nuclear organization influence transcription have so far been limited to a handful of model species. This narrow focus limits our ability to draw general conclusions about the ways in which three-dimensional structures are encoded, and to integrate information from three-dimensional data to address a broader gamut of biological questions. Here, we generate a complete and gapless genome sequence for the filamentous fungus Epichloë festucae. Coupling it with RNAseq and HiC data, we investigate how the structure of the genome contributes to the suite of transcriptional changes that an Epichloë species needs to maintain symbiotic relationships with its grass host. Our results reveal a unique “patchwork” genome, in which repeat-rich blocks of DNA with discrete boundaries are interspersed by gene-rich sequences. In contrast to other species, the three-dimensional structure of the genome is anchored by these repeat blocks, which act to isolate transcription in neighbouring gene-rich regions. Genes that are differentially expressed in planta are enriched near the boundaries of these repeat-rich blocks, suggesting that their three-dimensional orientation partly encodes and regulates the symbiotic relationship formed by this organism.


2015 ◽  
Vol 5 (1) ◽  
Author(s):  
A.E. Naas ◽  
A.K. MacKenzie ◽  
B. Dalhus ◽  
V.G.H. Eijsink ◽  
P.B. Pope

Previous gene-centric analysis of a cow rumen metagenome revealed the first potentially cellulolytic polysaccharide utilization locus, whose main catalytic enzyme (AC2aCel5A) was identified as a glycoside hydrolase (GH) family 5 endo-cellulase. Here we present the 1.8 Å three-dimensional structure of AC2aCel5A and a characterization of its enzymatic activities. The enzyme possesses the archetypal (β/α)8-barrel found throughout the GH5 family and contains the two strictly conserved catalytic glutamates located at the C-terminal ends of β-strands 4 and 7. The enzyme is active on insoluble cellulose and acts exclusively on linear β-(1,4)-linked glucans. Co-crystallization of a catalytically inactive mutant with substrate yielded a 2.4 Å structure showing cellotriose bound in the −3 to −1 subsites. Additional electron density was observed between Trp178 and Trp254, two residues that form a hydrophobic “clamp”, potentially interacting with sugars at the +1 and +2 subsites. The enzyme’s active-site cleft is narrower than those of its closest structural relatives, which, in contrast to AC2aCel5A, are also active on xylans, mannans and/or xyloglucans. Interestingly, the structure and function of this enzyme seem adapted to less-substituted substrates such as cellulose, presumably because the active-site cleft lacks the space to accommodate the side chains of branched glucans.


2012 ◽  
Vol 40 (5) ◽  
pp. 955-962 ◽  
Author(s):  
Nathalie Sibille ◽  
Pau Bernadó

In recent years, IDPs (intrinsically disordered proteins) have emerged as pivotal actors in biology. Although IDPs are present in all kingdoms of life, they are more abundant in eukaryotes, where they are involved in the vast majority of regulation and signalling processes. The realization that, in some cases, the functional states of proteins are partly or fully disordered contradicted the traditional view that a well-defined three-dimensional structure is required for activity. Several lines of experimental evidence indicate, however, that structural features of IDPs, such as transient secondary-structural elements and overall dimensions, are crucial to their function. NMR has been the main tool for studying IDP structure, probing conformational preferences at the residue level. Additionally, SAXS (small-angle X-ray scattering) can report on the three-dimensional space sampled by disordered states and therefore complements the local information provided by NMR. The present review describes how the synergy between NMR and SAXS can be exploited to obtain more detailed structural and dynamic models of IDPs in solution. These combined strategies, embedded in computational approaches, promise to elucidate the structure–function properties of this important but elusive family of biomolecules.
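One standard way SAXS reports on the overall dimensions of a molecule is the radius of gyration R_g from a Guinier fit, which uses the low-q approximation ln I(q) ≈ ln I(0) - (R_g^2 / 3) q^2. The sketch below fits that line on synthetic data; the function name and the q·R_g ≤ 1.3 cutoff (a value commonly quoted for compact particles, and often tightened for disordered proteins) are assumptions for illustration, not a procedure taken from the review.

```python
import numpy as np

def guinier_rg(q, intensity, qrg_max=1.3):
    """Estimate R_g and I(0) from low-q SAXS data via a Guinier fit.

    A first fit over all points gives a provisional R_g; the fit is then
    repeated on points with q * R_g <= qrg_max (assumed cutoff).
    """
    q = np.asarray(q, dtype=float)
    y = np.log(np.asarray(intensity, dtype=float))
    slope, _ = np.polyfit(q ** 2, y, 1)
    rg = np.sqrt(-3.0 * slope)
    mask = q * rg <= qrg_max
    slope, intercept = np.polyfit(q[mask] ** 2, y[mask], 1)
    return np.sqrt(-3.0 * slope), np.exp(intercept)

# Synthetic, noise-free Guinier curve with R_g = 20 Å and I(0) = 100 (q in 1/Å).
q = np.linspace(0.005, 0.1, 50)
intensity = 100.0 * np.exp(-(q ** 2) * 20.0 ** 2 / 3.0)
rg, i0 = guinier_rg(q, intensity)
```

With real data, noise and the limited validity range of the approximation make the choice of fitting window the main judgment call, which is why the cutoff is exposed as a parameter.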


Author(s):  
Arun G. Ingale

Predicting the structure of a protein from its primary amino acid sequence is computationally difficult. An investigation of the methods and algorithms used to predict protein structure, and a thorough knowledge of protein function and structure, are critical for the advancement of biology and the life sciences, as well as for the development of better drugs, higher-yield crops, and even synthetic biofuels. To that end, this chapter sheds light on the methods used for protein structure prediction. It covers the applications of modeled protein structures and examines the relationship between pure sequence information and three-dimensional structure, which continues to be one of the greatest challenges in molecular biology. As a resource, it presents a comprehensive examination of the problems, methods, tools, servers, databases, and applications of protein structure prediction, giving unique insight into future applications of modeled protein structures. Current protein structure prediction methods are reviewed, covering the background of structure prediction, the prediction of structural features, tertiary structure prediction, and functional insight. The basic ideas and advances in each of these directions are discussed in detail.

