Coevolutionary Analysis of Protein Subfamilies by Sequence Reweighting

Malinverni; Barducci

doi:10.3390/e21111127


Entropy ◽

10.3390/e21111127 ◽

2019 ◽

Vol 21 (11) ◽

pp. 1127 ◽

Cited By ~ 1

Author(s):

Malinverni ◽

Barducci

Keyword(s):

Sequence Variation ◽

Structural Information ◽

Sequence Data ◽

Response Regulator ◽

A Priori ◽

Structural Features ◽

Sequence Alignments ◽

Continuous Sequence ◽

The Family ◽

Variation Data

Extracting structural information from sequence co-variation has become a common practice in computational biology in recent years, mainly due to the availability of large sequence alignments of protein families. However, identifying features that are specific to subclasses, and not shared by all members of the family, using sequence-based approaches has remained an elusive problem. We here present a coevolution-based method to differentially analyze subfamily-specific structural features by a continuous sequence reweighting (SR) approach. We introduce the underlying principles and test its predictive capabilities on the Response Regulator family, whose subfamilies have previously been shown to display distinct, specific homodimerization patterns. Our results show that this reweighting scheme is effective in assigning structural features known a priori to subfamilies, even when sequence data are relatively scarce. Furthermore, sequence reweighting makes it possible to assess whether individual structural contacts pertain to specific subfamilies, and it thus paves the way for the identification of specificity-determining contacts from sequence variation data.
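The abstract does not spell out the SR formula. For orientation only, here is a minimal Python sketch of the conventional similarity-based sequence weighting commonly used in coevolutionary (DCA-style) analyses, where every reweighting scheme acts through per-sequence weights w_s. It is not the authors' subfamily-specific method; the 80% identity threshold, the toy alignment and all function names are illustrative assumptions.

```python
from typing import Dict, List


def fractional_identity(a: str, b: str) -> float:
    """Fraction of aligned columns at which two sequences carry the same symbol."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def sequence_weights(msa: List[str], identity_threshold: float = 0.8) -> List[float]:
    """w_s = 1 / (number of sequences, s itself included, with identity >= threshold).

    Clusters of near-identical sequences share one unit of weight, so that
    over-sampled branches of the family do not dominate the statistics.
    """
    if not msa:
        raise ValueError("the alignment is empty")
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all sequences must have the same aligned length")
    weights = []
    for a in msa:
        neighbours = sum(fractional_identity(a, b) >= identity_threshold for b in msa)
        weights.append(1.0 / neighbours)
    return weights


def weighted_site_frequencies(
    msa: List[str], weights: List[float], position: int
) -> Dict[str, float]:
    """Weighted frequency of each symbol observed at one alignment column."""
    total = sum(weights)  # the effective number of sequences, N_eff
    freqs: Dict[str, float] = {}
    for seq, w in zip(msa, weights):
        freqs[seq[position]] = freqs.get(seq[position], 0.0) + w / total
    return freqs


if __name__ == "__main__":
    toy_msa = ["ACDEFG", "ACDEFH", "WYKLMN"]  # hypothetical toy alignment
    w = sequence_weights(toy_msa)
    print(w)                                           # [0.5, 0.5, 1.0]
    print(sum(w))                                      # 2.0
    print(weighted_site_frequencies(toy_msa, w, 5))    # {'G': 0.25, 'H': 0.25, 'N': 0.5}
```

The first two sequences are 5/6 identical, so they share one unit of weight and the effective number of sequences is 2 rather than 3.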


A phylogeny of members of the family Taeniidae based on the mitochondrial cox1 and nad1 gene data

Parasitology ◽

10.1017/s003118200800499x ◽

2008 ◽

Vol 135 (12) ◽

pp. 1457-1467 ◽

Cited By ~ 70

Author(s):

A. LAVIKAINEN ◽

V. HAUKISALMI ◽

M. J. LEHTINEN ◽

H. HENTTONEN ◽

A. OKSANEN ◽

...

Keyword(s):

Dna Sequences ◽

Nadh Dehydrogenase ◽

Sequence Variation ◽

Sequence Data ◽

Distinct Species ◽

Epidemiological Studies ◽

Sister Taxon ◽

The Family ◽

Nad1 Gene ◽

Gene Data

SUMMARY: The cestode family Taeniidae consists of 2 genera, Taenia and Echinococcus, both of which have been the focus of intensive taxonomic and epidemiological studies because of their zoonotic importance. However, a comprehensive molecular phylogeny of this family has yet to be reconstructed. In this study, 54 isolates representing 9 Taenia species were characterized using DNA sequences in the mitochondrial cytochrome c oxidase subunit 1 (cox1) and NADH dehydrogenase subunit 1 (nad1) genes. Phylogenetic relationships within the family Taeniidae were inferred by combining cox1 and nad1 sequence data of the present and previous studies. In the phylogenetic analysis, the genus Echinococcus was shown to be monophyletic, but Taenia proved to be paraphyletic due to the position of T. mustelae as a probable sister taxon of Echinococcus. This indicates that T. mustelae should form a genus of its own. Taenia ovis krabbei was placed distant from T. ovis ovis, as a sister taxon of T. multiceps, supporting its recognition as a distinct species, T. krabbei. High intraspecific sequence variation within both T. polyacantha and T. taeniaeformis suggests the existence of cryptic sister species.


Benchmarking inverse statistical approaches for protein structure and design with exactly solvable models

10.1101/028936 ◽

2015 ◽

Cited By ~ 2

Author(s):

Hugo Jacquin ◽

Amy Gilson ◽

Eugene Shakhnovich ◽

Simona Cocco ◽

Rémi Monasson

Keyword(s):

Protein Structure ◽

Structural Information ◽

Sequence Data ◽

Careful Analysis ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Pairwise Models ◽

Statistical Approaches ◽

And Function

Inverse statistical approaches to determine protein structure and function from multiple sequence alignments (MSA) are emerging as powerful tools in computational biology. However, the underlying assumptions about the relationship between the inferred effective Potts Hamiltonian and real protein structure and energetics remain untested so far. Here we use lattice protein (LP) models to benchmark these inverse statistical approaches. We build MSA of highly stable sequences in target LP structures and infer effective pairwise Potts Hamiltonians from these MSA. We find that the inferred Potts Hamiltonians reproduce many important aspects of 'true' LP structures and energetics. Careful analysis reveals that the effective pairwise couplings in inferred Potts Hamiltonians depend not only on the energetics of the native structure but also on competing folds; in particular, the coupling values reflect both positive design (stabilization of the native conformation) and negative design (destabilization of competing folds). In addition to providing detailed structural information, the inferred Potts models, when used as protein Hamiltonians for sequence design, are able to generate, with high probability, completely new sequences with the desired folds, which is not possible using independent-site models. These are remarkable results, as the effective LP Hamiltonians used to generate the MSA are not simple pairwise models, owing to the competition between folds. Our findings elucidate the reasons for the power of inverse approaches to the modelling of proteins from sequence data, as well as their limitations; in particular, we show that their success crucially depends on the accurate inference of the Potts pairwise couplings.
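The Potts Hamiltonian discussed above assigns each aligned sequence an energy built from single-site fields h_i and pairwise couplings J_ij. The sketch below evaluates it under one common sign convention, E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j), where lower energy means a more probable sequence. Conventions differ between papers, and the dictionary layout, function names and toy parameters are assumptions; this is not the lattice-protein model used in the benchmark.

```python
from typing import Dict, List, Tuple

# h[i][a]: field for symbol a at site i.
Fields = List[Dict[str, float]]
# J[(i, j)][(a, b)]: coupling between symbol a at site i and symbol b at site j (i < j).
Couplings = Dict[Tuple[int, int], Dict[Tuple[str, str], float]]


def potts_energy(seq: str, h: Fields, J: Couplings) -> float:
    """E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j); absent entries count as 0."""
    if len(seq) != len(h):
        raise ValueError("sequence length must match the number of sites")
    energy = -sum(h[i].get(a, 0.0) for i, a in enumerate(seq))
    for (i, j), table in J.items():
        energy -= table.get((seq[i], seq[j]), 0.0)
    return energy


def independent_site_energy(seq: str, h: Fields) -> float:
    """The same model with every coupling removed, i.e. an independent-site model."""
    return potts_energy(seq, h, {})


if __name__ == "__main__":
    # Hypothetical toy parameters for a 3-site model.
    h = [{"A": 0.5}, {"B": 0.2}, {"C": 0.0}]
    J = {(0, 2): {("A", "C"): 1.0}}
    print(potts_energy("ABC", h, J), potts_energy("ABD", h, J))
    print(independent_site_energy("ABC", h), independent_site_energy("ABD", h))
```

With these toy values the coupling between sites 0 and 2 makes ABC lower in energy than ABD, whereas the independent-site energies of the two sequences are equal; a preference of this kind can only be encoded through pairwise couplings.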


Limits and potential of combined folding and docking using PconsDock.

10.1101/2021.06.04.446442 ◽

2021 ◽

Author(s):

Gabriele Pozzati ◽

Wensi Zhu ◽

John Lamb ◽

Claudio Bassot ◽

Petras Kundrotas ◽

...

Keyword(s):

Structure Prediction ◽

De Novo ◽

Structural Information ◽

A Priori ◽

Scoring Function ◽

Protein Docking ◽

Alternative Methods ◽

Sequence Alignments ◽

Multiple Sequence ◽

Contact Distance

In the last decade, the accuracy of de novo structure prediction for individual proteins has improved significantly through the use of deep learning (DL) methods that harvest co-evolution information from large multiple sequence alignments (MSA). In CASP14, the best method could predict the structure of most proteins with impressive accuracy. The same approach can, in principle, also be used to extract information about evolution-based contacts across protein-protein interfaces. However, most earlier studies have not used the latest DL methods for inter-chain contact distance prediction. In this paper, we showed for the first time that, using one of the best DL-based residue-residue contact prediction methods (trRosetta), it is possible to simultaneously predict both the tertiary and quaternary structures of some protein pairs, even when the structures of the monomers are not known. Straightforward application of this method to a standard protein-protein docking dataset yielded limited success; however, using alternative methods for MSA generation allowed us to dock significantly more proteins accurately. We also introduced a novel scoring function, PconsDock, that accurately separates 98% of correctly and incorrectly folded and docked proteins, so it can be used to evaluate the quality of the resulting docking models. The average performance of the method is comparable to that of traditional template-based or ab initio shape-complementarity-only docking methods, but no a priori structural information about the individual proteins is needed. Moreover, the results of traditional and fold-and-dock approaches are complementary, so a combined docking pipeline should increase overall docking success significantly. The dock-and-fold pipeline helped us generate the best model for one of the CASP14 oligomeric targets, H1065.


Measurement and Reduction of Radiation Damage in Frozen Hydrated Crystalline Specimens

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100069867 ◽

1978 ◽

Vol 36 (3) ◽

pp. 70-77 ◽

Cited By ~ 1

Author(s):

R.M. Glaeser ◽

S.B. Hayward

Keyword(s):

Diffraction Pattern ◽

Structural Information ◽

Structural Disorder ◽

Structural Features ◽

Biological Macromolecules ◽

Diffraction Intensity ◽

Object Structure ◽

Specimen Material ◽

Unit Cells ◽

Identical Unit

Highly ordered or crystalline biological macromolecules become severely damaged and structurally disordered after a brief electron exposure. Evidence that damage and structural disorder are occurring is clearly given by the fading and eventual disappearance of the specimen's electron diffraction pattern. The fading and disappearance of sharp diffraction spots implies a corresponding disappearance of periodic structural features in the specimen. By the same token, there is a one-to-one correspondence between the disappearance of the crystalline diffraction pattern and the disappearance of reproducible structural information that can be observed in the images of identical unit cells of the object structure. The electron exposures that result in a significant decrease in the diffraction intensity will depend somewhat upon the resolution (Bragg spacing) involved, and can vary considerably with the chemical makeup and composition of the specimen material.


HAZARD: An Expert System for Risk Assessment of Environmental Chemicals

Methods of Information in Medicine ◽

10.1055/s-0038-1635482 ◽

1987 ◽

Vol 26 (01) ◽

pp. 13-23 ◽

Cited By ~ 2

Author(s):

H. W. Gottinger

Keyword(s):

Expert System ◽

Structural Information ◽

Chemical Carcinogenesis ◽

Structural Features ◽

Environmental Chemicals ◽

Chemical Carcinogens ◽

Carcinogenic Activity ◽

Current State ◽

Structure Activity ◽

Carcinogenic Potential

The purpose of this paper is to report on an expert system, currently under design, that screens for potential hazards from environmental chemicals on the basis of structure-activity relationships in the study of chemical carcinogenesis, particularly with respect to analyzing the current state of known structural information about chemical carcinogens and predicting the possible carcinogenicity of untested chemicals. The structure-activity tree serves as an index of known chemical structural features associated with carcinogenic activity. The basic units of the tree are the principal recognized classes of chemical carcinogens, which are subdivided into subclasses, known as nodes, according to specific structural features that may reflect differences in carcinogenic potential among chemicals in the class. An analysis of a computerized database of known carcinogens (knowledge base) is proposed using the structure-activity tree in order to test the validity of the tree as a classification scheme (inference engine).


Rapid Online Buffer Exchange: A Method for Screening of Proteins, Protein Complexes, and Cell Lysates by Native Mass Spectrometry

10.26434/chemrxiv.8792177 ◽

2019 ◽

Author(s):

Zachary VanAernum ◽

Florian Busch ◽

Benjamin J. Jones ◽

Mengxuan Jia ◽

Zibo Chen ◽

...

Keyword(s):

Mass Spectrometry ◽

High Speed ◽

Structural Information ◽

Protein Complexes ◽

High Sensitivity ◽

Native Mass Spectrometry ◽

Structural Features ◽

Consumer Products ◽

Cell Lysates ◽

Protein Expression And Purification

It is important to assess the identity and purity of proteins and protein complexes during and after protein purification to ensure that samples are of sufficient quality for further biochemical and structural characterization, as well as for use in consumer products, chemical processes, and therapeutics. Native mass spectrometry (nMS) has become an important tool in protein analysis due to its ability to retain non-covalent interactions during measurements, making it possible to obtain protein structural information with high sensitivity and at high speed. Interferences from the presence of non-volatiles are typically alleviated by offline buffer exchange, which is time-consuming and difficult to automate. We provide a protocol for rapid online buffer exchange (OBE) nMS to directly screen structural features of pre-purified proteins, protein complexes, or clarified cell lysates. Information obtained by OBE nMS can be used for fast (<5 min) quality control and can further guide protein expression and purification optimization.


Protein Interaction Domains and Post-Translational Modifications: Structural Features and Drug Discovery Applications

Current Medicinal Chemistry ◽

10.2174/0929867326666190620101637 ◽

2020 ◽

Vol 27 (37) ◽

pp. 6306-6355 ◽

Cited By ~ 2

Author(s):

Marian Vincenzi ◽

Flavia Anna Mercurio ◽

Marilisa Leone

Keyword(s):

Drug Discovery ◽

Protein Interaction ◽

Protein Interactions ◽

Structural Information ◽

Protein Complexes ◽

Structural Features ◽

Protein Protein Interactions ◽

Modular Architecture ◽

Post Translational Modifications ◽

Interaction Domains

Background: Many pathways in healthy cells, as well as those linked to disease onset and progression, depend on large assemblies, including multi-protein complexes. Protein-protein interactions may occur through a vast array of modules known as protein interaction domains (PIDs). Objective: This review is concerned with PIDs that recognize post-translationally modified peptide sequences and aims to provide the scientific community with state-of-the-art knowledge of their 3D structures, binding topologies, and potential applications in drug discovery. Method: Several databases, such as Pfam (Protein family), SMART (Simple Modular Architecture Research Tool), and the PDB (Protein Data Bank), were searched to find different domain families and to gain structural information on protein complexes in which particular PIDs are involved. Recent literature on PIDs and related drug discovery campaigns was retrieved through PubMed and analyzed. Results and Conclusion: PIDs are rather versatile in their binding preferences. Many of them specifically recognize only particular amino acid stretches carrying post-translational modifications, while a few others are able to interact with several post-translationally modified sequences or with unmodified ones. Many PIDs can be linked to different diseases, including cancer. The large amount of available structural data has led to the structure-based design of several molecules targeting protein-protein interactions mediated by PIDs, including peptides, peptidomimetics, and small compounds. More studies are needed to fully identify, among different families, the PIDs that can be considered reliable therapeutic targets; however, targeting PIDs rather than the catalytic domains of a particular protein may represent a route to selective inhibitors.


A Systematic Review on Popularity, Application and Characteristics of Protein Secondary Structure Prediction Tools

Current Drug Discovery Technologies ◽

10.2174/1570163815666180227162157 ◽

2019 ◽

Vol 16 (2) ◽

pp. 159-172 ◽

Cited By ~ 3

Author(s):

Elaheh Kashani-Amin ◽

Ozra Tabatabaei-Malazy ◽

Amirhossein Sakhteman ◽

Bagher Larijani ◽

Azadeh Ebrahim-Habibi

Keyword(s):

Systematic Review ◽

Secondary Structure ◽

Structure Prediction ◽

Web Of Science ◽

Secondary Structure Prediction ◽

Structural Information ◽

Protein Secondary Structure ◽

Structural Features ◽

Prediction Tools ◽

Insight Into

Background: Prediction of protein secondary structure is one of the major steps in generating homology models. These models provide structural information that is used to design suitable ligands for potential medicinal targets. However, selecting a proper tool among the many Secondary Structure Prediction (SSP) options is challenging. The current study provides insight into currently favored methods and tools within various contexts. Objective: A systematic review was performed to comprehensively survey recent (2013-2016) studies that used or recommended protein SSP tools. Methods: Three databases, Web of Science, PubMed, and Scopus, were systematically searched, and 99 of the 209 studies were finally found eligible for data extraction. Results: The 59 retrieved SSP tools fell into four categories of application: (I) prediction of structural features of a given sequence, (II) evaluation of a method, (III) providing input for a new SSP method, and (IV) integration of an SSP tool as a component of a program. PSIPRED was found to be the most popular tool in all four categories. JPred and tools using the PHD (Profile network from HeiDelberg) method took second and third place in popularity in categories I and II. JPred was found only in the first two categories, while PHD was present in three. Conclusion: This study provides comprehensive insight into the recent usage of SSP tools, which could be helpful for selecting a proper tool.


Further steps on the reconstruction of convex polyominoes from orthogonal projections

Journal of Combinatorial Optimization ◽

10.1007/s10878-021-00751-z ◽

2021 ◽

Author(s):

Paolo Dulio ◽

Andrea Frosini ◽

Simone Rinaldi ◽

Lama Tarsissi ◽

Laurent Vuillon

Keyword(s):

Discrete Geometry ◽

Convex Subset ◽

Convex Sets ◽

A Priori ◽

Orthogonal Projections ◽

Reconstruction Process ◽

Combinatorial Properties ◽

The Family ◽

Discrete Counterpart ◽

Interior Part

A remarkable family of discrete sets that has recently attracted the attention of the discrete geometry community is the family of convex polyominoes, which are the discrete counterpart of Euclidean convex sets and combine the constraints of convexity and connectedness. In this paper we study the problem of their reconstruction from orthogonal projections, relying on the approach defined by Barcucci et al. (Theor Comput Sci 155(2):321–347, 1996). In particular, during the reconstruction process it may be necessary to expand a convex subset of the interior part of the polyomino, called the polyomino kernel, by adding points at specific positions of its contour without losing its convexity. To reach this goal, we consider convexity in terms of certain combinatorial properties of the boundary word encoding the polyomino. We first show some conditions that allow us to extend the kernel while maintaining its convexity. Then, we provide examples where the addition of one or two points causes a loss of convexity, which can be restored by adding other points whose number and positions cannot be determined a priori.
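To make the objects concrete, the sketch below encodes a polyomino as a set of (row, column) cells and checks the two constraints named above: connectedness under edge adjacency, and convexity taken as row- and column-convexity (every row and every column meets the set in a single segment). It also computes the horizontal and vertical projections, i.e. the cell counts per row and per column, that a reconstruction starts from. It does not implement the Barcucci et al. reconstruction or the paper's boundary-word criteria; all names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]  # (row, column) of a unit square


def is_connected(cells: Set[Cell]) -> bool:
    """True if the non-empty cell set forms one piece under edge (4-)adjacency."""
    if not cells:
        return False
    start = next(iter(cells))
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search from an arbitrary cell
        r, c = queue.popleft()
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nb in cells and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == len(cells)


def _contiguous(values: List[int]) -> bool:
    """Distinct integers form one unbroken run iff max - min + 1 == count."""
    return max(values) - min(values) + 1 == len(values)


def is_convex_polyomino(cells: Set[Cell]) -> bool:
    """Connected, and every row and every column meets the set in one segment."""
    if not is_connected(cells):
        return False
    rows: Dict[int, List[int]] = {}
    cols: Dict[int, List[int]] = {}
    for r, c in cells:
        rows.setdefault(r, []).append(c)
        cols.setdefault(c, []).append(r)
    return all(_contiguous(v) for v in rows.values()) and all(
        _contiguous(v) for v in cols.values()
    )


def projections(cells: Set[Cell]) -> Tuple[List[int], List[int]]:
    """Horizontal and vertical projections: cell counts per row and per column."""
    r0, r1 = min(r for r, _ in cells), max(r for r, _ in cells)
    c0, c1 = min(c for _, c in cells), max(c for _, c in cells)
    row_counts = [sum(1 for r, _ in cells if r == k) for k in range(r0, r1 + 1)]
    col_counts = [sum(1 for _, c in cells if c == k) for k in range(c0, c1 + 1)]
    return row_counts, col_counts


if __name__ == "__main__":
    kernel = {(0, 0), (1, 0), (1, 1)}           # a small L-shaped convex polyomino
    print(is_convex_polyomino(kernel))          # True
    print(projections(kernel))                  # ([1, 2], [2, 1])
    grown = kernel | {(0, 2)}                   # one added point: disconnected
    print(is_convex_polyomino(grown))           # False
    print(is_convex_polyomino(grown | {(0, 1)}))  # True: a further point restores it
```

The usage lines give a toy instance of the phenomenon described in the abstract: one added point breaks convexity, and adding another point restores it.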


Prediction and analysis of multiple protein lysine modified sites based on conditional wasserstein generative adversarial networks

BMC Bioinformatics ◽

10.1186/s12859-021-04101-y ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Yingxi Yang ◽

Hui Wang ◽

Wen Li ◽

Xiaobo Wang ◽

Shizhao Wei ◽

...

Keyword(s):

Correlation Coefficient ◽

Sequence Data ◽

Rapid Development ◽

Pearson Correlation ◽

Structural Features ◽

Generative Adversarial Networks ◽

Post Translational Modification ◽

Generative Adversarial Network ◽

Data Imbalance ◽

Adversarial Network

Background: Protein post-translational modification (PTM) is key to investigating the mechanisms of protein function. With the rapid development of proteomics technology, a large amount of protein sequence data has been generated, which highlights the importance of in-depth study and analysis of PTMs in proteins. Methods: We proposed a new multi-class machine learning pipeline, MultiLyGAN, to identify seven types of lysine-modified sites. Using eight sequence-based and five structure-based feature construction methods, 1497 valid features remained after filtering by the Pearson correlation coefficient. To address the data imbalance problem, two influential deep generative methods, the Conditional Generative Adversarial Network (CGAN) and the Conditional Wasserstein Generative Adversarial Network (CWGAN), were leveraged and compared to generate new samples for the types with fewer samples. Finally, a random forest algorithm was used to predict the seven categories. Results: In tenfold cross-validation, accuracy (Acc) and the Matthews correlation coefficient (MCC) were 0.8589 and 0.8376, respectively. In the independent test, Acc and MCC were 0.8549 and 0.8330, respectively. The results indicated that CWGAN better addressed the data imbalance and stabilized the training error. In addition, an accumulated feature-importance analysis showed that CKSAAP, PWM, and structural features were the three most important feature-encoding schemes. MultiLyGAN can be found at https://github.com/Lab-Xu/MultiLyGAN. Conclusions: CWGAN greatly improved predictive performance in all experiments. Features derived from the CKSAAP, PWM, and structure schemes are the most informative and contributed most to the prediction of PTMs.
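The abstract reports accuracy and MCC for a seven-class problem without stating which multiclass MCC is used. A minimal sketch, assuming the standard multiclass generalization (the quantity computed by scikit-learn's `sklearn.metrics.matthews_corrcoef`), written with the standard library only; the function names and toy labels are illustrative and are not data from the study.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def multiclass_mcc(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Multiclass Matthews correlation coefficient.

    MCC = (c*s - sum_k p_k*t_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2)),
    with s = number of samples, c = number of correct predictions,
    t_k = true count of class k and p_k = predicted count of class k.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    t_k = Counter(y_true)
    p_k = Counter(y_pred)  # Counter returns 0 for classes never predicted
    covariance = c * s - sum(p_k[k] * t_k[k] for k in set(t_k) | set(p_k))
    denominator = math.sqrt(
        (s * s - sum(v * v for v in p_k.values()))
        * (s * s - sum(v * v for v in t_k.values()))
    )
    # Convention: report 0.0 when a factor vanishes (e.g. a constant predictor).
    return covariance / denominator if denominator else 0.0
```

For two classes this reduces to the familiar binary MCC: true labels [1, 1, 0, 0] against predictions [1, 0, 0, 0] give an accuracy of 0.75 and an MCC of 1/sqrt(3), about 0.577. Unlike accuracy, MCC also penalizes errors concentrated in rare classes, which is relevant to the class imbalance the abstract addresses.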
