3D RNA from evolutionary couplings

2015 ◽  
Author(s):  
Caleb Weinreb ◽  
Torsten Gross ◽  
Chris Sander ◽  
Debora S Marks

Non-protein-coding RNAs are ubiquitous in cell physiology, with a diverse repertoire of known functions. In fact, the majority of the eukaryotic genome does not code for proteins, and thousands of conserved long non-protein-coding RNAs of currently unknown function have been identified. When available, knowledge of their 3D structure is very helpful in elucidating the function of these RNAs. However, despite some outstanding structure elucidation of RNAs using X-ray crystallography, NMR and cryoEM, determining RNA 3D structures remains low-throughput. RNA structure prediction in silico is a promising alternative approach and works well for double-helical stems, but full 3D structure determination requires tertiary contacts outside of secondary structures that are difficult to infer from sequence information. Here, based only on information from RNA multiple sequence alignments, we use a global statistical sequence probability model of co-variation in pairs of nucleotide positions to detect 3D contacts, in analogy to recently developed breakthrough methods for computational protein folding. In blinded tests on 22 known RNA structures ranging in size from 65 to 1800 nucleotides, the predicted contacts matched physical nucleotide interactions with 65-95% true positive prediction accuracy. Importantly, we infer many long-range tertiary contacts, including non-Watson-Crick interactions, where secondary structure elements assemble in 3D. When used as restraints in molecular dynamics simulations, the inferred contacts improve RNA 3D structure prediction to a coordinate error as low as 6 to 10 Å root-mean-square deviation (RMSD) in atom positions, with potential for further refinement by molecular dynamics. These contacts include functionally important interactions, such as those that distinguish the active and inactive conformations of four riboswitches. 
In blind prediction mode, we present evolutionary couplings suitable for folding simulations for 180 RNAs of unknown structure, available at https://marks.hms.harvard.edu/ev_rna/. We anticipate that this approach can help shed light on the structure and function of non-protein-coding RNAs as well as 3D-structured mRNAs.
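
As a rough illustration of what co-variation between alignment positions means, the sketch below ranks column pairs of a toy RNA alignment by mutual information. This is a simpler, local statistic swapped in for illustration; it is not the global probability model the abstract describes, which is designed to separate direct from indirect couplings. The alignment, function names, and scoring are all assumptions made up for this example.

```python
import math
from collections import Counter
from itertools import combinations


def column(msa, i):
    """Return the characters found at position i of every aligned sequence."""
    return [seq[i] for seq in msa]


def mutual_information(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    freq_i = Counter(column(msa, i))
    freq_j = Counter(column(msa, j))
    freq_ij = Counter(zip(column(msa, i), column(msa, j)))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


def rank_pairs(msa):
    """Score every column pair and sort from strongest to weakest co-variation."""
    length = len(msa[0])
    scores = {
        (i, j): mutual_information(msa, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy alignment: columns 0 and 4 always form a Watson-Crick pair,
# columns 1 and 3 are fully conserved, column 2 varies independently.
toy_msa = [
    "GAAUC", "GACUC",
    "CAAUG", "CACUG",
    "AAAUU", "AACUU",
    "UAAUA", "UACUA",
]

ranked = rank_pairs(toy_msa)
top_pair, top_score = ranked[0]
```

In this toy case the compensating pair (0, 4) ranks first, while conserved or independent columns score zero. Real alignments need sequence reweighting and a correction for transitive correlations, which is the gap a global model is meant to close.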

2015 ◽  
Vol 112 (17) ◽  
pp. 5413-5418 ◽  
Author(s):  
Sikander Hayat ◽  
Chris Sander ◽  
Debora S. Marks ◽  
Arne Elofsson

Transmembrane β-barrels (TMBs) carry out major functions in substrate transport and protein biogenesis but experimental determination of their 3D structure is challenging. Encouraged by successful de novo 3D structure prediction of globular and α-helical membrane proteins from sequence alignments alone, we developed an approach to predict the 3D structure of TMBs. The approach combines the maximum-entropy evolutionary coupling method for predicting residue contacts (EVfold) with a machine-learning approach (boctopus2) for predicting β-strands in the barrel. In a blinded test for 19 TMB proteins of known structure that have a sufficient number of diverse homologous sequences available, this combined method (EVfold_bb) predicts hydrogen-bonded residue pairs between adjacent β-strands at an accuracy of ∼70%. This accuracy is sufficient for the generation of all-atom 3D models. In the transmembrane barrel region, the average 3D structure accuracy [template-modeling (TM) score] of top-ranked models is 0.54 (ranging from 0.36 to 0.85), with a higher (44%) number of residue pairs in correct strand–strand registration than in earlier methods (18%). Although the nonbarrel regions are predicted less accurately overall, the evolutionary couplings identify some highly constrained loop residues and, for FecA protein, the barrel including the structure of a plug domain can be accurately modeled (TM score = 0.68). Lower prediction accuracy tends to be associated with insufficient sequence information and we therefore expect increasing numbers of β-barrel families to become accessible to accurate 3D structure prediction as the number of available sequences increases.
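
For readers unfamiliar with the TM score quoted above, the following minimal sketch evaluates the standard TM-score sum for one fixed superposition. It assumes the per-residue distances between model and reference have already been computed; the full score is the maximum over superpositions, which this sketch does not search for. The function names and the 0.5 Å floor on the distance scale are assumptions of this example.

```python
import math


def d0(target_length):
    """Length-dependent distance scale used by the TM-score (in angstroms)."""
    if target_length <= 15:
        # Assumption of this sketch: fall back to a small floor for short chains,
        # where the cube-root formula is undefined or negative.
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed(distances, target_length):
    """TM-score sum for a single, already chosen superposition.

    distances: distances (angstroms) for the aligned residue pairs only.
    target_length: number of residues in the reference structure.
    Unaligned residues contribute nothing to the sum.
    """
    scale = d0(target_length)
    total = sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances)
    return total / target_length


# Hypothetical example: a 100-residue target with every residue placed perfectly.
perfect = tm_score_fixed([0.0] * 100, 100)
# Same target, but only half of the residues are aligned (and placed perfectly).
half_covered = tm_score_fixed([0.0] * 50, 100)
```

Because each term is bounded by 1 and the sum is divided by the target length, the score stays between 0 and 1 regardless of protein size, which is what makes values such as 0.54 and 0.68 comparable across barrels of different lengths.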


2020 ◽  
Vol 16 (11) ◽  
pp. e1008085
Author(s):  
Grey W. Wilburn ◽  
Sean R. Eddy

Most methods for biological sequence homology search and alignment work with primary sequence alone, neglecting higher-order correlations. Recently, statistical physics models called Potts models have been used to infer all-by-all pairwise correlations between sites in deep multiple sequence alignments, and these pairwise couplings have improved 3D structure predictions. Here we extend the use of Potts models from structure prediction to sequence alignment and homology search by developing what we call a hidden Potts model (HPM) that merges a Potts emission process to a generative probability model of insertion and deletion. Because an HPM is incompatible with efficient dynamic programming alignment algorithms, we develop an approximate algorithm based on importance sampling, using simpler probabilistic models as proposal distributions. We test an HPM implementation on RNA structure homology search benchmarks, where we can compare directly to exact alignment methods that capture nested RNA base-pairing correlations (stochastic context-free grammars). HPMs perform promisingly in these proof of principle experiments.
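
The importance-sampling idea mentioned above can be sketched on a toy problem: estimating a sum of unnormalized weights by drawing states from a simpler proposal distribution and averaging weight-to-proposal ratios. In the HPM setting the sum would run over alignments and the proposal would be a simpler probabilistic model; here the state space, weights, and proposal are hypothetical and small enough that the exact answer can be enumerated for comparison.

```python
import math
import random


def importance_estimate(weight, states, proposal, n_samples, rng):
    """Estimate sum(weight(x) for x in states) using samples from `proposal`.

    proposal: list of probabilities, one per state, all strictly positive.
    """
    index_of = {x: k for k, x in enumerate(states)}
    total = 0.0
    for _ in range(n_samples):
        x = rng.choices(states, weights=proposal)[0]
        # Each sample is reweighted by how unlikely the proposal made it.
        total += weight(x) / proposal[index_of[x]]
    return total / n_samples


def toy_weight(x):
    """Hypothetical unnormalized target weight, peaked at x = 3."""
    return math.exp(-0.5 * (x - 3) ** 2)


toy_states = list(range(10))
# Hypothetical proposal: roughly follows the target but is not proportional to it.
toy_proposal = [0.05, 0.1, 0.2, 0.3, 0.2, 0.1, 0.02, 0.01, 0.01, 0.01]

exact_sum = sum(toy_weight(x) for x in toy_states)
estimate = importance_estimate(
    toy_weight, toy_states, toy_proposal, 20000, random.Random(0)
)
```

The estimate is unbiased as long as the proposal assigns nonzero probability wherever the target does, and its variance shrinks as the proposal gets closer to the target, which is why a reasonably good simpler model is a natural choice of proposal.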


Author(s):  
Grey W. Wilburn ◽  
Sean R. Eddy

Abstract: Most methods for biological sequence homology search and alignment work with primary sequence alone, neglecting higher-order correlations. Recently, statistical physics models called Potts models have been used to infer all-by-all pairwise correlations between sites in deep multiple sequence alignments, and these pairwise couplings have improved 3D structure predictions. Here we extend the use of Potts models from structure prediction to sequence alignment and homology search by developing what we call a hidden Potts model (HPM) that merges a Potts emission process to a generative probability model of insertion and deletion. Because an HPM is incompatible with efficient dynamic programming alignment algorithms, we develop an approximate algorithm based on importance sampling, using simpler probabilistic models as proposal distributions. We test an HPM implementation on RNA structure homology search benchmarks, where we can compare directly to exact alignment methods that capture nested RNA base-pairing correlations (stochastic context-free grammars). HPMs perform promisingly in these proof of principle experiments.
Author summary: Computational homology search and alignment tools are used to infer the functions and evolutionary histories of biological sequences. Most widely used tools for sequence homology searches, such as BLAST and HMMER, rely on primary sequence conservation alone. It should be possible to make more powerful search tools by also considering higher-order covariation patterns induced by 3D structure conservation. Recent advances in 3D protein structure prediction have used a class of statistical physics models called Potts models to infer pairwise correlation structure in multiple sequence alignments. However, Potts models assume alignments are given and cannot build new alignments, limiting their use in homology search. We have extended Potts models to include a probability model of insertion and deletion so they can be applied to sequence alignment and remote homology search using a new model we call a hidden Potts model (HPM). Tests of our prototype HPM software show promising results in initial benchmarking experiments, though more work will be needed to use HPMs in practical tools.


2015 ◽  
Author(s):  
Robert Sheridan ◽  
Robert J. Fieldhouse ◽  
Sikander Hayat ◽  
Yichao Sun ◽  
Yevgeniy Antipin ◽  
...  

Recently developed maximum entropy methods infer evolutionary constraints on protein function and structure from the millions of protein sequences available in genomic databases. The EVfold web server (at EVfold.org) makes these methods available to predict functional and structural interactions in proteins. The key algorithmic development has been to disentangle direct and indirect residue-residue correlations in large multiple sequence alignments and derive direct residue-residue evolutionary couplings (EVcouplings or ECs). For proteins of unknown structure, distance constraints obtained from evolutionary couplings between residue pairs are used to predict all-atom 3D structures de novo, often to good accuracy. Given sufficient sequence information in a protein family, this is a major advance toward solving the problem of computing the native 3D fold of proteins from sequence information alone. Availability: EVfold server at http://evfold.org/


2020 ◽  
Author(s):  
Aashish Jain ◽  
Genki Terashi ◽  
Yuki Kagaya ◽  
Sai Raghavendra Maddhuri Venkata Subramaniya ◽  
Charles Christoffer ◽  
...  

Protein 3D structure prediction has advanced significantly in recent years due to improving contact prediction accuracy. This improvement has been largely due to deep learning approaches that predict inter-residue contacts and, more recently, distances using multiple sequence alignments (MSAs). In this work we present AttentiveDist, a novel approach that uses different MSAs generated with different E-values in a single model to increase the co-evolutionary information provided to the model. To determine the importance of each MSA’s feature at the inter-residue level, we added an attention layer to the deep neural network. The model is trained in a multi-task fashion to also predict backbone and orientation angles further improving the inter-residue distance prediction. We show that AttentiveDist outperforms the top methods for contact prediction in the CASP13 structure prediction competition. To aid in structure modeling we also developed two new deep learning-based sidechain center distance and peptide-bond nitrogen-oxygen distance prediction models. Together these led to a 12% increase in TM-score from the best server method in CASP13 for structure prediction.


2021 ◽  
Author(s):  
Michael Heinzinger ◽  
Maria Littmann ◽  
Ian Sillitoe ◽  
Nicola Bordin ◽  
Christine Orengo ◽  
...  

Thanks to the recent advances in protein three-dimensional (3D) structure prediction, in particular through AlphaFold 2 and RoseTTAFold, the abundance of protein 3D information will explode over the next year(s). Expert resources based on 3D structures such as SCOP and CATH have been organizing the complex sequence-structure-function relations into a hierarchical classification schema. Experimental structures are leveraged through multiple sequence alignments, or more generally through homology-based inference (HBI) transferring annotations from a protein with experimentally known annotation to a query without annotation. Here, we presented a novel approach that expands the concept of HBI from a low-dimensional sequence-distance lookup to the level of a high-dimensional embedding-based annotation transfer (EAT). Secondly, we introduced a novel solution using single protein sequence representations from protein Language Models (pLMs), so-called embeddings (Prose, ESM-1b, ProtBERT, and ProtT5), as input to contrastive learning, by which a new set of embeddings was created that optimized constraints captured by hierarchical classifications of protein 3D structures. These new embeddings (dubbed ProtTucker) clearly improved what was historically referred to as threading or fold recognition. Thereby, the new embeddings enabled the intrusion into the midnight zone of protein comparisons, i.e., the region in which the level of pairwise sequence similarity is akin to random relations and therefore is hard to navigate by HBI methods. Cautious benchmarking showed that ProtTucker reached much further than advanced sequence comparisons without the need to compute alignments, allowing it to be orders of magnitude faster. Code is available at https://github.com/Rostlab/EAT.
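
The core of embedding-based annotation transfer can be sketched as a nearest-neighbour lookup: the query inherits the label of the closest annotated embedding. The sketch below assumes Euclidean distance and uses made-up three-dimensional vectors and placeholder labels; it does not call the EAT code base, and real pLM embeddings have hundreds to thousands of dimensions.

```python
import math


def transfer_annotation(query, lookup):
    """Return (label, distance) of the annotated embedding closest to `query`.

    lookup: list of (embedding, label) pairs with known annotations.
    """
    best_label, best_distance = None, math.inf
    for embedding, label in lookup:
        distance = math.dist(query, embedding)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label, best_distance


# Hypothetical lookup set: toy embeddings with placeholder structural classes.
toy_lookup = [
    ([1.0, 0.0, 0.0], "fold_A"),
    ([0.0, 1.0, 0.0], "fold_B"),
    ([0.0, 0.0, 1.0], "fold_C"),
]

predicted_label, predicted_distance = transfer_annotation([0.9, 0.1, 0.0], toy_lookup)
```

The returned distance can double as a rough reliability indicator: a query far from every annotated embedding is a weaker candidate for transfer. No alignment is computed at any point, which is consistent with the speed advantage the abstract reports.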


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Aashish Jain ◽  
Genki Terashi ◽  
Yuki Kagaya ◽  
Sai Raghavendra Maddhuri Venkata Subramaniya ◽  
Charles Christoffer ◽  
...  

Protein 3D structure prediction has advanced significantly in recent years due to improving contact prediction accuracy. This improvement has been largely due to deep learning approaches that predict inter-residue contacts and, more recently, distances using multiple sequence alignments (MSAs). In this work we present AttentiveDist, a novel approach that uses different MSAs generated with different E-values in a single model to increase the co-evolutionary information provided to the model. To determine the importance of each MSA’s feature at the inter-residue level, we added an attention layer to the deep neural network. We show that combining four MSAs of different E-value cutoffs improved the model prediction performance as compared to single E-value MSA features. A further improvement was observed when an attention layer was used and even more when additional prediction tasks of bond angle predictions were added. The improvement in distance predictions was successfully transferred to achieve better protein tertiary structure modeling.


2014 ◽  
Author(s):  
Sikander Hayat ◽  
Chris Sander ◽  
Arne Elofsson ◽  
Debora S. Marks

Abstract: Transmembrane β-barrels are known to play major roles in substrate transport and protein biogenesis in gram-negative bacteria, chloroplasts and mitochondria. However, the exact number of transmembrane β-barrel families is unknown and experimental structure determination is challenging. In theory, if one knows the number of strands in the β-barrel, then the 3D structure of the barrel could be trivial, but current topology predictions do not predict accurate structures and are unable to give information beyond the β-strands in the barrel. Recent work has shown successful prediction of globular and alpha-helical membrane proteins from sequence alignments, by using high-ranked evolutionary couplings between residues as distance constraints to fold extended polypeptides. However, these methods have not addressed the calculation of precise β-sheet hydrogen bonding that defines transmembrane β-barrels and would be required to fold these proteins successfully. Hence we developed a method (EVFold_BB) that can successfully model transmembrane β-barrels by combining evolutionary couplings with topology predictions. EVFold_BB is validated by the accurate all-atom 3D modeling of 18 proteins, representing all known membrane β-barrel families that have sufficient sequences available. To demonstrate the potential of our approach we predict the unknown 3D structure of the LptD protein; the plausibility of the predicted structure is supported by the blindly predicted benchmarks, and the structure is consistent with experimental observations. Our approach can naturally be extended to all unknown β-barrel proteins with sufficient sequence information.
Significance: EVFold_BB predicts fast, accurate 3D models of large membrane β-barrels that are notoriously hard to solve experimentally. The major advance is the use of evolutionary couplings from sequence alignments together with β-strand prediction to ascertain accurate hydrogen bonding between the β-strands that gives rise to the canonical barrel shapes. The method will enable biological research into outer-membrane proteins.

