3D RNA from evolutionary couplings

Mapping Intimacies ◽

10.1101/028456 ◽

2015 ◽

Author(s):

Caleb Weinreb ◽

Torsten Gross ◽

Chris Sander ◽

Debora S Marks

Keyword(s):

Molecular Dynamics ◽

Structure Prediction ◽

Probability Model ◽

3D Structure ◽

Cell Physiology ◽

Sequence Information ◽

Sequence Alignments ◽

Protein Coding ◽

Promising Alternative ◽

Tertiary Contacts

Non-protein-coding RNAs are ubiquitous in cell physiology, with a diverse repertoire of known functions. In fact, the majority of the eukaryotic genome does not code for proteins, and thousands of conserved long non-protein-coding RNAs of currently unknown function have been identified. When available, knowledge of their 3D structure is very helpful in elucidating the function of these RNAs. However, despite some outstanding structure elucidation of RNAs using X-ray crystallography, NMR and cryoEM, learning RNA 3D structures remains low-throughput. RNA structure prediction in silico is a promising alternative approach and works well for double-helical stems, but full 3D structure determination requires tertiary contacts outside of secondary structures that are difficult to infer from sequence information. Here, based only on information from RNA multiple sequence alignments, we use a global statistical sequence probability model of co-variation in pairs of nucleotide positions to detect 3D contacts, in analogy to recently developed breakthrough methods for computational protein folding. In blinded tests on 22 known RNA structures ranging in size from 65 to 1800 nucleotides, the predicted contacts matched physical nucleotide interactions with 65-95% true positive prediction accuracy. Importantly, we infer many long-range tertiary contacts, including non-Watson-Crick interactions, where secondary structure elements assemble in 3D. When used as restraints in molecular dynamics simulations, the inferred contacts improve RNA 3D structure prediction to a coordinate error as low as 6 to 10 Å RMSD in atom positions, with potential for further refinement by molecular dynamics. These contacts include functionally important interactions, such as those that distinguish the active and inactive conformations of four riboswitches. In blind prediction mode, we present evolutionary couplings suitable for folding simulations for 180 RNAs of unknown structure, available at https://marks.hms.harvard.edu/ev_rna/. We anticipate that this approach can help shed light on the structure and function of non-protein-coding RNAs as well as 3D-structured mRNAs.
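
To make the idea of co-variation between pairs of alignment positions concrete, here is a minimal sketch that scores column pairs of a toy RNA alignment by mutual information. This is a deliberately simpler, local statistic and not the global probability model the abstract describes; the toy alignment and all function names are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def column(alignment, i):
    """Return the characters found at position i of every aligned sequence."""
    return [seq[i] for seq in alignment]


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(alignment)
    counts_i = Counter(column(alignment, i))
    counts_j = Counter(column(alignment, j))
    counts_ij = Counter(zip(column(alignment, i), column(alignment, j)))
    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def rank_pairs(alignment):
    """Score every column pair and return (score, i, j), highest score first."""
    length = len(alignment[0])
    scores = [
        (mutual_information(alignment, i, j), i, j)
        for i, j in combinations(range(length), 2)
    ]
    return sorted(scores, reverse=True)


# Hypothetical toy alignment: columns 0 and 4 change together as
# Watson-Crick pairs (G-C, C-G, A-U, U-A); columns 1 to 3 never change.
TOY_ALIGNMENT = ["GACUC", "CACUG", "AACUU", "UACUA"]

if __name__ == "__main__":
    for score, i, j in rank_pairs(TOY_ALIGNMENT)[:3]:
        print(f"columns {i} and {j}: {score:.2f} bits")
```

In the toy alignment only columns 0 and 4 carry a covariation signal, so that pair ranks first. A pairwise statistic like this cannot separate direct from transitive correlations, which is the motivation for a global model.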

All-atom 3D structure prediction of transmembrane β-barrel proteins from sequences

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1419956112 ◽

2015 ◽

Vol 112 (17) ◽

pp. 5413-5418 ◽

Cited By ~ 41

Author(s):

Sikander Hayat ◽

Chris Sander ◽

Debora S. Marks ◽

Arne Elofsson

Keyword(s):

Structure Prediction ◽

De Novo ◽

3D Structure ◽

3D Models ◽

Sequence Information ◽

Sequence Alignments ◽

Residue Contacts ◽

Machine Learning Approach ◽

3D Structure Prediction ◽

Structure Accuracy

Transmembrane β-barrels (TMBs) carry out major functions in substrate transport and protein biogenesis but experimental determination of their 3D structure is challenging. Encouraged by successful de novo 3D structure prediction of globular and α-helical membrane proteins from sequence alignments alone, we developed an approach to predict the 3D structure of TMBs. The approach combines the maximum-entropy evolutionary coupling method for predicting residue contacts (EVfold) with a machine-learning approach (boctopus2) for predicting β-strands in the barrel. In a blinded test for 19 TMB proteins of known structure that have a sufficient number of diverse homologous sequences available, this combined method (EVfold_bb) predicts hydrogen-bonded residue pairs between adjacent β-strands at an accuracy of ∼70%. This accuracy is sufficient for the generation of all-atom 3D models. In the transmembrane barrel region, the average 3D structure accuracy [template-modeling (TM) score] of top-ranked models is 0.54 (ranging from 0.36 to 0.85), with a higher (44%) number of residue pairs in correct strand–strand registration than in earlier methods (18%). Although the nonbarrel regions are predicted less accurately overall, the evolutionary couplings identify some highly constrained loop residues and, for FecA protein, the barrel including the structure of a plug domain can be accurately modeled (TM score = 0.68). Lower prediction accuracy tends to be associated with insufficient sequence information and we therefore expect increasing numbers of β-barrel families to become accessible to accurate 3D structure prediction as the number of available sequences increases.
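
For readers unfamiliar with the TM score quoted above, the following sketch computes it from per-residue distances under the standard definition, TM = (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24·(L − 15)^(1/3) − 1.8. It assumes the model and reference structure have already been superimposed and aligned; the full TM score additionally maximizes over superpositions, which this sketch omits. Function names are illustrative.

```python
def tm_score_d0(target_length):
    """Length-dependent distance scale d0 (in angstroms) of the TM score."""
    if target_length <= 18:
        # The formula yields a non-positive d0 for very short chains.
        raise ValueError("this sketch assumes a chain longer than 18 residues")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length):
    """TM score for one fixed superposition.

    distances: distance in angstroms between each aligned residue pair.
    target_length: number of residues in the reference structure.
    """
    d0 = tm_score_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


if __name__ == "__main__":
    # Hypothetical example: 100-residue target, every residue 2 angstroms off.
    print(round(tm_score([2.0] * 100, 100), 3))
```

Because the sum is divided by the target length, residues that are missing from the model lower the score, and each residue at distance d0 contributes exactly one half.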

Remote homology search with hidden Potts models

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008085 ◽

2020 ◽

Vol 16 (11) ◽

pp. e1008085

Author(s):

Grey W. Wilburn ◽

Sean R. Eddy

Keyword(s):

Structure Prediction ◽

Probability Model ◽

3D Structure ◽

Approximate Algorithm ◽

Homology Search ◽

Biological Sequence ◽

Sequence Alignments ◽

Multiple Sequence ◽

Potts Models ◽

Alignment Algorithms

Most methods for biological sequence homology search and alignment work with primary sequence alone, neglecting higher-order correlations. Recently, statistical physics models called Potts models have been used to infer all-by-all pairwise correlations between sites in deep multiple sequence alignments, and these pairwise couplings have improved 3D structure predictions. Here we extend the use of Potts models from structure prediction to sequence alignment and homology search by developing what we call a hidden Potts model (HPM) that merges a Potts emission process to a generative probability model of insertion and deletion. Because an HPM is incompatible with efficient dynamic programming alignment algorithms, we develop an approximate algorithm based on importance sampling, using simpler probabilistic models as proposal distributions. We test an HPM implementation on RNA structure homology search benchmarks, where we can compare directly to exact alignment methods that capture nested RNA base-pairing correlations (stochastic context-free grammars). HPMs perform promisingly in these proof of principle experiments.
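
The importance-sampling idea mentioned in the abstract can be sketched on a toy problem: estimating a sum over states by drawing states from a simpler proposal distribution and reweighting. This is a generic illustration only. In an HPM the sum would run over alignments and the proposals would come from simpler probabilistic models; the four-state target below, the uniform proposal, and the function name are all assumptions.

```python
import random


def importance_sampling_sum(unnormalized, proposal, n_samples, rng):
    """Estimate sum(unnormalized) by sampling states from `proposal`.

    unnormalized: non-negative weight of each state (the target, up to scale).
    proposal: probability of each state under the proposal distribution;
              it must be positive wherever the target is positive.
    """
    states = list(range(len(unnormalized)))
    draws = rng.choices(states, weights=proposal, k=n_samples)
    # Each draw contributes target / proposal; the average is unbiased.
    return sum(unnormalized[x] / proposal[x] for x in draws) / n_samples


if __name__ == "__main__":
    rng = random.Random(0)
    target = [1.0, 2.0, 3.0, 4.0]        # exact sum is 10
    uniform = [0.25, 0.25, 0.25, 0.25]   # crude proposal
    matched = [0.1, 0.2, 0.3, 0.4]       # proposal proportional to the target
    print(importance_sampling_sum(target, uniform, 20000, rng))
    print(importance_sampling_sum(target, matched, 20000, rng))
```

When the proposal is proportional to the target, every sample carries the same weight and the estimate has no sampling variance, which is why the quality of the proposal distribution matters.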

Remote homology search with hidden Potts models

10.1101/2020.06.23.168153 ◽

2020 ◽

Cited By ~ 3

Author(s):

Grey W. Wilburn ◽

Sean R. Eddy

Keyword(s):

Structure Prediction ◽

Statistical Physics ◽

Probability Model ◽

3D Structure ◽

Homology Search ◽

Sequence Alignments ◽

Multiple Sequence ◽

Potts Models ◽

Multiple Sequence Alignments ◽

Insertion And Deletion

Abstract: Most methods for biological sequence homology search and alignment work with primary sequence alone, neglecting higher-order correlations. Recently, statistical physics models called Potts models have been used to infer all-by-all pairwise correlations between sites in deep multiple sequence alignments, and these pairwise couplings have improved 3D structure predictions. Here we extend the use of Potts models from structure prediction to sequence alignment and homology search by developing what we call a hidden Potts model (HPM) that merges a Potts emission process to a generative probability model of insertion and deletion. Because an HPM is incompatible with efficient dynamic programming alignment algorithms, we develop an approximate algorithm based on importance sampling, using simpler probabilistic models as proposal distributions. We test an HPM implementation on RNA structure homology search benchmarks, where we can compare directly to exact alignment methods that capture nested RNA base-pairing correlations (stochastic context-free grammars). HPMs perform promisingly in these proof of principle experiments.

Author summary: Computational homology search and alignment tools are used to infer the functions and evolutionary histories of biological sequences. Most widely used tools for sequence homology searches, such as BLAST and HMMER, rely on primary sequence conservation alone. It should be possible to make more powerful search tools by also considering higher-order covariation patterns induced by 3D structure conservation. Recent advances in 3D protein structure prediction have used a class of statistical physics models called Potts models to infer pairwise correlation structure in multiple sequence alignments. However, Potts models assume alignments are given and cannot build new alignments, limiting their use in homology search. We have extended Potts models to include a probability model of insertion and deletion so they can be applied to sequence alignment and remote homology search using a new model we call a hidden Potts model (HPM). Tests of our prototype HPM software show promising results in initial benchmarking experiments, though more work will be needed to use HPMs in practical tools.

EVfold.org: Evolutionary Couplings and Protein 3D Structure Prediction

10.1101/021022 ◽

2015 ◽

Cited By ~ 14

Author(s):

Robert Sheridan ◽

Robert J. Fieldhouse ◽

Sikander Hayat ◽

Yichao Sun ◽

Yevgeniy Antipin ◽

...

Keyword(s):

Protein Function ◽

Structure Prediction ◽

De Novo ◽

3D Structure ◽

Sequence Information ◽

Major Advance ◽

Sequence Alignments ◽

Multiple Sequence ◽

Genomic Databases ◽

Multiple Sequence Alignments

Recently developed maximum entropy methods infer evolutionary constraints on protein function and structure from the millions of protein sequences available in genomic databases. The EVfold web server (at EVfold.org) makes these methods available to predict functional and structural interactions in proteins. The key algorithmic development has been to disentangle direct and indirect residue-residue correlations in large multiple sequence alignments and derive direct residue-residue evolutionary couplings (EVcouplings or ECs). For proteins of unknown structure, distance constraints obtained from evolutionary couplings between residue pairs are used to predict all-atom 3D structures de novo, often to good accuracy. Given sufficient sequence information in a protein family, this is a major advance toward solving the problem of computing the native 3D fold of proteins from sequence information alone. Availability: EVfold server at http://evfold.org/
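
The distinction between direct and indirect correlations can be illustrated with a three-variable analogy. In the sketch below, positions A and C interact only through B, yet A and C still appear correlated; the first-order partial correlation removes the part explained by B. This uses continuous-variable correlation coefficients as a stand-in and is not the maximum entropy method used by EVfold; the numbers and the function name are assumptions.

```python
import math


def partial_correlation(r_ac, r_ab, r_bc):
    """Correlation between A and C after removing what B explains."""
    return (r_ac - r_ab * r_bc) / math.sqrt((1 - r_ab ** 2) * (1 - r_bc ** 2))


if __name__ == "__main__":
    # Hypothetical chain A - B - C: A and C are coupled only through B,
    # so their observed correlation is the product 0.8 * 0.8 = 0.64.
    r_ab, r_bc = 0.8, 0.8
    r_ac = r_ab * r_bc
    print(f"observed A-C correlation: {r_ac:.2f}")
    print(f"direct A-C correlation:   {partial_correlation(r_ac, r_ab, r_bc):.2f}")
```

The observed A-C correlation is substantial even though the direct term vanishes. A ranking built on raw correlations would report A-C as a contact, while one built on direct couplings would not.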

AttentiveDist: Protein Inter-Residue Distance Prediction Using Deep Learning with Attention on Quadruple Multiple Sequence Alignments

10.1101/2020.11.24.396770 ◽

2020 ◽

Author(s):

Aashish Jain ◽

Genki Terashi ◽

Yuki Kagaya ◽

Sai Raghavendra Maddhuri Venkata Subramaniya ◽

Charles Christoffer ◽

...

Keyword(s):

Deep Learning ◽

Structure Prediction ◽

Prediction Models ◽

3D Structure ◽

Evolutionary Information ◽

Sequence Alignments ◽

Multiple Sequence ◽

Contact Prediction ◽

Multiple Sequence Alignments ◽

Distance Prediction

Protein 3D structure prediction has advanced significantly in recent years due to improving contact prediction accuracy. This improvement has been largely due to deep learning approaches that predict inter-residue contacts and, more recently, distances using multiple sequence alignments (MSAs). In this work we present AttentiveDist, a novel approach that uses different MSAs generated with different E-values in a single model to increase the co-evolutionary information provided to the model. To determine the importance of each MSA’s feature at the inter-residue level, we added an attention layer to the deep neural network. The model is trained in a multi-task fashion to also predict backbone and orientation angles, further improving the inter-residue distance prediction. We show that AttentiveDist outperforms the top methods for contact prediction in the CASP13 structure prediction competition. To aid in structure modeling we also developed two new deep learning-based sidechain center distance and peptide-bond nitrogen-oxygen distance prediction models. Together these led to a 12% increase in TM-score from the best server method in CASP13 for structure prediction.
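
As a rough picture of how an attention layer can weigh features from several MSAs for one residue pair, the sketch below turns one score per MSA into softmax weights and forms a weighted sum of the feature vectors. It is not the AttentiveDist architecture, whose details the abstract does not give; the four toy feature vectors, the scores, and the function names are assumptions.

```python
import math


def softmax(scores):
    """Convert raw scores into positive weights that sum to one."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, scores):
    """Weighted sum of per-MSA feature vectors for a single residue pair.

    features: one feature vector per MSA, all of equal length.
    scores: one raw attention score per MSA.
    """
    weights = softmax(scores)
    dim = len(features[0])
    return [
        sum(w * vec[k] for w, vec in zip(weights, features))
        for k in range(dim)
    ]


if __name__ == "__main__":
    # Hypothetical features for one residue pair from four MSAs.
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    scores = [2.0, 0.5, 0.5, -1.0]
    print(softmax(scores))
    print(attend(features, scores))
```

With equal scores the result is the plain average of the four feature vectors. In a trained network the scores would be produced by learned layers, so the weighting can differ for every residue pair.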

Contrastive learning on protein embeddings enlightens midnight zone at lightning speed

10.1101/2021.11.14.468528 ◽

2021 ◽

Author(s):

Michael Heinzinger ◽

Maria Littmann ◽

Ian Sillitoe ◽

Nicola Bordin ◽

Christine Orengo ◽

...

Keyword(s):

Structure Prediction ◽

Sequence Similarity ◽

3D Structure ◽

Three Dimensional ◽

Hierarchical Classification ◽

Language Models ◽

Sequence Alignments ◽

Sequence Comparisons ◽

Multiple Sequence ◽

3D Structures

Thanks to the recent advances in protein three-dimensional (3D) structure prediction, in particular through AlphaFold 2 and RoseTTAFold, the abundance of protein 3D information will explode over the next year(s). Expert resources based on 3D structures such as SCOP and CATH have been organizing the complex sequence-structure-function relations into a hierarchical classification schema. Experimental structures are leveraged through multiple sequence alignments, or more generally through homology-based inference (HBI) transferring annotations from a protein with experimentally known annotation to a query without annotation. Here, we presented a novel approach that expands the concept of HBI from a low-dimensional sequence-distance lookup to the level of a high-dimensional embedding-based annotation transfer (EAT). Secondly, we introduced a novel solution using single protein sequence representations from protein Language Models (pLMs), so-called embeddings (Prose, ESM-1b, ProtBERT, and ProtT5), as input to contrastive learning, by which a new set of embeddings was created that optimized constraints captured by hierarchical classifications of protein 3D structures. These new embeddings (dubbed ProtTucker) clearly improved what was historically referred to as threading or fold recognition. Thereby, the new embeddings enabled the intrusion into the midnight zone of protein comparisons, i.e., the region in which the level of pairwise sequence similarity is akin to random relations and therefore is hard to navigate by HBI methods. Cautious benchmarking showed that ProtTucker reached much further than advanced sequence comparisons without the need to compute alignments, allowing it to be orders of magnitude faster. Code is available at https://github.com/Rostlab/EAT.
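
Embedding-based annotation transfer can be pictured as a nearest-neighbor lookup: the query inherits the label of the closest annotated protein in embedding space. The sketch below shows that lookup together with a triplet margin loss, one common contrastive objective. The use of Euclidean distance, the margin value, the two-dimensional toy embeddings, and the labels "fold_A" and "fold_B" are assumptions for illustration and are not taken from the EAT code.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def transfer_annotation(query, lookup):
    """Return (label, distance) of the annotated embedding nearest to query.

    lookup: list of (embedding, label) pairs with known annotations.
    """
    embedding, label = min(lookup, key=lambda item: euclidean(query, item[0]))
    return label, euclidean(query, embedding)


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Zero once the positive is closer than the negative by `margin`."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


if __name__ == "__main__":
    # Hypothetical lookup set with two annotated proteins.
    lookup = [([1.0, 0.0], "fold_A"), ([0.0, 1.0], "fold_B")]
    print(transfer_annotation([0.9, 0.1], lookup))
    print(triplet_margin_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))
```

The returned distance can serve as a crude confidence signal, since a large distance to the nearest hit suggests the transfer is less reliable. No alignment is computed at any point, which is where the speed advantage described in the abstract comes from.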

Analyzing effect of quadruple multiple sequence alignments on deep learning based protein inter-residue distance prediction

Scientific Reports ◽

10.1038/s41598-021-87204-z ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Aashish Jain ◽

Genki Terashi ◽

Yuki Kagaya ◽

Sai Raghavendra Maddhuri Venkata Subramaniya ◽

Charles Christoffer ◽

...

Keyword(s):

Deep Learning ◽

Structure Prediction ◽

Tertiary Structure ◽

3D Structure ◽

Evolutionary Information ◽

Learning Approaches ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Novel Approach

Protein 3D structure prediction has advanced significantly in recent years due to improving contact prediction accuracy. This improvement has been largely due to deep learning approaches that predict inter-residue contacts and, more recently, distances using multiple sequence alignments (MSAs). In this work we present AttentiveDist, a novel approach that uses different MSAs generated with different E-values in a single model to increase the co-evolutionary information provided to the model. To determine the importance of each MSA’s feature at the inter-residue level, we added an attention layer to the deep neural network. We show that combining four MSAs of different E-value cutoffs improved the model prediction performance as compared to single E-value MSA features. A further improvement was observed when an attention layer was used and even more when additional prediction tasks of bond angle predictions were added. The improvements in distance prediction were successfully transferred to achieve better protein tertiary structure modeling.

3D structure prediction and molecular dynamics simulation studies of GPR139

2016 International Conference on Bioinformatics and Systems Biology (BSB) ◽

10.1109/bsb.2016.7552143 ◽

2016 ◽

Author(s):

Aman Chandra Kaushik ◽

Shakti Sahi

Keyword(s):

Molecular Dynamics ◽

Molecular Dynamics Simulation ◽

Structure Prediction ◽

3D Structure ◽

Dynamics Simulation ◽

Simulation Studies ◽

3D Structure Prediction

Accurate prediction of transmembrane β-barrel proteins from sequences

10.1101/006577 ◽

2014 ◽

Cited By ~ 2

Author(s):

Sikander Hayat ◽

Chris Sander ◽

Arne Elofsson ◽

Debora S. Marks

Keyword(s):

Membrane Proteins ◽

Outer Membrane Proteins ◽

3D Structure ◽

3D Models ◽

Biological Research ◽

Sequence Information ◽

Major Advance ◽

Sequence Alignments ◽

Successful Prediction ◽

Protein Biogenesis

Abstract: Transmembrane β-barrels are known to play major roles in substrate transport and protein biogenesis in gram-negative bacteria, chloroplasts and mitochondria. However, the exact number of transmembrane β-barrel families is unknown and experimental structure determination is challenging. In theory, if one knows the number of strands in the β-barrel, then the 3D structure of the barrel could be trivial, but current topology predictions do not predict accurate structures and are unable to give information beyond the β-strands in the barrel. Recent work has shown successful prediction of globular and alpha-helical membrane proteins from sequence alignments, by using high ranked evolutionary couplings between residues as distance constraints to fold extended polypeptides. However, these methods have not addressed the calculation of precise β-sheet hydrogen bonding that defines transmembrane β-barrels, and would be required to fold these proteins successfully. Hence we developed a method (EVFold_BB) that can successfully model transmembrane β-barrels by combining evolutionary couplings together with topology predictions. EVFold_BB is validated by the accurate all-atom 3D modeling of 18 proteins, representing all known membrane β-barrel families that have sufficient sequences available. To demonstrate the potential of our approach we predict the unknown 3D structure of the LptD protein; the plausibility of its accuracy is supported by the blindly predicted benchmarks, and is consistent with experimental observations. Our approach can naturally be extended to all unknown β-barrel proteins with sufficient sequence information.

Significance: EVFold_BB predicts fast, accurate 3D models of large membrane β-barrels that are notoriously hard to solve experimentally. The major advance is the use of evolutionary couplings from sequence alignments together with the β-strand prediction to ascertain accurate hydrogen bonding between the β-strands that gives rise to the canonical barrel shapes. The method will enable biological research into outer-membrane proteins.

3PT107 RNA 3D structure prediction by using coarse-grained molecular dynamics simulation (The 50th Annual Meeting of the Biophysical Society of Japan)

Seibutsu Butsuri ◽

10.2142/biophys.52.s158_4 ◽

2012 ◽

Vol 52 (supplement) ◽

pp. S158

Author(s):

Tomoshi Kameda

Keyword(s):

Molecular Dynamics ◽

Molecular Dynamics Simulation ◽

Annual Meeting ◽

Structure Prediction ◽

3D Structure ◽

Dynamics Simulation ◽

Coarse Grained ◽

Biophysical Society ◽

3D Structure Prediction ◽

Coarse Grained Molecular Dynamics
