Deep Neural Network for Protein Contact Prediction by Weighting Sequences in a Multiple Sequence Alignment

10.1101/331926 ◽

2018 ◽

Author(s):

Hiroyuki Fukuda ◽

Kentaro Tomii

Keyword(s):

Neural Network ◽

Supervised Learning ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Structure Prediction ◽

Deep Neural Network ◽

Multiple Sequence ◽

Contact Prediction ◽

Meta Learning ◽

Correlation Information

Protein contact prediction is a crucial step in protein structure prediction. Two types of approach are used to predict contacts: evolutionary coupling analysis (ECA) and supervised learning. ECA uses a large multiple sequence alignment (MSA) of homologous sequences and extracts correlation information between residues. Supervised learning uses ECA results as input features and can achieve higher accuracy. We present a new approach to contact prediction that both extracts correlation information and predicts contacts in a supervised manner directly from the MSA using a deep neural network (DNN). With the DNN, we obtain higher accuracy than with earlier ECA methods. At the same time, the method weights each sequence in the MSA to eliminate noisy sequences automatically in a supervised way. We expect that combining our method with other meta-learning methods can provide much higher contact prediction accuracy.
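To illustrate why per-sequence weights matter when extracting correlation information from an MSA, here is a minimal Python sketch. It is not the authors' network: the function name `weighted_mutual_information`, the toy alignment, and the hand-set weights are all assumptions made for illustration, whereas in the paper the weights are learned by the DNN. The sketch computes mutual information between two alignment columns, with each sequence contributing its weight rather than a count of one.

```python
import math
from collections import defaultdict


def weighted_mutual_information(msa, weights, i, j):
    """Mutual information (in nats) between alignment columns i and j.

    Each sequence contributes its weight instead of a count of 1, so a
    weight of 0 removes that sequence from the statistics entirely.
    """
    if len(msa) != len(weights):
        raise ValueError("one weight per sequence is required")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    p_i = defaultdict(float)   # weighted residue frequencies in column i
    p_j = defaultdict(float)   # weighted residue frequencies in column j
    p_ij = defaultdict(float)  # weighted joint frequencies for the pair
    for seq, w in zip(msa, weights):
        a, b = seq[i], seq[j]
        p_i[a] += w / total
        p_j[b] += w / total
        p_ij[(a, b)] += w / total

    return sum(p * math.log(p / (p_i[a] * p_j[b]))
               for (a, b), p in p_ij.items() if p > 0)


# Hypothetical toy alignment: the last sequence breaks the A-K / G-E pattern.
toy_msa = ["AKL", "AKL", "GEL", "GEL", "AEL"]
uniform = weighted_mutual_information(toy_msa, [1, 1, 1, 1, 1], 0, 1)
downweighted = weighted_mutual_information(toy_msa, [1, 1, 1, 1, 0], 0, 1)
```

In this toy case, giving the inconsistent fifth sequence a weight of zero raises the coupling signal between columns 0 and 1, which is the kind of noise suppression that sequence weighting aims for.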

A data-centric pipeline using convolutional neural network to select better multiple sequence alignment method

Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics ◽

10.1145/3388440.3414909 ◽

2020 ◽

Author(s):

Mengmeng Kuang ◽

Hing-fung Ting

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Alignment Method ◽

Multiple Sequence

Protein Complex Structure Prediction Powered by Multiple Sequence Alignment of Interologs from Multiple Taxonomic Ranks and AlphaFold2

10.1101/2021.12.21.473437 ◽

2021 ◽

Author(s):

Yunda Si ◽

Chengfei Yan

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Protein Complex ◽

Structure Prediction ◽

Complex Structure ◽

Complex Structures ◽

Success Rates ◽

Multiple Sequence ◽

Taxonomic Rank ◽

Protein Protein Interaction

AlphaFold2 is expected to be able to predict protein complex structures as long as a multiple sequence alignment (MSA) of the interologs of the target protein-protein interaction (PPI) can be provided. However, preparing the MSA of protein-protein interologs is a non-trivial task. In this study, a simplified phylogeny-based approach was applied to generate the MSA of interologs, which was then used as the input to AlphaFold2 for protein complex structure prediction. Benchmarking this protocol extensively on a non-redundant PPI dataset, we show that complex structures can be successfully predicted for 79.5% of the bacterial PPIs and 49.8% of the eukaryotic PPIs. Because PPIs may not be conserved across species separated by long evolutionary distances, we further restricted the interologs in the MSA to different taxonomic ranks of the species of the target PPI. We found that the success rates increase to 87.9% for the bacterial PPIs and 56.3% for the eukaryotic PPIs when interologs in the MSA are restricted to a specific taxonomic rank of the species of each target PPI. Finally, we show that the optimal taxonomic rank for protein complex structure prediction can be selected using the predicted TM-scores of the output models.
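The two selection steps described in this abstract, restricting interologs to a taxonomic rank and then choosing the rank by predicted TM-score, can be sketched roughly as follows. This is only an illustration under stated assumptions: the row format, the lineage dictionaries, the function names `restrict_to_rank` and `pick_best_rank`, and the predicted TM-score values are all hypothetical and are not taken from the paper or from AlphaFold2.

```python
def restrict_to_rank(rows, target_lineage, rank):
    """Keep paired-MSA rows whose species shares the target's taxon at `rank`."""
    wanted = target_lineage[rank]
    return [row for row in rows if row["lineage"].get(rank) == wanted]


def pick_best_rank(predicted_tm_by_rank):
    """Choose the rank whose output model has the highest predicted TM-score."""
    return max(predicted_tm_by_rank, key=predicted_tm_by_rank.get)


# Hypothetical lineage of the species carrying the target PPI.
target = {"domain": "Bacteria", "phylum": "Proteobacteria",
          "class": "Gammaproteobacteria"}

# Hypothetical interolog rows; a real row would also carry the paired sequences.
rows = [
    {"id": "pair_1", "lineage": {"domain": "Bacteria", "phylum": "Proteobacteria",
                                 "class": "Gammaproteobacteria"}},
    {"id": "pair_2", "lineage": {"domain": "Bacteria", "phylum": "Proteobacteria",
                                 "class": "Betaproteobacteria"}},
    {"id": "pair_3", "lineage": {"domain": "Bacteria", "phylum": "Firmicutes",
                                 "class": "Bacilli"}},
]

phylum_rows = restrict_to_rank(rows, target, "phylum")

# Made-up predicted TM-scores, one per rank-restricted run, for illustration only.
best_rank = pick_best_rank({"domain": 0.61, "phylum": 0.74, "class": 0.70})
```

A narrower rank keeps fewer but more closely related interologs, so the trade-off between alignment depth and conservation of the interaction is resolved per target by the predicted score.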

CopulaNet: Learning residue co-evolution directly from multiple sequence alignment for protein structure prediction

10.1101/2020.10.06.327585 ◽

2020 ◽

Author(s):

Fusong Ju ◽

Jianwei Zhu ◽

Bin Shao ◽

Lupeng Kong ◽

Tie-Yan Liu ◽

...

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Structure Prediction ◽

Tertiary Structure ◽

Query Protein ◽

Spatial Proximity ◽

Multiple Sequence ◽

Variance Matrix

Protein functions are largely determined by the final details of their tertiary structures, and the structures can be accurately reconstructed from inter-residue distances. Residue co-evolution has become the primary principle for estimating inter-residue distances, since residues in close spatial proximity tend to co-evolve. Widely used approaches infer residue co-evolution with an indirect strategy: they first extract handcrafted features, such as the covariance matrix, from the multiple sequence alignment (MSA) of the query protein, and then infer residue co-evolution from these features rather than from the raw information carried by the MSA. This indirect strategy always leads to considerable information loss and inaccurate estimation of inter-residue distances. Here, we report a deep neural network framework (called CopulaNet) that learns residue co-evolution directly from the MSA without any handcrafted features. CopulaNet consists of two key elements: i) an encoder to model context-specific mutation for each residue, and ii) an aggregator to model correlations among residues and thereafter infer residue co-evolution. Using the CASP13 (the 13th Critical Assessment of Protein Structure Prediction) target proteins as representatives, we demonstrated the successful application of CopulaNet for estimating inter-residue distances and further predicting protein tertiary structure with improved accuracy and efficiency. Head-to-head comparison suggested that for 24 out of the 31 free modeling CASP13 domains, ProFOLD outperformed AlphaFold, one of the state-of-the-art prediction approaches.
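For readers unfamiliar with the handcrafted covariance feature mentioned in this abstract, a minimal Python sketch follows. It shows the textbook covariance of one-hot residue indicators for a single column pair, cov(a, b) = f_ij(a, b) - f_i(a) f_j(b); it is not CopulaNet, and the function name `column_pair_covariance` and the toy alignment are assumptions for illustration.

```python
from collections import Counter


def column_pair_covariance(msa, i, j):
    """Covariance of one-hot residue indicators for columns i and j.

    cov[(a, b)] = f_ij(a, b) - f_i(a) * f_j(b), where f denotes the
    observed frequency in the alignment.
    """
    n = len(msa)
    f_i = Counter(seq[i] for seq in msa)
    f_j = Counter(seq[j] for seq in msa)
    f_ij = Counter((seq[i], seq[j]) for seq in msa)
    return {(a, b): f_ij[(a, b)] / n - (f_i[a] / n) * (f_j[b] / n)
            for a in f_i for b in f_j}


# Hypothetical two-column alignment in which A pairs with K and G pairs with E.
toy_msa = ["AK", "AK", "GE", "GE"]
cov = column_pair_covariance(toy_msa, 0, 1)
```

Summarising an alignment this way keeps only pairwise frequencies and discards which sequence each residue pair came from, which is one form of the information loss the abstract attributes to the indirect strategy.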

Integrating Protein Secondary Structure Prediction and Multiple Sequence Alignment

Current Protein and Peptide Science ◽

10.2174/1389203043379675 ◽

2004 ◽

Vol 5 (4) ◽

pp. 249-266 ◽

Cited By ~ 35

Author(s):

V. Simossis ◽

J. Heringa

Keyword(s):

Secondary Structure ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Protein Secondary Structure ◽

Protein Secondary Structure Prediction ◽

Multiple Sequence

A Novel Comparative Sequence Analysis Method for ncRNA Secondary Structure Prediction without Multiple Sequence Alignment

2008 Fourth International Conference on Natural Computation ◽

10.1109/icnc.2008.446 ◽

2008 ◽

Author(s):

Quan Zou ◽

Mao-Zu Guo ◽

Yang Liu ◽

Zhi-An Xing

Keyword(s):

Sequence Analysis ◽

Secondary Structure ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Comparative Sequence Analysis ◽

Analysis Method ◽

Multiple Sequence ◽

Comparative Sequence

Application of multiple sequence alignment profiles to improve protein secondary structure prediction

Proteins Structure Function and Bioinformatics ◽

10.1002/1097-0134(20000815)40:3<502::aid-prot170>3.0.co;2-q ◽

2000 ◽

Vol 40 (3) ◽

pp. 502-511 ◽

Cited By ~ 484

Author(s):

James A. Cuff ◽

Geoffrey J. Barton

Keyword(s):

Secondary Structure ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Protein Secondary Structure ◽

Protein Secondary Structure Prediction ◽

Multiple Sequence

Liquid-theory analogy of direct-coupling analysis of multiple-sequence alignment and its implications for protein structure prediction

Biophysics and Physicobiology ◽

10.2142/biophysico.12.0_117 ◽

2015 ◽

Vol 12 (0) ◽

pp. 117-119 ◽

Cited By ~ 1

Author(s):

Akira R. Kinjo

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Structure Prediction ◽

Direct Coupling ◽

Coupling Analysis ◽

Multiple Sequence ◽

Liquid Theory ◽

Direct Coupling Analysis

A neural-network based method for prediction of γ-turns in proteins from multiple sequence alignment

Protein Science ◽

10.1110/ps.0241703 ◽

2003 ◽

Vol 12 (5) ◽

pp. 923-929 ◽

Cited By ~ 49

Author(s):

Harpreet Kaur ◽

G.P.S. Raghava

Keyword(s):

Neural Network ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Multiple Sequence

fastMSA: Accelerating Multiple Sequence Alignment with Dense Retrieval on Protein Language

10.1101/2021.12.20.473431 ◽

2021 ◽

Author(s):

Liang Hong ◽

Siqi Sun ◽

Liangzhen Zheng ◽

Qingxiong Tan ◽

Yu Li

Keyword(s):

Protein Structure ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Structure Prediction ◽

Structure And Function ◽

Sequence Alignments ◽

Protein Structure And Function ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

And Function

Evolutionarily related sequences provide information about protein structure and function. Multiple sequence alignment, which comprises homolog search against large databases followed by sequence alignment, is an effective way to extract this information and assist protein structure and function prediction, as demonstrated by AlphaFold. Despite the existing tools for multiple sequence alignment, searching for homologs across the entire UniProt is still time-consuming. Given the success of AlphaFold, large-scale multiple sequence alignment against massive databases is likely to become a trend in the field, so accelerating this step is highly desirable. Here, we propose a novel method, fastMSA, that improves the speed significantly. Our idea is orthogonal to all previous acceleration methods. Taking advantage of a protein language model based on BERT, we propose a novel dual encoder architecture that embeds protein sequences into a low-dimensional space and efficiently filters out unrelated sequences before running BLAST. Extensive experimental results suggest that we can recall most of the homologs with a 34-fold speed-up. Moreover, our method is compatible with downstream tasks such as structure prediction using AlphaFold. Using multiple sequence alignments generated by our method, we observe little loss in protein structure prediction performance with much less running time. fastMSA will effectively assist protein sequence, structure, and function analysis based on homologs and multiple sequence alignment.
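The pre-filtering idea in this abstract can be sketched as a nearest-neighbour search in embedding space. The Python below is a minimal illustration, not fastMSA's implementation: the three-dimensional vectors are hand-written stand-ins for dual-encoder outputs, the names `cosine` and `prefilter` are hypothetical, and a real system would likely use an approximate nearest-neighbour index rather than a full sort.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def prefilter(query_vec, database, top_k):
    """Return the ids of the top_k database entries most similar to the query.

    Only these candidates would then be passed to the slower alignment step.
    """
    ranked = sorted(database,
                    key=lambda sid: cosine(query_vec, database[sid]),
                    reverse=True)
    return ranked[:top_k]


# Hypothetical embeddings; real ones would come from the trained encoders.
database = {
    "seq_a": [0.9, 0.1, 0.0],
    "seq_b": [0.0, 1.0, 0.2],
    "seq_c": [0.8, 0.3, 0.1],
    "seq_d": [-0.7, 0.0, 0.5],
}
candidates = prefilter([1.0, 0.2, 0.0], database, top_k=2)
```

The speed-up comes from comparing short fixed-length vectors instead of aligning full sequences, at the cost of possibly missing homologs whose embeddings fall outside the retained set.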

DeepMSA: constructing deep multiple sequence alignment to improve contact prediction and fold-recognition for distant-homology proteins

Bioinformatics ◽

10.1093/bioinformatics/btz863 ◽

2019 ◽

Vol 36 (7) ◽

pp. 2105-2112 ◽

Cited By ~ 14

Author(s):

Chengxin Zhang ◽

Wei Zheng ◽

S M Mortuza ◽

Yang Li ◽

Yang Zhang

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Large Scale ◽

Secondary Structure Prediction ◽

Supplementary Information ◽

Structure Identification ◽

Whole Genome ◽

Multiple Sequence ◽

Contact Prediction ◽

Homologous Sequences

Motivation: The success of genome sequencing techniques has resulted in a rapid explosion of protein sequences. Collections of multiple homologous sequences can provide critical information for modeling the structure and function of unknown proteins. There is, however, no standard and efficient pipeline available for sensitive multiple sequence alignment (MSA) collection. This is particularly challenging when large whole-genome and metagenome databases are involved.

Results: We developed DeepMSA, a new open-source method for sensitive MSA construction, in which homologous sequences and alignments are created from multiple sources of whole-genome and metagenome databases through complementary hidden Markov model algorithms. The practical usefulness of the pipeline was examined in three large-scale benchmark experiments based on 614 non-redundant proteins. First, DeepMSA was used to generate MSAs for residue-level contact prediction by six coevolution and deep learning-based programs, which resulted in an accuracy increase in long-range contacts of up to 24.4% compared to the default programs. Next, multiple threading programs were run for homologous structure identification, where the average TM-score of the template alignments increased by over 7.5% with the use of the new DeepMSA profiles. Finally, DeepMSA was used for secondary structure prediction and resulted in statistically significant improvements in Q3 accuracy. All these improvements were achieved without re-training the parameters and neural-network models, demonstrating the robustness and general usefulness of DeepMSA in protein structural bioinformatics applications, especially for targets without homologous templates in the PDB library.

Availability and implementation: https://zhanglab.ccmb.med.umich.edu/DeepMSA/.

Supplementary information: Supplementary data are available at Bioinformatics online.
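As a rough guide to how long-range contact accuracy of the kind reported above is typically measured, the following Python sketch computes the precision of the top-scoring long-range predictions. It rests on assumptions the abstract does not state: a sequence separation of at least 24 residues is a commonly used convention for "long-range", the function name `long_range_precision` is hypothetical, and the scores and contacts are toy values.

```python
def long_range_precision(scores, true_contacts, seq_len, min_sep=24, top=None):
    """Fraction of the `top` highest-scoring long-range pairs that are true contacts.

    `scores` maps residue-index pairs (i, j) to predicted contact scores.
    Pairs closer than `min_sep` in sequence are ignored. `top` defaults to
    the sequence length, i.e. the top-L predictions.
    """
    top = seq_len if top is None else top
    candidates = [(s, pair) for pair, s in scores.items()
                  if abs(pair[0] - pair[1]) >= min_sep]
    candidates.sort(key=lambda item: item[0], reverse=True)
    chosen = [pair for _, pair in candidates[:top]]
    if not chosen:
        return 0.0
    hits = sum(1 for pair in chosen if pair in true_contacts)
    return hits / len(chosen)


# Toy predictions: (5, 10) scores highest but is too close in sequence to count.
scores = {(0, 30): 0.95, (2, 40): 0.80, (5, 10): 0.99, (1, 50): 0.60, (3, 29): 0.40}
true_contacts = {(0, 30), (1, 50), (5, 10)}
precision = long_range_precision(scores, true_contacts, seq_len=60, top=2)
```

Because only the highest-ranked pairs are scored, a deeper or cleaner MSA can raise this number without any change to the predictor itself, which matches the abstract's point that the gains came without re-training.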
