ADLD: A Novel Graphical Representation of Protein Sequences and Its Application

Computational and Mathematical Methods in Medicine ◽

10.1155/2014/959753 ◽

2014 ◽

Vol 2014 ◽

pp. 1-15 ◽

Cited By ~ 5

Author(s):

Lei Wang ◽

Hui Peng ◽

Jinhua Zheng

Keyword(s):

Graphical Representation ◽

Protein Sequences ◽

Diagonal Line ◽

Line Diagram ◽

Spike Proteins

To facilitate the intuitive analysis of protein sequences, this paper first introduces a novel graphical representation of protein sequences called ADLD (Alignment Diagonal Line Diagram), and then proposes a new ADLD-based method for analyzing the similarity/dissimilarity of protein sequences. Compared with existing methods, our ADLD-based method is shown to be effective in the similarity/dissimilarity analysis of protein sequences and to have the merits of being intuitive, visual, and simple. Examinations of the similarities/dissimilarities among 16 different ND5 proteins and among 29 different spike proteins illustrate the utility of our ADLD-based approach.

A Graphical Representation of Protein Sequences and Its Applications

Proceedings of the Fourth International Conference on Biological Information and Biomedical Engineering ◽

10.1145/3403782.3403812 ◽

2020 ◽

Author(s):

Ping-An He ◽

Linlin Yan ◽

Tianyu Zhu

Keyword(s):

Graphical Representation ◽

Protein Sequences

2-D graphical representation of protein sequences and its application to coronavirus phylogeny

BMB Reports ◽

10.5483/bmbrep.2008.41.3.217 ◽

2008 ◽

Vol 41 (3) ◽

pp. 217-222 ◽

Cited By ~ 22

Author(s):

Chun Li ◽

Lili Xing ◽

Xin Wang

Keyword(s):

Graphical Representation ◽

Protein Sequences

Comparative Studies Based on a 3-D Graphical Representation of Protein Sequences

Intelligent Computing Theories and Methodologies - Lecture Notes in Computer Science ◽

10.1007/978-3-319-22186-1_43 ◽

2015 ◽

pp. 436-444

Author(s):

Yingzhao Liu ◽

Yan-chun Yang ◽

Tian-ming Wang

Keyword(s):

Comparative Studies ◽

Graphical Representation ◽

Protein Sequences

Similarity/Dissimilarity Analysis of Protein Sequences Based on a New Spectrum-Like Graphical Representation

Evolutionary Bioinformatics ◽

10.4137/ebo.s14713 ◽

2014 ◽

Vol 10 ◽

pp. EBO.S14713 ◽

Cited By ~ 10

Author(s):

Yuhua Yao ◽

Shoujiang Yan ◽

Huimin Xu ◽

Jianning Han ◽

Xuying Nan ◽

...

Keyword(s):

Graphical Representation ◽

Protein Sequences

Measuring Similarity among Protein Sequences Using a New Descriptor

BioMed Research International ◽

10.1155/2019/2796971 ◽

2019 ◽

Vol 2019 ◽

pp. 1-10 ◽

Cited By ~ 1

Author(s):

Mervat M. Abo-Elkhier ◽

Marwa A. Abd Elwahaab ◽

Moheb I. Abo El Maaty

Keyword(s):

Protein Sequence ◽

Nadh Dehydrogenase ◽

Graphical Representation ◽

Protein Sequences ◽

Computation Time ◽

Fundamental Aspect ◽

Beta Globin ◽

Nadh Dehydrogenase Subunit ◽

The Public ◽

Sequencing Technologies

The comparison of protein sequences according to similarity is a fundamental aspect of today’s biomedical research. With the development of sequencing technologies, the number of protein sequences in public databases is increasing exponentially. Well-known sequence comparison methods are alignment based. They generally give excellent results when the sequences under study are closely related, but they are time consuming. Herein, a new alignment-free method is introduced. Our technique depends on a new graphical representation and descriptor. The graphical representation is a simple way to visualize protein sequences. The descriptor compresses the primary sequence into a single vector composed of only two values. Our approach gives good results with both short and long sequences in little computation time. It is applied to nine beta globin, nine ND5 (NADH dehydrogenase subunit 5), and 24 spike protein sequences. Correlation and significance analyses are also introduced to compare our similarity/dissimilarity results with the results of other approaches and with sequence homology.
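The general idea in the abstract above, reducing each sequence to a two-value descriptor and comparing descriptors rather than aligning residues, can be illustrated with a minimal sketch. This is not the authors' descriptor, which the abstract does not specify. As an assumption, the hypothetical `descriptor` function below uses the mean and standard deviation of Kyte-Doolittle hydropathy values as a stand-in, and `distance_matrix` is likewise an illustrative name.

```python
import math
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy scale. ASSUMPTION: used only as a stand-in
# property; the paper's actual descriptor is not given in the abstract.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def descriptor(seq):
    """Hypothetical two-value descriptor: (mean, population std) of hydropathy."""
    values = [HYDROPATHY[aa] for aa in seq.upper() if aa in HYDROPATHY]
    if not values:
        raise ValueError("sequence contains no standard amino acids")
    return (mean(values), pstdev(values))


def distance_matrix(seqs):
    """Pairwise Euclidean distances between descriptors (smaller = more similar)."""
    descs = [descriptor(s) for s in seqs]
    return [[math.dist(a, b) for b in descs] for a in descs]


if __name__ == "__main__":
    # Toy sequences for illustration only, not real protein data.
    toy = ["MKTAYIAKQR", "MKTAYIAKQK", "GGSGGSGGSG"]
    for row in distance_matrix(toy):
        print(["%.3f" % d for d in row])
```

Because each sequence is reduced to a fixed-length vector in a single pass, the cost of comparing two sequences does not depend on their lengths, which is the usual argument for alignment-free methods on large datasets.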

Protein Sequence Comparison and DNA-binding Protein Identification with Generalized PseAAC and Graphical Representation

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207321666180130100838 ◽

2018 ◽

Vol 21 (2) ◽

pp. 100-110 ◽

Cited By ~ 3

Author(s):

Chun Li ◽

Jialing Zhao ◽

Changzhong Wang ◽

Yuhua Yao

Keyword(s):

Dna Binding ◽

Protein Sequence ◽

Protein Identification ◽

Binding Proteins ◽

Graphical Representation ◽

Sequence Data ◽

Protein Sequences ◽

Dna Binding Proteins ◽

Support Vector ◽

Letter Sequence

Aim and Objective: The rapid increase in the amount of protein sequence data available leads to an urgent need for novel computational algorithms to analyze and compare these sequences. This study was undertaken to develop an efficient computational approach for encoding protein sequences in a timely manner and extracting their hidden information. Methods: Based on two physicochemical properties of amino acids, a protein primary sequence was converted into a three-letter sequence, and then a graph without loops and multiple edges and its geometric line adjacency matrix were obtained. A generalized PseAAC (pseudo amino acid composition) model was thus constructed to characterize a protein sequence numerically. Results: By using the proposed mathematical descriptor of a protein sequence, similarity comparisons among β-globin proteins of 17 species and 72 spike proteins of coronaviruses were made, respectively. The resulting clusters agreed well with the established taxonomic groups. In addition, a generalized PseAAC based SVM (support vector machine) model was developed to identify DNA-binding proteins. Experimental results showed that our method performed better than DNAbinder, DNA-Prot, iDNA-Prot and enDNA-Prot by 3.29-10.44% in terms of ACC, 0.056-0.206 in terms of MCC, and 1.45-15.76% in terms of F1M. When the benchmark dataset was expanded with negative samples, the presented approach outperformed the four previous methods with improvement in the range of 2.49-19.12% in terms of ACC, 0.05-0.32 in terms of MCC, and 3.82-33.85% in terms of F1M. Conclusion: These results suggested that the generalized PseAAC model was very efficient for comparison and analysis of protein sequences, and very competitive in identifying DNA-binding proteins.

The graphical representation of protein sequences based on the physicochemical properties and its applications

Journal of Computational Chemistry ◽

10.1002/jcc.21501 ◽

2010 ◽

Vol 31 (11) ◽

pp. 2136-2142 ◽

Cited By ~ 34

Author(s):

Ping-An He ◽

Yan-Ping Zhang ◽

Yu-Hua Yao ◽

Yi-Fa Tang ◽

Xu-Ying Nan

Keyword(s):

Physicochemical Properties ◽

Graphical Representation ◽

Protein Sequences

Leveraging Deep Learning to Simulate Coronavirus Spike proteins has the potential to predict future Zoonotic sequences

10.1101/2020.04.20.046920 ◽

2020 ◽

Cited By ~ 1

Author(s):

Lisa C Crossman

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Pfam Domain ◽

Protein Sequences ◽

Host Cells ◽

Spike Protein ◽

Upper Respiratory Tract ◽

The Neural Network ◽

Spike Sequences ◽

Spike Proteins

Motivation: Coronaviridae are a family of positive-sense RNA viruses capable of infecting humans and animals. These viruses usually cause a mild to moderate upper respiratory tract infection; however, they can also cause more severe symptoms, including gastrointestinal and central nervous system diseases. These viruses are capable of flexibly adapting to new environments, hence health threats from coronavirus are constant and long-term. Immunogenic spike proteins are glycoproteins found on the surface of Coronaviridae particles that mediate entry to host cells. The aim of this study was to train deep learning neural networks to produce simulated spike protein sequences, which may be able to aid in knowledge and/or vaccine design by creating alternative possible spike sequences that could arise from zoonotic sources in future. Results: Here we have trained deep learning recurrent neural networks (RNN) to provide computer-simulated coronavirus spike protein sequences in the style of previously known sequences and examine their characteristics. Training used a dataset of alpha, beta, gamma and delta coronavirus spike sequences. In a test set of 100 simulated sequences, all 100 had most significant BLAST matches to Spike proteins in searches against NCBI non-redundant dataset (NR) and also possessed concomitant Pfam domain matches. Conclusions: Simulated sequences from the neural network may be able to guide us in future with prospective targets for vaccine discovery in advance of a potential novel zoonosis. We may effectively be able to fast-forward through evolution using neural networks to investigate sequences that could arise.

3D graphical representation of protein sequences and their statistical characterization

Physica A: Statistical Mechanics and its Applications ◽

10.1016/j.physa.2010.06.031 ◽

2010 ◽

Vol 389 (21) ◽

pp. 4668-4676 ◽

Cited By ~ 36

Author(s):

Moheb I. Abo el Maaty ◽

Mervat M. Abo-Elkhier ◽

Marwa A. Abd Elwahaab

Keyword(s):

Graphical Representation ◽

Protein Sequences ◽

Statistical Characterization ◽

3D Graphical Representation

A new model of amino acids evolution, evolution index of amino acids and its application in graphical representation of protein sequences

Chemical Physics Letters ◽

10.1016/j.cplett.2010.08.010 ◽

2010 ◽

Vol 497 (4-6) ◽

pp. 223-228 ◽

Cited By ~ 7

Author(s):

Yi Zhang

Keyword(s):

Amino Acids ◽

Graphical Representation ◽

Protein Sequences ◽

New Model
