Protein Multiple Alignments: Sequence-based vs Structure-based Programs

Mapping Intimacies ◽

10.1101/413369 ◽

2018 ◽

Author(s):

Mathilde Carpentier ◽

Jacques Chomilier

Keyword(s):

Multiple Sequence Alignment ◽

Added Value ◽

Sequence Structure ◽

Multiple Sequence ◽

Sequence Identity ◽

Multiple Alignments ◽

Reference Databases ◽

Low Levels ◽

Existing Data ◽

Proteins Classification

ABSTRACT Facing the huge increase of information about proteins, classification has become a compulsory task, essential for assigning a function to a given sequence by comparison with existing data. Multiple sequence alignment programs have proven very useful and have already been evaluated. In this paper we evaluate the added value provided by taking structures into account. We compared the multiple alignments produced by 24 programs, based on sequence, structure, or both, to reference alignments deposited in five databases. The reference databases fall into two groups: those built more automatically and those built more manually. A score was attributed to each program. As a global rule of thumb, five groups of methods emerge, with two of the structure-based programs in the lead. This advantage increases at low levels of sequence identity among aligned proteins, or for residues that are buried or in regular secondary structures. Concerning gap management, sequence-based programs place fewer gaps than structure-based programs. Concerning the databases, the alignments from the manually built databases are the most challenging for the programs.

Protein multiple alignments: sequence-based versus structure-based programs

Bioinformatics ◽

10.1093/bioinformatics/btz236 ◽

2019 ◽

Vol 35 (20) ◽

pp. 3970-3980 ◽

Cited By ~ 6

Author(s):

Mathilde Carpentier ◽

Jacques Chomilier

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Added Value ◽

Supplementary Information ◽

Supplementary Data ◽

Sequence Structure ◽

Multiple Sequence ◽

Sequence Identity ◽

Multiple Alignments ◽

Low Levels

Abstract. Motivation: Multiple sequence alignment programs have proved to be very useful and have already been evaluated in the literature, but alignment programs based on structure, or on both sequence and structure, have not. In the present article we evaluate the added value provided by considering structures. Results: We compared the multiple alignments resulting from 25 programs, based on sequence, structure or both, to reference alignments deposited in five databases (BALIBASE 2 and 3, HOMSTRAD, OXBENCH and SISYPHUS). On the whole, the structure-based methods compute more reliable alignments than the sequence-based ones, and even than the sequence+structure-based programs, whatever the database. Two programs lead, MAMMOTH and MATRAS; nevertheless, the performances of MUSTANG, MATT, 3DCOMB, TCOFFEE+TM_ALIGN and TCOFFEE+SAP are better for some alignments. The advantage of structure-based methods increases at low levels of sequence identity, or for residues that are buried or in regular secondary structures. Concerning gap management, sequence-based programs set fewer gaps than structure-based programs. Concerning the databases, the alignments of the manually built databases are more challenging for the programs. Availability and implementation: All data and results presented in this study are available at: http://wwwabi.snv.jussieu.fr/people/mathilde/download/AliMulComp/. Supplementary information: Supplementary data are available at Bioinformatics online.
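To make "comparing a program's alignment to a reference alignment" concrete, here is a minimal Python sketch of one common measure, the sum-of-pairs score: the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. The abstract does not say which score the authors used, so the measure, the function names and the toy sequences are assumptions for illustration only.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    `alignment` is a list of equal-length gapped strings ('-' is a gap).
    A pair is ((seq_i, residue_index_i), (seq_j, residue_index_j)), i < j.
    """
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all rows must have the same length")
    counters = [0] * len(alignment)  # next residue index in each sequence
    pairs = set()
    for col in range(n_cols):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != "-":
                present.append((s, counters[s]))
                counters[s] += 1
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs recovered by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


# Hypothetical toy alignments of the same two sequences (ACGT and ACTGT).
reference = ["AC-GT", "ACTGT"]
test = ["ACG-T", "ACTGT"]
print(sum_of_pairs_score(test, reference))  # 0.75
```

In the toy case the test alignment recovers three of the four residue pairs present in the reference, hence 0.75.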

MULTIPLE SEQUENCE ALIGNMENT USING AN EXHAUSTIVE AND GREEDY ALGORITHM

Journal of Bioinformatics and Computational Biology ◽

10.1142/s021972000500103x ◽

2005 ◽

Vol 03 (02) ◽

pp. 243-255 ◽

Cited By ~ 1

Author(s):

YI WANG ◽

KUO-BIN LI

Keyword(s):

Greedy Algorithm ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Multiple Alignment ◽

Initial Alignment ◽

Progressive Alignment ◽

Multiple Sequence ◽

Java Programming ◽

Multiple Alignments ◽

Objective Score

We describe an exhaustive and greedy algorithm for improving the accuracy of multiple sequence alignment. A simple progressive alignment approach is employed to provide initial alignments. The initial alignment is then iteratively optimized against an objective function. For any working alignment, the optimization involves three operations: insertions, deletions and shuffles of gaps. The optimization is exhaustive since the algorithm applies the above operations to all eligible positions of an alignment. It is also greedy since only the operation that gives the largest improvement in the objective score is accepted. The algorithms have been implemented in the EGMA (Exhaustive and Greedy Multiple Alignment) package in the Java programming language, and have been evaluated using the BAliBASE benchmark alignment database. Although EGMA is not guaranteed to produce a globally optimal alignment, the tests indicate that EGMA consistently builds high-quality alignments compared with other commonly used iterative and non-iterative alignment programs. It is also useful for refining multiple alignments obtained by other methods.
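The exhaustive-and-greedy loop described above can be sketched in a few lines of Python. This is not the EGMA implementation (which the abstract says is written in Java): it covers only the gap-shuffle operation, and the objective function (a sum-of-pairs score with match +1, mismatch -1, residue-against-gap -2) is an assumption chosen for illustration. All names are hypothetical.

```python
from itertools import combinations


def column_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of aligned characters (assumed scoring scheme)."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def objective(rows):
    """Sum-of-pairs objective over every pair of rows and every column."""
    return sum(
        column_score(a, b)
        for r1, r2 in combinations(rows, 2)
        for a, b in zip(r1, r2)
    )


def gap_shuffles(rows):
    """Yield every alignment made by swapping one gap with an adjacent residue."""
    for i, row in enumerate(rows):
        for j in range(len(row) - 1):
            if (row[j] == "-") != (row[j + 1] == "-"):
                swapped = row[:j] + row[j + 1] + row[j] + row[j + 2:]
                yield rows[:i] + [swapped] + rows[i + 1:]


def greedy_refine(rows, max_iter=1000):
    """Exhaustive: try every shuffle. Greedy: keep only the best improving one."""
    rows = list(rows)
    best = objective(rows)
    for _ in range(max_iter):
        candidates = [(objective(c), c) for c in gap_shuffles(rows)]
        if not candidates:
            break
        top_score, top_rows = max(candidates, key=lambda t: t[0])
        if top_score <= best:  # no strict improvement: local optimum reached
            break
        rows, best = top_rows, top_score
    return rows, best


refined, score = greedy_refine(["AC-GT", "ACG-T"])
print(refined, score)
```

Because only strictly improving moves are accepted, the loop stops at a local optimum, which matches the abstract's caveat that a globally optimal alignment is not guaranteed.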

A Simple Genetic Algorithm for Optimizing Multiple Sequence Alignment on the Spread of the SARS Epidemic

The Open Bioinformatics Journal ◽

10.2174/1875036201912010030 ◽

2019 ◽

Vol 12 (1) ◽

pp. 30-39

Author(s):

Siti Amiroch ◽

M. Syaiful Pradana ◽

M. Isa Irawan ◽

Imam Mukhlash

Keyword(s):

Genetic Algorithms ◽

Phylogenetic Tree ◽

Sequence Alignment ◽

Dna Sequences ◽

Multiple Sequence Alignment ◽

Network System ◽

Multiple Sequence ◽

Network Analyses ◽

Multiple Alignments ◽

Mutation Region

Background: Multiple sequence alignment is a method for finding genomic relationships among three or more sequences. In multiple alignments, there are three mutation network analyses, namely the topological network system, the mutation region network and the network system of mutation mode. In general, the three analyses show stable and unstable regions that map mutation regions. These mutation regions are described further in a phylogenetic tree, which simultaneously illustrates the path of the spread of an epidemic, here the Severe Acute Respiratory Syndrome (SARS) epidemic. The spread of the SARS viruses is, in this case, described as the process of phylogenetic tree formation, and as a novelty of this research, the multiple alignments in the process are analyzed in detail and then optimized with genetic algorithms. Methods: The data used to form the phylogenetic tree for the spread of the SARS epidemic are 14 DNA sequences, which are then optimized by using genetic algorithms. The phylogenetic tree is constructed by using the neighbor-joining algorithm with a distance matrix whose entries are genetic distances obtained from sequence alignment with the Needleman-Wunsch algorithm. Results & Conclusion: The analysis identified 3649 stable areas and 19 unstable areas. The phylogenetic tree from the network system analysis indicated that the spread of the SARS epidemic extended from Guangzhou 16/12/02 to Zhongshan 27/12/02, then spread simultaneously to Guangzhou 18/02/03 and Guangzhou hospital. After that, the virus reached Metropole, Zhongshan, Hong Kong, Singapore, Taiwan, Hong Kong, and Hanoi, and then continued to Guangzhou 01/01/03 and Toronto at once. The results of the mutation region network system demonstrate decomposition of orthogonal mutations in the 1st order arc.
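As an illustration of the distance-matrix step in the Methods, the following Python sketch aligns each pair of sequences with the Needleman-Wunsch algorithm and takes the fraction of differing alignment columns as the distance. The scoring parameters (match +1, mismatch -1, gap -1) and the use of this p-distance are assumptions, since the abstract does not specify them, and the toy sequences are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment by dynamic programming; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def p_distance(a, b):
    """Fraction of alignment columns that differ (gaps count as differences)."""
    al_a, al_b, _ = needleman_wunsch(a, b)
    if not al_a:
        return 0.0
    return sum(x != y for x, y in zip(al_a, al_b)) / len(al_a)


def distance_matrix(seqs):
    """Pairwise distances, suitable as input to a tree-building method."""
    return [[p_distance(s, t) for t in seqs] for s in seqs]


print(needleman_wunsch("ACGT", "AGT"))  # ('ACGT', 'A-GT', 2)
print(distance_matrix(["ACGT", "AGT"]))
```

A matrix of this kind is what a neighbor-joining implementation would take as input; the tree construction itself is not shown here.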

Multiple Sequence Alignment and Profile Analysis of Protein Family Using Hidden Markov Model

International Journal of Scientific Research ◽

10.15373/22778179/june2013/66 ◽

2012 ◽

Vol 2 (6) ◽

pp. 208-211

Author(s):

Navjot Kaur ◽

Rajbir Singh Cheema ◽

Harmandeep Singh

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Profile Analysis ◽

Hidden Markov ◽

Protein Family ◽

Multiple Sequence

Faculty Opinions recommendation of MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.731078852.793536612 ◽

2017 ◽

Author(s):

Feng Gao

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Online Service ◽

Multiple Sequence

Computational Analysis of Therapeutic Enzyme Uricase from Different Source Organisms

Current Proteomics ◽

10.2174/1570164616666190617165107 ◽

2020 ◽

Vol 17 (1) ◽

pp. 59-77

Author(s):

Anand Kumar Nelapati ◽

JagadeeshBabu PonnanEttiyappan

Keyword(s):

Uric Acid ◽

Amino Acid ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Protein Sequences ◽

Amino Acid Sequences ◽

Amino Acid Residues ◽

Multiple Sequence ◽

Physiochemical Properties ◽

Pharmaceutical Industries

Background: Hyperuricemia and gout are conditions that result from the accumulation of uric acid in the blood and urine. Uric acid is the product of the purine metabolic pathway in humans. Uricase is a therapeutic enzyme that lowers the concentration of uric acid in serum and urine by converting it into the more soluble allantoin. Uricases are widely available from several sources, such as bacteria, fungi, yeast, plants and animals. Objective: The present study aims to elucidate the structure and physiochemical properties of uricase by in silico analysis. Methods: A total of sixty amino acid sequences of uricase from different sources were obtained from NCBI, and different analyses were performed, such as Multiple Sequence Alignment (MSA), homology search, phylogenetic relation, motif search, domain architecture and physiochemical properties including pI, EC, Ai and Ii. Results: Multiple sequence alignment of all the selected protein sequences exhibited distinct differences between bacterial, fungal, plant and animal sources based on the position-specific existence of conserved amino acid residues. The maximum homology of all the selected protein sequences is between 51-388. Within single categories, homology is between 16-337 for bacterial uricase, 14-339 for fungal uricase, 12-317 for plant uricase, and 37-361 for animal uricase. The phylogenetic tree constructed from the amino acid sequences disclosed clusters indicating that uricase is from different sources. The physiochemical features revealed that the uricase sequences are between 300 and 338 amino acid residues long, with a molecular weight of 33-39 kDa and a theoretical pI ranging from 4.95 to 8.88. The amino acid composition results showed that valine has a high average frequency of 8.79 percent compared with other amino acids in all analyzed species. Conclusion: In the field of bioinformatics, this work might be informative and a stepping-stone for other researchers to get an idea about the physicochemical features, evolutionary history and structural motifs of uricase, which can be widely used in the biotechnological and pharmaceutical industries. Therefore, the proposed in silico analysis can be considered for protein engineering work, as well as for gout therapy.
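The "average frequency" figure reported for valine can be read as a per-sequence percentage composition averaged over all sequences. A minimal Python sketch of that calculation follows; the abstract does not describe the tool or the exact averaging used, so this interpretation, the function names and the toy sequences are assumptions.

```python
from collections import Counter


def composition_percent(seq):
    """Percentage of each residue type in one sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def average_composition(seqs):
    """Mean percentage per residue type across sequences (absent residues count as 0)."""
    totals = Counter()
    for seq in seqs:
        for aa, pct in composition_percent(seq).items():
            totals[aa] += pct
    return {aa: total / len(seqs) for aa, total in totals.items()}


# Hypothetical toy sequences, not real uricase data.
avg = average_composition(["AAVV", "VVVV"])
print(avg["V"], avg["A"])  # 75.0 25.0
```

Averaging per-sequence percentages gives every sequence equal weight regardless of length, which is one reasonable reading of "average frequency" but not the only one.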

LegumeDB: Development of Legume Medicinal Plant Database and Comparative Molecular Evolutionary Analysis of matK Proteins of Legumes and Mangroves

Current Nutrition & Food Science ◽

10.2174/1573401314666180223143523 ◽

2019 ◽

Vol 15 (4) ◽

pp. 353-362

Author(s):

Sambhaji B. Thakar ◽

Maruti J. Dhanavade ◽

Kailas D. Sonawane

Keyword(s):

Phylogenetic Analysis ◽

Medicinal Plants ◽

Homology Modeling ◽

Sequence Alignment ◽

Vigna Unguiculata ◽

Multiple Sequence Alignment ◽

Legume Species ◽

Mangrove Species ◽

Multiple Sequence ◽

Thespesia Populnea

Background: Legume plants are known for their rich medicinal and nutritional values. A large amount of medicinal information on various legume plants is dispersed in the form of text. Objective: It is essential to design and construct a legume medicinal plants database that integrates the respective classes of legumes and includes knowledge regarding medicinal applications along with their protein/enzyme sequences. Methods: The Legume Medicinal Plants Database (LegumeDB) was designed and developed using Microsoft Structured Query Language Server 2017. The DBMS was used as the back end, ASP.Net was used to lay out front-end operations, and VB.Net was used as the programming language. Multiple sequence alignment, phylogenetic analysis and homology modeling techniques were also used. Results: This database includes information on 50 legume medicinal species, which might help researchers explore the information. Further, maturase K (matK) protein sequences of legumes and mangroves were retrieved from NCBI for multiple sequence alignment and phylogenetic analysis to understand the evolutionary lineage between legumes and mangroves. Homology modeling was used to determine the three-dimensional structure of matK from a legume species, Vigna unguiculata, using matK of a mangrove species, Thespesia populnea, as a template. The matK sequence analysis results indicate the conserved residues among legume and mangrove species. Conclusion: Phylogenetic analysis revealed closeness between the legume species Vigna unguiculata and the mangrove species Thespesia populnea, indicating their similarity and origin from a common ancestor. Thus, these studies might be helpful to understand the evolutionary relationship between legumes and mangroves. LegumeDB availability: http://legumedatabase.co.in

A Hierarchical Classification for the Selection of the Most Suitable Multiple Sequence Alignment Methodology

Current Bioinformatics ◽

10.2174/157489361002150518145112 ◽

2015 ◽

Vol 10 (2) ◽

pp. 199-207

Author(s):

Francisco Ortuño ◽

Hector Pomares ◽

Olga Valenzuela ◽

Carolina Torres ◽

Ignacio Rojas

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Hierarchical Classification ◽

Multiple Sequence ◽

Selection Of

Reference Alignment Based Methods for Quality Evaluation of Multiple Sequence Alignment - A Survey

Current Bioinformatics ◽

10.2174/15748936113089990005 ◽

2013 ◽

Vol 8 (999) ◽

pp. 1-6

Author(s):

Pawel Wojciechowski ◽

Piotr Formanowicz ◽

Jacek Blazewicz

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Quality Evaluation ◽

Reference Alignment ◽

Multiple Sequence

Amino Acids Sequence-based Analysis of Arginine Deiminase from Different Prokaryotic Organisms: An In Silico Approach

Recent Patents on Biotechnology ◽

10.2174/1872208314666200324114441 ◽

2020 ◽

Vol 14 (3) ◽

pp. 235-246

Author(s):

Sara Abdollahi ◽

Mohammad H. Morowvat ◽

Amir Savardashtaki ◽

Cambyz Irajie ◽

Sohrab Najafipour ◽

...

Keyword(s):

Amino Acid ◽

Physicochemical Properties ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

In Silico ◽

Bacterial Species ◽

Arginine Deiminase ◽

Antitumor Effects ◽

Multiple Sequence ◽

In Silico Studies

Background: Arginine deiminase is a bacterial enzyme that degrades L-arginine. Some human cancers, such as hepatocellular carcinoma (HCC) and melanoma, are auxotrophic for arginine. Therefore, PEGylated arginine deiminase (ADI-PEG20) is a good anticancer candidate with antitumor effects. It causes local depletion of L-arginine and growth inhibition in arginine-auxotrophic tumor cells. The FDA and EMA have granted orphan status to this drug. Some recently published patents have dealt with this enzyme or its PEGylated form. Objective: Given the increasing attention to this enzyme, we aimed to evaluate and compare 30 arginine deiminase proteins from different bacterial species through in silico analysis. Methods: The analyses included the investigation of physicochemical properties, multiple sequence alignment (MSA), and motif, superfamily, phylogenetic and 3D comparative analyses of arginine deiminase proteins through various bioinformatics tools. Results: The most abundant amino acid in the arginine deiminase proteins is leucine (10.13%), while the least abundant is cysteine (0.98%). Multiple sequence alignment showed 47 conserved patterns among the 30 arginine deiminase amino acid sequences. The results of sequence homology among 30 different groups of arginine deiminase enzymes revealed that all the studied sequences are located in the amidinotransferase superfamily. Based on the phylogenetic analysis, two major clusters were identified. Considering the results of the various in silico studies, we selected the five best candidates for further investigation. The 3D structures of the best five arginine deiminase proteins were generated by the I-TASSER server and PyMOL. The RAMPAGE analysis revealed that 81.4%-91.4% of the selected sequences were located in the favored region of arginine deiminase proteins. Conclusion: The results of this study shed light on the basic physicochemical properties of thirty major arginine deiminase sequences. The obtained data could be employed for further in vivo and clinical studies and also for developing the related therapeutic enzymes.
