Trends in Prokaryotic Evolution Revealed by Comparison of Closely Related Bacterial and Archaeal Genomes

Pavel S. Novichkov; Yuri I. Wolf; Inna Dubchak; Eugene V. Koonin

doi:10.1128/jb.01237-08

Journal of Bacteriology ◽

10.1128/jb.01237-08 ◽

2008 ◽

Vol 191 (1) ◽

pp. 65-73 ◽

Cited By ~ 97

Author(s):

Pavel S. Novichkov ◽

Yuri I. Wolf ◽

Inna Dubchak ◽

Eugene V. Koonin

Keyword(s):

Amino Acid ◽

Genome Rearrangement ◽

Selective Pressure ◽

Protein Sequences ◽

Purifying Selection ◽

Amino Acid Sequences ◽

Strong Positive Correlation ◽

Effective Population ◽

Data Set ◽

Positive Correlation

ABSTRACT In order to explore microevolutionary trends in bacteria and archaea, we constructed a data set of 41 alignable tight genome clusters (ATGCs). We show that the ratio of the medians of nonsynonymous to synonymous substitution rates (dN/dS) that is used as a measure of the purifying selection pressure on protein sequences is a stable characteristic of the ATGCs. In agreement with previous findings, parasitic bacteria, notwithstanding the sometimes dramatic genome shrinkage caused by gene loss, are typically subjected to relatively weak purifying selection, presumably owing to relatively small effective population sizes and frequent bottlenecks. However, no evidence of genome streamlining caused by strong selective pressure was found in any of the ATGCs. On the contrary, a significant positive correlation between the genome size, as well as gene size, and selective pressure was observed, although a variety of free-living prokaryotes with very close selective pressures span nearly the entire range of genome sizes. In addition, we examined the connections between the sequence evolution rate and other genomic features. Although gene order changes much faster than protein sequences during the evolution of prokaryotes, a strong positive correlation was observed between the “rearrangement distance” and the amino acid distance, suggesting that at least some of the events leading to genome rearrangement are subjected to the same type of selective constraints as the evolution of amino acid sequences.
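The "ratio of the medians" statistic described in the abstract can be illustrated with a minimal sketch. The function name and the per-gene values below are hypothetical and are not taken from the paper; the sketch only assumes that each cluster supplies one list of per-gene dN estimates and one of per-gene dS estimates.

```python
from statistics import median


def ratio_of_medians(dn_values, ds_values):
    """Return median(dN) / median(dS) for one genome cluster.

    dn_values, ds_values: per-gene nonsynonymous and synonymous
    substitution rate estimates (hypothetical inputs for illustration).
    """
    med_ds = median(ds_values)
    if med_ds == 0:
        raise ValueError("median dS is zero; the ratio is undefined")
    return median(dn_values) / med_ds


# Made-up per-gene estimates for a single cluster.
dn = [0.01, 0.02, 0.03]
ds = [0.1, 0.2, 0.4]
print(ratio_of_medians(dn, ds))
```

Taking the ratio of two medians, rather than the median of per-gene ratios, keeps genes with near-zero dS from dominating the summary; a smaller value corresponds to stronger purifying selection.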

Marine RNA Virus Quasispecies Are Distributed throughout the Oceans

mSphere ◽

10.1128/mspheredirect.00157-19 ◽

2019 ◽

Vol 4 (2) ◽

Cited By ~ 8

Author(s):

Marli Vlok ◽

Andrew S. Lang ◽

Curtis A. Suttle

Keyword(s):

Amino Acid ◽

Rna Viruses ◽

Rna Virus ◽

Purifying Selection ◽

Amino Acid Sequences ◽

Metagenomic Data ◽

Spatial And Temporal Patterns ◽

Data Set ◽

Virus Diversity ◽

Virus Genomes

ABSTRACT RNA viruses, particularly genetically diverse members of the Picornavirales, are widespread and abundant in the ocean. Gene surveys suggest that there are spatial and temporal patterns in the composition of RNA virus assemblages, but data on their diversity and genetic variability in different oceanographic settings are limited. Here, we show that specific RNA virus genomes have widespread geographic distributions and that the dominant genotypes are under purifying selection. Genomes from three previously unknown picorna-like viruses (BC-1, -2, and -3) assembled from a coastal site in British Columbia, Canada, as well as marine RNA viruses JP-A, JP-B, and Heterosigma akashiwo RNA virus exhibited different biogeographical patterns. Thus, biotic factors such as host specificity and viral life cycle, and not just abiotic processes such as dispersal, affect marine RNA virus distribution. Sequence differences relative to reference genomes imply that virus quasispecies are under purifying selection, with synonymous single-nucleotide variations dominating in genomes from geographically distinct regions resulting in conservation of amino acid sequences. Conversely, sequences from coastal South Africa that mapped to marine RNA virus JP-A exhibited more nonsynonymous mutations, probably representing amino acid changes that accumulated over a longer separation. This biogeographical analysis of marine RNA viruses demonstrates that purifying selection is occurring across oceanographic provinces. These data add to the spectrum of known marine RNA virus genomes, show the importance of dispersal and purifying selection for these viruses, and indicate that closely related RNA viruses are pathogens of eukaryotic microbes across oceans.

IMPORTANCE Very little is known about aquatic RNA virus populations and genome evolution. This is the first study that analyzes marine environmental RNA viral assemblages in an evolutionary and broad geographical context. This study contributes the largest marine RNA virus metagenomic data set to date, substantially increasing the sequencing space for RNA viruses and also providing a baseline for comparisons of marine RNA virus diversity. The new viruses discovered in this study are representative of the most abundant family of marine RNA viruses, the Marnaviridae, and expand our view of the diversity of this important group. Overall, our data and analyses provide a foundation for interpreting marine RNA virus diversity and evolution.

Computational Analysis of Therapeutic Enzyme Uricase from Different Source Organisms

Current Proteomics ◽

10.2174/1570164616666190617165107 ◽

2020 ◽

Vol 17 (1) ◽

pp. 59-77

Author(s):

Anand Kumar Nelapati ◽

JagadeeshBabu PonnanEttiyappan

Keyword(s):

Uric Acid ◽

Amino Acid ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Protein Sequences ◽

Amino Acid Sequences ◽

Amino Acid Residues ◽

Multiple Sequence ◽

Physiochemical Properties ◽

Pharmaceutical Industries

Background: Hyperuricemia and gout are conditions that result from the accumulation of uric acid in the blood and urine. Uric acid is the product of the purine metabolic pathway in humans. Uricase is a therapeutic enzyme that reduces the concentration of uric acid in serum and urine by converting it into the more soluble allantoin. Uricases are widely available from several sources, such as bacteria, fungi, yeast, plants and animals. Objective: The present study aimed to elucidate the structure and physicochemical properties of uricase by in silico analysis. Methods: A total of sixty amino acid sequences of uricase from different sources were obtained from NCBI, and analyses including multiple sequence alignment (MSA), homology search, phylogenetic relation, motif search, domain architecture and physicochemical properties (including pI, EC, Ai and Ii) were performed. Results: Multiple sequence alignment of all the selected protein sequences showed distinct differences between bacterial, fungal, plant and animal sources based on the position-specific occurrence of conserved amino acid residues. The maximum homology of all the selected protein sequences is between 51-388. Within single categories, homology is between 16-337 for bacterial uricase, 14-339 for fungal uricase, 12-317 for plant uricase, and 37-361 for animal uricase. The phylogenetic tree constructed from the amino acid sequences revealed clusters indicating that the uricases come from different sources. The physicochemical features revealed that the uricase sequences are between 300 and 338 amino acid residues long, with a molecular weight of 33-39 kDa and a theoretical pI ranging from 4.95 to 8.88. The amino acid composition results showed that valine has a high average frequency of 8.79% compared with other amino acids in all analyzed species. Conclusion: In the field of bioinformatics, this work might be informative and a stepping-stone for other researchers seeking an overview of the physicochemical features, evolutionary history and structural motifs of uricase, which can be widely used in the biotechnological and pharmaceutical industries. Therefore, the proposed in silico analysis can be considered for protein engineering work, as well as for gout therapy.
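An average residue frequency such as the valine figure reported above could be computed along the following lines. This is a minimal sketch, not the authors' pipeline: the function names are hypothetical and the toy sequences are invented, not real uricase sequences.

```python
from collections import Counter


def aa_composition(sequence):
    """Return the percentage of each residue in one protein sequence."""
    seq = sequence.upper()
    counts = Counter(seq)
    total = len(seq)
    return {aa: 100.0 * n / total for aa, n in counts.items()}


def mean_frequency(sequences, residue):
    """Average the percentage of one residue over several sequences."""
    per_sequence = [aa_composition(s).get(residue, 0.0) for s in sequences]
    return sum(per_sequence) / len(per_sequence)


# Invented toy sequences, for illustration only.
toy = ["MVVA", "MVLA"]
print(mean_frequency(toy, "V"))
```

Averaging per-sequence percentages weights every sequence equally regardless of length; pooling all residues before counting would be an equally defensible choice, and the abstract does not say which was used.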

Techniques for the verification of minimal phylogenetic trees illustrated with ten mammalian haemoglobin sequences

Biochemical Journal ◽

10.1042/bj1870065 ◽

1980 ◽

Vol 187 (1) ◽

pp. 65-74 ◽

Cited By ~ 12

Author(s):

D Penny ◽

M D Hendy ◽

L R Foulds

Keyword(s):

Amino Acid ◽

Phylogenetic Tree ◽

Protein Sequence ◽

Phylogenetic Trees ◽

Sequence Data ◽

Protein Sequences ◽

Nucleotide Sequences ◽

Amino Acid Sequences ◽

Minimal Tree ◽

Protein Sequence Data

We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127–150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151–166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. The conversion of the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods have involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented in the present paper progressively builds up a tree, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally, we define a measure of the complexity of the data and illustrate a method to derive a directed phylogenetic tree from the minimal tree.
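To make the notion of a "shortest" tree concrete, the sketch below scores the length of a fixed tree, that is, the minimum number of substitutions it requires, using Fitch's small-parsimony algorithm. This is a standard textbook technique swapped in for illustration; it is not the tree-building method of the paper, and the tree shapes, taxon names and sequences are hypothetical.

```python
def fitch_site_length(tree, states):
    """Minimum number of changes for one site on a rooted binary tree.

    tree: nested 2-tuples with leaf names (str) at the tips.
    states: dict mapping each leaf name to its character at this site.
    """
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        common = left_set & right_set
        if common:
            # The children agree on at least one state: no extra change.
            return common, left_cost + right_cost
        # The children disagree: one change is needed on this branch pair.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def tree_length(tree, alignment):
    """Sum the Fitch length over all sites of an aligned set of sequences."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site_length(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(n_sites)
    )


# Hypothetical four-taxon alignment and two candidate topologies.
alignment = {"a": "AC", "b": "AC", "c": "GC", "d": "GT"}
tree_1 = (("a", "b"), ("c", "d"))
tree_2 = (("a", "c"), ("b", "d"))
print(tree_length(tree_1, alignment), tree_length(tree_2, alignment))
```

A minimal tree is one for which no other topology has a smaller length; comparing candidate topologies with a scoring function of this kind is the basic operation behind any such search.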

Unevolved De Novo Proteins Have Innate Tendencies to Bind Transition Metals

Life ◽

10.3390/life9010008 ◽

2019 ◽

Vol 9 (1) ◽

pp. 8 ◽

Cited By ~ 4

Author(s):

Michael S. Wang ◽

Kenric J. Hoegler ◽

Michael H. Hecht

Keyword(s):

Amino Acid ◽

Transition Metals ◽

Metal Binding ◽

Combinatorial Library ◽

De Novo ◽

Protein Sequences ◽

Amino Acid Sequences ◽

Ancestral Sequences ◽

Wide Range ◽

Catalytic Functions

Life as we know it would not exist without the ability of protein sequences to bind metal ions. Transition metals, in particular, play essential roles in a wide range of structural and catalytic functions. The ubiquitous occurrence of metalloproteins in all organisms leads one to ask whether metal binding is an evolved trait that occurred only rarely in ancestral sequences, or alternatively, whether it is an innate property of amino acid sequences, occurring frequently in unevolved sequence space. To address this question, we studied 52 proteins from a combinatorial library of novel sequences designed to fold into 4-helix bundles. Although these sequences were neither designed nor evolved to bind metals, the majority of them have innate tendencies to bind the transition metals copper, cobalt, and zinc with high nanomolar to low-micromolar affinity.

BIOPEP-UWM Database of Bioactive Peptides: Current Opportunities

International Journal of Molecular Sciences ◽

10.3390/ijms20235978 ◽

2019 ◽

Vol 20 (23) ◽

pp. 5978 ◽

Cited By ~ 49

Author(s):

Minkiewicz ◽

Iwaniak ◽

Darewicz

Keyword(s):

Amino Acids ◽

Amino Acid ◽

Chronic Diseases ◽

Bioactive Peptides ◽

Protein Sequences ◽

Batch Processing ◽

Amino Acid Sequences ◽

Quantitative Parameters ◽

New Information

The BIOPEP-UWM™ database of bioactive peptides (formerly BIOPEP) has recently become a popular tool in research on bioactive peptides, especially those derived from foods and forming part of diets that prevent the development of chronic diseases. The database is continuously updated and modified. The addition of new peptides and the introduction of new information about the existing ones (e.g., chemical codes and references to other databases) are in progress. New opportunities include the possibility of annotating peptides containing D-enantiomers of amino acids, a batch processing option, conversion of amino acid sequences into SMILES code, new quantitative parameters characterizing the presence of bioactive fragments in protein sequences, and finding proteinases that release particular peptides.

In silico analysis of virulence associated genes in genomes of Escherichia coli strains causing colibacillosis in poultry

Journal of Veterinary Research ◽

10.1515/jvetres-2017-0051 ◽

2017 ◽

Vol 61 (4) ◽

pp. 421-426 ◽

Cited By ~ 2

Author(s):

Joanna Kołsut ◽

Paulina Borówka ◽

Błażej Marciniak ◽

Ewelina Wójcik ◽

Arkadiusz Wojtasik ◽

...

Keyword(s):

Escherichia Coli ◽

Amino Acid ◽

Virulence Factors ◽

De Novo ◽

Protein Sequences ◽

In Silico Analysis ◽

Amino Acid Sequences ◽

Common Disease ◽

Bacterial Genomes ◽

E Coli

Introduction: Colibacillosis, the most common disease of poultry, is caused mainly by avian pathogenic Escherichia coli (APEC). However, thus far, no pattern to the molecular basis of the pathogenicity of these bacteria has been established beyond dispute. In this study, genomes of APEC were investigated to ascribe importance and explore the distribution of 16 genes recognised as their virulence factors. Material and Methods: A total of 14 E. coli strains pathogenic for poultry were isolated, and their DNA was sequenced, assembled de novo, and annotated. Amino acid sequences from these bacteria and an additional 16 freely available APEC amino acid sequences were analysed with the DIFFIND tool to define their virulence factors. Results: The DIFFIND tool enabled quick, reliable, and convenient assessment of the differences between compared amino acid sequences from bacterial genomes. The presence of 16 protein sequences indicated as pathogenicity factors in poultry resulted in the generation of a heatmap which categorises genomes in terms of the existence and similarity of the analysed protein sequences. Conclusion: The proposed method of detection of virulence factors using the capabilities of the DIFFIND tool may be useful in the analysis of similarities of E. coli and other sequences deriving from bacteria. Phylogenetic analysis resulted in reliable segregation of 30 APEC strains into five main clusters containing various virulence associated genes (VAGs).

Genetic Diversity in the 3′ Terminal 4.7-kb Region of Grapevine leafroll-associated virus 3

Phytopathology ◽

10.1094/phyto-07-10-0173 ◽

2011 ◽

Vol 101 (4) ◽

pp. 445-450 ◽

Cited By ~ 27

Author(s):

Jinbo Wang ◽

Abhineet M. Sharma ◽

Siobain Duffy ◽

Rodrigo P. P. Almeida

Keyword(s):

Genetic Diversity ◽

Purifying Selection ◽

Synonymous Substitution ◽

Effective Population ◽

Data Set ◽

Reading Frame ◽

Napa Valley ◽

Synonymous Substitution Rates ◽

Leafroll Disease

Grapevine leafroll-associated virus 3 (GLRaV-3; Ampelovirus, Closteroviridae), associated with grapevine leafroll disease, is an important pathogen found across all major grape-growing regions of the world. The genetic diversity of GLRaV-3 in Napa Valley, CA, was studied by sequencing 4.7 kb in the 3′ terminal region of 50 isolates obtained from Vitis vinifera ‘Merlot’. GLRaV-3 isolates were subdivided into four distinct phylogenetic clades. No evidence of positive selection was observed in the data set, although neutral selection (ratio of nonsynonymous to synonymous substitution rates = 1.1) was observed in one open reading frame (ORF 11, p4). Additionally, the four clades had variable degrees of overall nucleotide diversity. Moreover, no geographical structure among isolates was observed, and isolates belonging to different phylogenetic clades were found in distinct vineyards, with one exception. Considered with the evidence of purifying selection (i.e., against deleterious mutations), these data indicate that the population of GLRaV-3 in Napa Valley is not expanding and its effective population size is not increasing. Furthermore, research on the biological characterization of GLRaV-3 strains might provide valuable insights on the biology of this species that may have epidemiological relevance.

Phylogenetic Tree Analysis of Non-Structural Protein 1 (NS1) of Dengue Virus in the South East Asia Region

JURNAL Al-AZHAR INDONESIA SERI SAINS DAN TEKNOLOGI ◽

10.36722/sst.v1i2.28 ◽

2011 ◽

Vol 1 (2) ◽

pp. 69

Author(s):

Vanny Narita ◽

Asma Omar ◽

Agus Masduki

Keyword(s):

Amino Acids ◽

Amino Acid ◽

Dengue Virus ◽

Protein Sequences ◽

Amino Acid Sequences ◽

South East Asia ◽

Phylogenetic Relation ◽

Phylogenetic Tree Analysis ◽

Tree Analysis ◽

Different Strains

Non-structural protein 1 (NS1) is a conserved protein of dengue virus, but NS1 proteins of different dengue virus strains have different epitopes that can be recognized by B cells. These epitopes may be constructed of similar amino acids in a different arrangement. This possibility must be considered in order to predict the sequential epitopes of dengue virus. The objective of our study was to analyze the phylogenetic relation and the amino acid arrangement of confirmed specific epitopes in representative NS1 gene samples of dengue virus from South East Asia. The phylogenetic relation of non-structural 1 protein sequences from South East Asia was analyzed with Lasergene® software. The gene sequences were translated to amino acid sequences, and phylogenetic tree analysis was performed. The results showed that the relatedness values among full sequences of non-structural 1 protein were 72-98%. Furthermore, the serospecific epitopes were compared according to the Lasergene results. The serospecific epitope comparison showed that the dominant amino acids in these epitopes were histidine, tyrosine, glutamine and serine.

Proteomic analysis reveals developmentally expressed rice homologues of grass group II pollen allergens

Functional Plant Biology ◽

10.1071/fp03100 ◽

2003 ◽

Vol 30 (8) ◽

pp. 843 ◽

Cited By ~ 7

Author(s):

Tursun Kerim ◽

Nijat Imin ◽

Jeremy J. Weinman ◽

Barry G. Rolfe

Keyword(s):

Amino Acid ◽

Polyclonal Antibodies ◽

Protein Sequences ◽

Cross Reactivity ◽

Amino Acid Sequences ◽

Pollen Allergens ◽

Specific Expression ◽

Group I ◽

Group Ii ◽

Sequence Profiles

Three isoallergens of Ory s 2, homologues of grass group II pollen allergens, were identified from rice and characterised by proteome and immunochemical analyses. The N-terminal amino acid sequence profiles of three proteins on a 2-dimensional electrophoresis (2-DE) gel of rice pollen proteins matched 100% to the protein sequences encoded by three rice expressed sequence tags (ESTs). The deduced protein sequences from these ESTs share sequence identities of 41–43% with the protein sequences of the group II pollen allergens of different grasses, and sequence identity of 39% with the C-terminal portion of rice group I pollen allergens. Signal peptide sequences, which are similar to the leader peptides of other major pollen allergens, are also present in the deduced amino acid sequences. Polyclonal antibodies, produced in rabbits using Ory s 2 proteins purified by 2-DE, were used to investigate the developmental-stage- and tissue-specific expression of Ory s 2 by immunochemical analysis. Results of immunochemical experiments show that Ory s 2 proteins are expressed only at the late stage of pollen development and they do not have cross-reactivity with group II pollen allergens from some other common grasses.

The occurrence in amino acid sequences of extensive informational symmetries based on possible codon-codon complementarity in the encoding polynucleotides

Biochemical Journal ◽

10.1042/bj1530681 ◽

1976 ◽

Vol 153 (3) ◽

pp. 681-690 ◽

Cited By ~ 2

Author(s):

G M Polya ◽

D R Phillips

Keyword(s):

Amino Acid ◽

Secondary Structure ◽

Amino Acid Sequence ◽

Basic Protein ◽

Protein A ◽

Protein Sequences ◽

Amino Acid Sequences ◽

Rat Skin ◽

Skin Collagen ◽

Low Probability

1. A procedure is described for the detection and assessment of informational complementarity in an amino acid sequence; it is based on possible autocomplementarity in the mRNA, and involves codon-to-codon matching. 2. This procedure was applied to myelin basic protein, a variety of protamines, histone IV, silk fibroin, rat skin collagen α1 chain and a sheep keratin. A multiplicity of extensive low-probability informational symmetries, based on codon-to-codon matching, were detected. 3. These low-probability orderings, which are independent of the actual mRNA codons, are rationalized in terms of the evolutionary ordering of the amino acid sequences concerned, in such a way that constraints on the secondary structure of the coding polynucleotides were satisfied. This possible interpretation is supported by a number of significant common properties of the protein sequences analysed.
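One way to picture codon-to-codon matching is to ask whether two amino acids could be encoded by complementary codons. The sketch below is an assumption-laden illustration rather than the published procedure: it assumes the standard genetic code, antiparallel (reverse-complement) pairing, and DNA-style letters (T in place of U); all function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(codon):
    """Return the antiparallel complement of a codon."""
    return codon.translate(COMPLEMENT)[::-1]


def complementary_partners(amino_acid):
    """Amino acids (or stop) encoded by the reverse complement of any
    codon for the given amino acid."""
    return {
        CODON_TABLE[reverse_complement(codon)]
        for codon, aa in CODON_TABLE.items()
        if aa == amino_acid
    }


def can_pair(aa1, aa2):
    """True if some codon for aa1 is complementary to some codon for aa2."""
    return aa2 in complementary_partners(aa1)


print(can_pair("K", "F"), can_pair("M", "W"))
```

Because the test asks only whether any pair of possible codons is complementary, the result does not depend on which codons the mRNA actually uses, in line with the abstract's remark that the detected orderings are independent of the actual codons.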
