ESTIMATION OF TIME OF DIVERGENCE FROM PHYLOGENETIC STUDIES

Ranajit Chakraborty

doi:10.1139/g77-024

Canadian Journal of Genetics and Cytology ◽

10.1139/g77-024 ◽

1977 ◽

Vol 19 (2) ◽

pp. 217-223 ◽

Cited By ~ 35

Author(s):

Ranajit Chakraborty

Keyword(s):

Amino Acid ◽

Sequence Data ◽

Amino Acid Sequences ◽

Evolutionary Significance ◽

Simultaneous Estimation ◽

Homologous Proteins ◽

Phylogenetic Structure ◽

Phylogenetic Studies ◽

Base Sequences ◽

Time Of Divergence

Recent studies with comparative data on base sequences of homologous DNAs or amino acid sequences of homologous proteins indicate that simultaneous estimation of phylogenetic structure and time of divergence is often cumbersome and time-consuming. On the other hand, when the topology of an evolutionary tree is known, this paper shows that least squares theory can be applied to obtain simple estimates of the relative time lengths for each segment of the tree, under the assumption of uniform random substitutions in each segment. The method is illustrated with amino acid sequence data on various globin molecules and cytochrome c. The evolutionary significance of some of the estimates is also discussed.
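
As a rough illustration of the least squares idea, the sketch below fits branch lengths to pairwise distances on a known unrooted four-taxon tree by ordinary least squares (OLS). It assumes each observed distance is the sum of the branch lengths on the path between two taxa and applies no weighting or distance correction. The tree, the `PATHS` encoding, the function names, and the distances are all hypothetical, and the paper's own estimator may differ in these details.

```python
# Sketch: ordinary least-squares branch lengths on a fixed tree topology.
# Assumption (hypothetical setup): each pairwise distance equals the sum of
# the branch lengths on the path joining the two taxa; no weighting is used.


def solve_linear(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_branch_lengths(paths, dist, n_branches):
    """paths[pair]: set of branch indices on the path joining a pair of taxa.
    dist[pair]: observed distance for that pair.
    Returns the OLS estimate of every branch length."""
    pairs = sorted(paths)
    # Design matrix: one row per taxon pair, 1 where a branch lies on its path.
    x = [[1.0 if k in paths[p] else 0.0 for k in range(n_branches)] for p in pairs]
    y = [dist[p] for p in pairs]
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(n_branches)]
           for i in range(n_branches)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(n_branches)]
    return solve_linear(xtx, xty)


# Hypothetical unrooted tree ((1,2),(3,4)): terminal branches 0-3, internal branch 4.
PATHS = {
    (1, 2): {0, 1}, (1, 3): {0, 2, 4}, (1, 4): {0, 3, 4},
    (2, 3): {1, 2, 4}, (2, 4): {1, 3, 4}, (3, 4): {2, 3},
}

if __name__ == "__main__":
    # Hypothetical distances generated from branch lengths 1, 2, 3, 4 and 0.5.
    d = {(1, 2): 3.0, (1, 3): 4.5, (1, 4): 5.5, (2, 3): 5.5, (2, 4): 6.5, (3, 4): 7.0}
    print([round(v, 6) for v in ols_branch_lengths(PATHS, d, 5)])
```

With these additive distances the fit recovers the generating branch lengths. Because a four-taxon tree has six distances but only five branches, the system is overdetermined: perturbing d13 and d24 by +0.2 and d14 and d23 by -0.2 is a pattern no additive tree can produce, so OLS assigns it entirely to the residuals and returns the same five estimates.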

Techniques for the verification of minimal phylogenetic trees illustrated with ten mammalian haemoglobin sequences

Biochemical Journal ◽

10.1042/bj1870065 ◽

1980 ◽

Vol 187 (1) ◽

pp. 65-74 ◽

Cited By ~ 12

Author(s):

D Penny ◽

M D Hendy ◽

L R Foulds

Keyword(s):

Amino Acid ◽

Phylogenetic Tree ◽

Protein Sequence ◽

Phylogenetic Trees ◽

Sequence Data ◽

Protein Sequences ◽

Nucleotide Sequences ◽

Amino Acid Sequences ◽

Minimal Tree ◽

Protein Sequence Data

We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127–150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151–166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. The conversion of the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented here progressively builds up a tree, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally, we define a measure of the complexity of the data and illustrate a method to derive a directed phylogenetic tree from the minimal tree.
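
A minimal tree is the tree requiring the fewest substitutions. The sketch below scores a fixed tree with Fitch's small-parsimony algorithm, applied site by site to aligned nucleotide sequences. It is a standard way to compute tree length, not necessarily the authors' procedure, and it does not reproduce their taxon-by-taxon construction. Trees are written as nested pairs of taxon names, and the sequences are hypothetical.

```python
# Sketch: Fitch's small-parsimony algorithm for the length of a fixed tree.
# Assumptions: trees are nested 2-tuples of taxon names; sequences are aligned
# and of equal length; every substitution costs one step.


def fitch_site(tree, states):
    """Return (candidate state set, substitution count) for one site below `tree`."""
    if isinstance(tree, str):  # leaf: its observed character
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # Disjoint child sets force one extra substitution at this node.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, seqs):
    """Total parsimony length of `tree`, summed over all aligned sites."""
    n_sites = len(next(iter(seqs.values())))
    return sum(
        fitch_site(tree, {name: s[i] for name, s in seqs.items()})[1]
        for i in range(n_sites)
    )


if __name__ == "__main__":
    # Hypothetical nucleotide data for four taxa.
    seqs = {"A": "AAG", "B": "AAT", "C": "GCT", "D": "GCT"}
    # The three unrooted four-taxon topologies, each written with an arbitrary root.
    topologies = [
        (("A", "B"), ("C", "D")),
        (("A", "C"), ("B", "D")),
        (("A", "D"), ("B", "C")),
    ]
    for t in topologies:
        print(t, tree_length(t, seqs))
```

For these sequences the first topology needs 3 substitutions and the other two need 5 each, so ((A,B),(C,D)) is minimal among the three. Exhaustive scoring like this only works for a handful of taxa: with ten taxa there are already 2,027,025 unrooted binary topologies, which is why a constructive strategy matters.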

Molecular characterization of the coat protein gene revealed considerable diversity of viral species complex in Garlic (Allium sativum L.)

10.1101/2020.12.03.409680 ◽

2020 ◽

Author(s):

Abel Debebe Mitiku ◽

Dawit Tesfaye Degefu ◽

Adane Abraham ◽

Desta Mejan ◽

Pauline Asami ◽

...

Keyword(s):

Amino Acid ◽

Coat Protein ◽

Species Complex ◽

Sequence Data ◽

Amino Acid Sequences ◽

Viral Gene ◽

Planting Material ◽

Virus C ◽

Garlic Virus ◽

Material Exchange

Garlic is one of the most important Allium vegetables used to season food. It has many medicinal and nutritional benefits; however, its production is highly constrained by both biotic and abiotic challenges. Among these, viral infections are the most prevalent factors affecting crop productivity around the globe. This experiment was conducted on eleven selected garlic accessions and three improved varieties collected from different garlic-growing agro-climatic regions of Ethiopia. The study aimed to identify and characterize the isolated garlic viruses using the coat protein (CP) gene and to determine their phylogenetic relatedness. RNA was extracted from fresh young leaves of thirteen-day-old seedlings that showed yellowing, mosaic, and stunting symptoms. Pairwise molecular diversity for CP nucleotide and amino acid sequences was calculated using MEGA5. Maximum likelihood trees of the CP nucleotide sequence data for Allexivirus and Potyvirus were constructed using PhyML, while a neighbor-joining tree was constructed for the amino acid sequence data using MEGA5. Five garlic viruses were identified: Garlic virus C (78.6%), Garlic virus D (64.3%), Garlic virus X (78.6%), Onion yellow dwarf virus (OYDV) (100%), and Leek yellow stripe virus (LYSV) (78.6%). The study revealed complex mixtures of viruses, with 42.9% of the samples co-infected with a species complex of Garlic virus C, Garlic virus D, Garlic virus X, OYDV, and LYSV. Pairwise comparisons of the isolated Potyvirus and Allexivirus species revealed high identity with known members of their respective species. As an exception, lower within-species identity was observed among Garlic virus C isolates than among known members of the species.
Finally, our results highlight the need to establish a framework for exchanging virus-free garlic planting material in the country, which could reduce viral gene flow across the country.
Author Summary
Viral diseases are among the most devastating problems of garlic, because its vegetative propagation makes the crop especially vulnerable. Garlic viruses are currently a major production constraint in Ethiopia. However, very little is yet known about the identification, diversity, and dissemination of garlic-infecting viruses in the country. Here we explore the prevalence, genetic diversity, and mixed infection of garlic viruses in Ethiopia using a next-generation sequencing platform. Analysis of nucleotide and amino acid sequences of coat protein genes from infected samples revealed three Allexivirus species and two Potyvirus species occurring together in complex mixtures. The article concludes that it is high time to set up a framework for a virus-free garlic planting material exchange platform, which could reduce viral gene flow across the country.
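
The abstract does not say which distance MEGA5 was set to compute. Assuming the simplest choice, the p-distance (proportion of differing sites) with pairwise deletion of gap positions, pairwise diversity and percent identity for aligned CP sequences could be sketched as follows. The function names and example sequences are hypothetical, and the same code applies to nucleotide or amino acid alignments.

```python
# Sketch: p-distance with pairwise deletion (an assumed setting, not the
# study's documented MEGA5 configuration).


def p_distance(seq1, seq2, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Positions with a gap in either sequence are skipped (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq1, seq2):
        if a == gap or b == gap:
            continue
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def percent_identity(seq1, seq2, gap="-"):
    """Percent identity over the compared (gap-free) sites."""
    return 100.0 * (1.0 - p_distance(seq1, seq2, gap))


if __name__ == "__main__":
    # Hypothetical aligned fragments: 5 comparable sites, 1 difference.
    print(p_distance("ACGT-A", "ACTTGA"))  # 0.2
    print(percent_identity("ACGT-A", "ACTTGA"))
```

Under this measure, percent identity is simply 100 × (1 − p); other MEGA options, such as complete deletion or substitution-model corrections, would give different values.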

FIND: Identifying Functionally and Structurally Important Features in Protein Sequences with Deep Neural Networks

10.1101/592808 ◽

2019 ◽

Author(s):

Ranjani Murali ◽

James Hemp ◽

Victoria Orphan ◽

Yonatan Bisk

Keyword(s):

Neural Networks ◽

Amino Acid ◽

Hidden Markov Models ◽

Markov Models ◽

Genomic Sequence ◽

Hidden Markov ◽

Amino Acid Sequences ◽

Homologous Proteins ◽

Biological Studies ◽

Insight Into

The ability to correctly predict the functional role of proteins from their amino acid sequences would significantly advance molecular-level biological studies by improving our ability to infer the biochemical capabilities of organisms from their genomic sequences. Existing methods for protein function prediction or annotation mostly use alignment-based approaches and probabilistic models such as hidden Markov models. In this work we introduce a deep learning architecture, FIND (Function Identification with Neural Descriptions), which performs protein annotation from primary sequence. The accuracy of our method matches that of state-of-the-art techniques, such as protein classifiers based on hidden Markov models. Further, our approach allows model introspection via a neural attention mechanism, which weights parts of the amino acid sequence in proportion to their relevance for functional assignment. In this way, the attention weights automatically uncover structurally and functionally relevant features of the classified protein and reveal novel functional motifs in previously uncharacterized proteins. Although the model is applicable to any protein database, we applied it to superfamilies of homologous proteins, with the aim of extracting features inherent to divergent protein families within a larger superfamily. This provided insight into the functional diversification of an enzyme superfamily and its adaptation to different physiological contexts. We tested our approach on three families (nitrogenases, cytochrome bd-type oxygen reductases, and heme-copper oxygen reductases) and present a detailed analysis of the sequence characteristics identified in previously characterized proteins of the heme-copper oxygen reductase (HCO) superfamily. These characteristics are correlated with their catalytic relevance and evolutionary history.
FIND was then applied to discover features in previously uncharacterized members of the HCO superfamily, providing insight into their unique sequence features. This modeling approach demonstrates the power of neural networks to recognize patterns in large datasets, and it can be used to discover biochemically and structurally important features in proteins from their amino acid sequences.
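
The abstract does not describe FIND's layers. Purely as a generic illustration of how attention weights can rank positions in a sequence, the sketch below applies dot-product attention pooling to hypothetical per-residue vectors: each residue is scored against a query vector, the scores are normalized with a softmax, and the resulting weights form the pooled representation. In a trained network the residue vectors and the query would be learned; here they are made-up numbers, and the function names are illustrative only.

```python
# Sketch: generic dot-product attention pooling (not FIND's actual architecture).
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(residue_vectors, query):
    """Score each residue vector against `query`, normalize with softmax, and
    return (weights, weighted sum of residue vectors)."""
    scores = [sum(q * h for q, h in zip(query, vec)) for vec in residue_vectors]
    weights = softmax(scores)
    dim = len(query)
    pooled = [sum(w * vec[k] for w, vec in zip(weights, residue_vectors))
              for k in range(dim)]
    return weights, pooled


if __name__ == "__main__":
    # Hypothetical 2-dimensional vectors for a three-residue sequence.
    residues = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
    weights, pooled = attention_pool(residues, query=[1.0, 0.0])
    print([round(w, 3) for w in weights], [round(p, 3) for p in pooled])
```

Residues with larger weights contribute more to the pooled vector; this is the property that lets such weights be read as a relevance map over the sequence.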

Nucleotide and derived amino acid sequences of a cDNA coding for pre-uteroglobin from the lung of the hare (Lepus capensis)

Biochemical Journal ◽

10.1042/bj2350895 ◽

1986 ◽

Vol 235 (3) ◽

pp. 895-898 ◽

Cited By ~ 12

Author(s):

M S López de Haro ◽

A Nieto

Keyword(s):

Amino Acids ◽

Amino Acid ◽

Nucleotide Sequence ◽

Amino Acid Sequence ◽

Amino Acid Sequences ◽

Untranslated Regions ◽

Coding Region ◽

Homologous Proteins ◽

Lepus Capensis ◽

Rabbit Gene

An almost full-length cDNA coding for pre-uteroglobin from hare lung was cloned and sequenced. The derived amino acid sequence indicated that hare pre-uteroglobin contained 91 amino acids, including a signal peptide of 21 residues. Comparison of the nucleotide sequence of hare pre-uteroglobin cDNA with that previously reported for the rabbit gene indicated five silent point substitutions and six substitutions leading to amino acid changes in the coding region. The untranslated regions of both pre-uteroglobin mRNAs were very similar. The amino acid changes observed are discussed in relation to the different progesterone-binding abilities of the two homologous proteins.
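
The silent/replacement tally can be illustrated with the standard genetic code. The sketch below compares two aligned coding sequences codon by codon and counts a differing codon as silent when both codons translate to the same amino acid and as replacement otherwise. This is a simplification: it counts codons rather than individual point substitutions (the two agree only when each differing codon carries a single change), and it assumes gap-free, unambiguous sequences. The function name and sequences are hypothetical.

```python
# Sketch: classify codon differences as silent or replacement (standard code).
BASES = "TCAG"
# Standard genetic code, amino acids listed for codons in TCAG x TCAG x TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_differences(cds1, cds2):
    """Return (silent, replacement) counts of differing codons between two
    aligned, gap-free coding sequences."""
    if len(cds1) != len(cds2) or len(cds1) % 3:
        raise ValueError("expected aligned coding sequences of whole codons")
    silent = replacement = 0
    for i in range(0, len(cds1), 3):
        c1, c2 = cds1[i:i + 3].upper(), cds2[i:i + 3].upper()
        if c1 == c2:
            continue
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            silent += 1  # synonymous change, e.g. AAA -> AAG (both Lys)
        else:
            replacement += 1  # amino acid change, e.g. CTG (Leu) -> CCG (Pro)
    return silent, replacement


if __name__ == "__main__":
    print(classify_codon_differences("ATGAAACTG", "ATGAAGCCG"))  # (1, 1)
```

A fuller treatment would count changes per nucleotide and average over mutational pathways for codons differing at more than one position.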

Alignment of Amino Acid and DNA Sequences of Human Proline-rich Proteins

Critical Reviews in Oral Biology & Medicine ◽

10.1177/10454411930040030501 ◽

1993 ◽

Vol 4 (3) ◽

pp. 287-292 ◽

Cited By ~ 12

Author(s):

D.L. Kauffman ◽

P.J. Keller ◽

A. Bennick ◽

M. Blum

Keyword(s):

Amino Acid ◽

Dna Sequences ◽

Sequence Data ◽

Gel Filtration ◽

Exchange Chromatography ◽

Amino Acid Sequences ◽

Secreted Proteins ◽

Dna Encoding ◽

Protein Amino Acid ◽

Primary Gene

Human proline-rich proteins (PRPs) constitute a complex family of salivary proteins that are encoded by a small number of genes. The primary gene product is cleaved by proteases, thereby giving rise to about 20 secreted proteins. To determine the genes for the secreted PRPs, therefore, it is necessary to obtain sequences of both the secreted proteins and the DNA encoding these proteins. We have sequenced most PRPs from one donor (D.K.) and aligned the protein sequences with available DNA sequences from unrelated individuals. Partial sequence data have now been obtained for an additional PRP from D.K. named II-1. This protein was purified from parotid saliva by gel filtration and ion-exchange chromatography. Peptides were obtained by cleavage with trypsin, clostripain, and N-bromosuccinimide, followed by column chromatography. The peptides were sequenced on a gas-phase protein sequenator. Overlapping peptide sequences were obtained for most of II-1 and aligned with translated DNA sequences. The best fit was obtained with clones containing sequences for the allele PRB4" (Lyons et al., 1988). However, there was not complete identity of the protein amino acid sequence and the DNA-derived sequences, indicating that II-1 is not encoded by PRB4". Other PRPs isolated from D.K. also fail to conform to any DNA structure so far reported. This shows the need to obtain amino acid sequences and corresponding DNA sequences from the same person to assign genes for the PRPs and to determine the location of the postribosomal cleavage points in the primary translation product.

Phylogenomic Reconstruction Sheds Light on New Relationships and Timescale of Rails (Aves: Rallidae) Evolution

Diversity ◽

10.3390/d12020070 ◽

2020 ◽

Vol 12 (2) ◽

pp. 70 ◽

Cited By ~ 5

Author(s):

Juan C. Garcia-R ◽

Emily Moriarty Lemmon ◽

Alan R. Lemmon ◽

Nigel French

Keyword(s):

Evolutionary History ◽

State Of The Art ◽

Molecular Techniques ◽

Evolutionary Significance ◽

Phylogenetic Structure ◽

Bayesian Approaches ◽

Phylogenetic Studies ◽

Anchored Phylogenomics ◽

History Of ◽

Phylogenomic Analyses

The integration of state-of-the-art molecular techniques and analyses, together with broad taxonomic sampling, can provide new insights into bird interrelationships and divergence. Despite their evolutionary significance, the relationships among several rail lineages remain unresolved, as does the general timescale of rail evolution. Here, we disentangle the deep phylogenetic structure of rails using anchored phylogenomics. We analysed a set of 393 loci from 63 species, representing approximately 40% of the extant familial diversity. Our phylogenomic analyses reconstruct the phylogeny of rails and robustly infer several previously contentious relationships. Concatenated maximum likelihood and coalescent species-tree approaches recover identical topologies with strong node support. The results are concordant with previous phylogenetic studies based on small DNA datasets but provide additional resolution. Our dating analyses, using fossils with Bayesian and non-Bayesian approaches, yield contrasting divergence times. Our study refines the evolutionary history of rails, offering a foundation for future evolutionary studies of birds.

RHIVDB: A Freely Accessible Database of HIV Amino Acid Sequences and Clinical Data of Infected Patients

Frontiers in Genetics ◽

10.3389/fgene.2021.679029 ◽

2021 ◽

Vol 12 ◽

Author(s):

Olga Tarasova ◽

Anastasia Rudik ◽

Dmitry Kireev ◽

Vladimir Poroikov

Keyword(s):

Amino Acid ◽

Sequence Data ◽

Drug Combinations ◽

Biological Data ◽

Amino Acid Sequences ◽

Web Database ◽

Cd4 Cell Count ◽

Viral Load Data ◽

Treatment Data ◽

Cd4 Cell

Human immunodeficiency virus (HIV) infection remains one of the most severe problems for humanity, particularly due to the development of HIV drug resistance. To evaluate the association between viral sequence data and drug combinations, and to estimate the effect of a particular drug combination on treatment outcomes, it is essential to collect the most representative drug combinations used to treat HIV together with biological data on the amino acid sequences of HIV proteins. We have created a new, freely available web database containing 1,651 amino acid sequences of HIV structural proteins [reverse transcriptase (RT), protease (PR), integrase (IN), and envelope protein (ENV)], treatment history information, and CD4+ cell count and viral load data, all retrievable by user query. Additionally, any user can submit biological data on new HIV sequences and treatment data to the database, subject to expert verification. The database is available on the web at http://www.way2drug.com/rhivdb.

Phylogenetic relationships of the nematode subfamily Phascolostrongylinae from macropodid and vombatid marsupials inferred using mitochondrial protein sequence data

Parasites & Vectors ◽

10.1186/s13071-021-05028-2 ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Tanapan Sukee ◽

Ian Beveridge ◽

Anson V. Koehler ◽

Ross Hall ◽

Robin B. Gasser ◽

...

Keyword(s):

Amino Acid ◽

Phylogenetic Relationships ◽

Sequence Data ◽

Mitochondrial Protein ◽

Amino Acid Sequences ◽

Internal Transcribed Spacers ◽

Nuclear Ribosomal Dna ◽

Phylogenetic Position ◽

Data Sets ◽

Sister Relationship

Background: The subfamily Phascolostrongylinae (superfamily Strongyloidea) comprises nematodes that parasitize the gastrointestinal tracts of macropodid (family Macropodidae) and vombatid (family Vombatidae) marsupials. Currently, nine genera and 20 species are attributed to the subfamily Phascolostrongylinae. Previous studies using sequence data sets for the internal transcribed spacers (ITS) of nuclear ribosomal DNA showed conflicting topologies between the Phascolostrongylinae and related subfamilies. The aim of this study was therefore to validate the phylogenetic relationships within the Phascolostrongylinae, and its relationship with the families Chabertiidae and Strongylidae, using mitochondrial amino acid sequences. Methods: The sequences of all 12 mitochondrial protein-coding genes were obtained by next-generation sequencing of individual adult nematodes (n = 8) representing members of the Phascolostrongylinae. These sequences were conceptually translated, and the phylogenetic relationships within the Phascolostrongylinae and its relationship with the families Chabertiidae and Strongylidae were inferred from aligned, concatenated amino acid sequence data sets. Results: Within the Phascolostrongylinae, the wombat-specific genera grouped separately from the genera occurring in macropods. Two of the phascolostrongyline tribes, Phascolostrongylinea and Hypodontinea, were monophyletic, whereas the tribe Macropostrongyloidinea was paraphyletic. The tribe Phascolostrongylinea, occurring in wombats, was closely related to Oesophagostomum spp., also of the family Chabertiidae, which formed a sister relationship with the Phascolostrongylinae. Conclusion: The phylogenetic relationships recovered within the subfamily Phascolostrongylinae support findings from a previous study based on ITS sequence data. This study also contributes to the understanding of the phylogenetic position of the subfamily Phascolostrongylinae within the Chabertiidae.
Future studies investigating the relationships between the Phascolostrongylinae and Cloacininae from macropodid marsupials may advance our knowledge of the phylogeny of strongyloid nematodes in marsupials.
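
The abstract states that phylogenies were inferred from aligned, concatenated amino acid data sets for the 12 mitochondrial protein-coding genes. A minimal sketch of the concatenation step follows, assuming the per-gene alignments already exist. Padding taxa missing from a gene with `?` and recording each gene's column range as a partition are common conventions, not details taken from the study; the function name and sequences are hypothetical.

```python
# Sketch: build a concatenated (supermatrix) alignment from per-gene alignments.


def concatenate_alignments(gene_alignments, taxa, missing="?"):
    """gene_alignments: list of dicts mapping taxon -> aligned sequence for one gene.
    Taxa absent from a gene are padded with `missing` so that every row has the
    same total length. Returns (dict taxon -> sequence, partitions), where each
    partition is a (start, end) column range for one gene, end exclusive."""
    rows = {t: [] for t in taxa}
    partitions = []
    start = 0
    for aln in gene_alignments:
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError("each gene alignment must have equal-length sequences")
        width = lengths.pop()
        for t in taxa:
            rows[t].append(aln.get(t, missing * width))
        partitions.append((start, start + width))
        start += width
    return {t: "".join(parts) for t, parts in rows.items()}, partitions


if __name__ == "__main__":
    # Two hypothetical gene alignments; taxon C lacks gene 1, taxon B lacks gene 2.
    genes = [{"A": "MKV", "B": "MRV"}, {"A": "LL", "C": "LI"}]
    matrix, parts = concatenate_alignments(genes, ["A", "B", "C"])
    for taxon, seq in matrix.items():
        print(taxon, seq)
    print(parts)
```

Keeping the partition boundaries lets downstream tree inference apply a separate substitution model to each gene if desired.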

SignalP 6.0 predicts all five types of signal peptides using protein language models

Nature Biotechnology ◽

10.1038/s41587-021-01156-3 ◽

2022 ◽

Author(s):

Felix Teufel ◽

José Juan Almagro Armenteros ◽

Alexander Rosenberg Johansen ◽

Magnús Halldór Gíslason ◽

Silas Irby Pihl ◽

...

Keyword(s):

Machine Learning ◽

Amino Acid ◽

Sequence Data ◽

Amino Acid Sequences ◽

Language Models ◽

Metagenomic Data ◽

Signal Peptides ◽

Machine Learning Model ◽

Living Organisms ◽

Control Protein

Signal peptides (SPs) are short amino acid sequences that control protein secretion and translocation in all living organisms. SPs can be predicted from sequence data, but existing algorithms are unable to detect all known types of SPs. We introduce SignalP 6.0, a machine learning model that detects all five SP types and is applicable to metagenomic data.

arfI and arfII, Two Genes Encoding α-l-Arabinofuranosidases in Cytophaga xylanolytica

Applied and Environmental Microbiology ◽

10.1128/aem.64.5.1919-1923.1998 ◽

1998 ◽

Vol 64 (5) ◽

pp. 1919-1923 ◽

Cited By ~ 14

Author(s):

Kwi S. Kim ◽

Timothy G. Lilburn ◽

Michael J. Renner ◽

John A. Breznak

Keyword(s):

Bacillus Subtilis ◽

Amino Acid ◽

Amino Acid Sequences ◽

Homologous Proteins ◽

Genes Encoding ◽

Bacteroides Ovatus

arfI encoded the 57.7-kDa subunit of Cytophaga xylanolytica arabinofuranosidase I (ArfI). arfII encoded a 59.2-kDa subunit of ArfII. Products of both cloned genes liberated arabinose from arabinan and arabinoxylan. The deduced amino acid sequences of ArfI and ArfII revealed numerous regions that were identical to each other and to regions of homologous proteins from Bacteroides ovatus, Bacillus subtilis, and Clostridium stercorarium.
