ESTIMATION OF TIME OF DIVERGENCE FROM PHYLOGENETIC STUDIES

1977 ◽  
Vol 19 (2) ◽  
pp. 217-223 ◽  
Author(s):  
Ranajit Chakraborty

Recent studies with comparative data on base sequences of homologous DNAs or amino acid sequences of homologous proteins indicate that simultaneous estimation of phylogenetic structure and time of divergence is often cumbersome and time consuming. On the other hand, when the topology of an evolutionary tree is known, it is shown in this paper that the least squares theory may be applied to obtain simple estimates of the relative time lengths for each segment of the tree under the assumption of uniform random substitutions in each segment. The method is illustrated with amino acid sequence data on various globin molecules and cytochrome c. The evolutionary significance of some of the estimates is also discussed.
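
As a rough illustration of the least-squares idea in the abstract above (not the paper's own derivation, which the abstract does not give), the sketch below fits branch lengths for a fixed four-taxon topology ((A,B),(C,D)) to pairwise distances by ordinary least squares. The topology, the distance values, and all function names are hypothetical. Under a constant substitution rate, the fitted branch lengths would be proportional to relative time lengths.

```python
# Ordinary least-squares branch lengths for a KNOWN topology ((A,B),(C,D)).
# Branch order: a, b, c, d (terminal branches) and e (internal branch).
# Each row marks which branches lie on the path between one pair of taxa.
# Everything here is an illustrative assumption, not taken from the paper.
DESIGN = [
    [1, 1, 0, 0, 0],  # A-B: a + b
    [1, 0, 1, 0, 1],  # A-C: a + e + c
    [1, 0, 0, 1, 1],  # A-D: a + e + d
    [0, 1, 1, 0, 1],  # B-C: b + e + c
    [0, 1, 0, 1, 1],  # B-D: b + e + d
    [0, 0, 1, 1, 0],  # C-D: c + d
]


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(map(float, row)) + [float(v)] for row, v in zip(matrix, rhs)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares_branch_lengths(design, distances):
    """Solve the normal equations (X^T X) b = X^T d for the branch lengths b."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xtd = [sum(row[i] * d for row, d in zip(design, distances))
           for i in range(k)]
    return solve_linear(xtx, xtd)


# Hypothetical pairwise distances, ordered AB, AC, AD, BC, BD, CD.
observed = [3, 9, 10, 10, 11, 7]
estimates = least_squares_branch_lengths(DESIGN, observed)
```

With six pairwise distances and five branches the system is overdetermined, which is what makes a least-squares fit meaningful. When the distances are not perfectly additive, the estimates minimise the sum of squared differences between observed and path-length distances.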

1980 ◽  
Vol 187 (1) ◽  
pp. 65-74 ◽  
Author(s):  
D Penny ◽  
M D Hendy ◽  
L R Foulds

We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127–150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151–166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. The conversion of the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods have involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented in the present paper progressively builds up a tree, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally, we define a measure of the complexity of the data and illustrate a method to derive a directed phylogenetic tree from the minimal tree.


2020 ◽  
Author(s):  
Abel Debebe Mitiku ◽  
Dawit Tesfaye Degefu ◽  
Adane Abraham ◽  
Desta Mejan ◽  
Pauline Asami ◽  
...  

Garlic is one of the most important Allium vegetables used to season food. It has many medicinal and nutritional benefits; however, its production is highly constrained by both biotic and abiotic challenges. Among these, viral infections are the most prevalent factors affecting crop productivity around the globe. This experiment was conducted on eleven selected garlic accessions and three improved varieties collected from different garlic-growing agro-climatic regions of Ethiopia. The study aimed to identify and characterize the isolated garlic viruses using the coat protein (CP) gene and to determine their phylogenetic relatedness. RNA was extracted from fresh young leaves of thirteen-day-old seedlings that showed yellowing, mosaic, and stunting symptoms. Pairwise molecular diversity for CP nucleotide and amino acid sequences was calculated using MEGA5. Maximum likelihood trees for the CP nucleotide sequence data of Allexivirus and Potyvirus were constructed using PhyML, while a neighbor-joining tree was constructed for the amino acid sequence data using MEGA5. Five garlic viruses were identified: Garlic virus C (78.6%), Garlic virus D (64.3%), Garlic virus X (78.6%), Onion yellow dwarf virus (OYDV) (100%), and Leek yellow stripe virus (LYSV) (78.6%). The study revealed complex mixtures of viruses, with 42.9% of the samples co-infected with a species complex of Garlic virus C, Garlic virus D, Garlic virus X, OYDV, and LYSV. Pairwise comparisons of the isolated Potyvirus and Allexivirus species revealed high identity with known members of their respective species. As an exception, lower within-species identity was observed among Garlic virus C isolates than among known members of the species. Finally, our results highlight the need to set up a working framework for virus-free garlic planting material exchange in the country, which could reduce viral gene flow across the country.
Author summary: Garlic viruses cause devastating disease because garlic, as a vegetatively propagated crop, is highly vulnerable to them. Garlic viruses are currently a major production constraint in Ethiopia. However, very little is known so far about the identification, diversity, and dissemination of garlic-infecting viruses in the country. Here we explore the prevalence, genetic diversity, and mixed infection of garlic viruses in Ethiopia using a next-generation sequencing platform. Analysis of nucleotide and amino acid sequences of coat protein genes from infected samples revealed the association of three species from Allexivirus and two species from Potyvirus in a complex mixture. The article concludes that it is high time to set up a working framework for a virus-free garlic planting material exchange platform, which could reduce viral gene flow across the country.
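
The pairwise comparison step mentioned in the garlic virus abstract above can be illustrated with a minimal percent-identity calculation on two aligned sequences. This is only a sketch of the general idea, not MEGA5's procedure; the function name, the gap handling, and the example sequences are assumptions made for illustration.

```python
def pairwise_identity(seq_a, seq_b, gap="-"):
    """Return percent identity between two ALIGNED sequences of equal length.

    Columns where either sequence has a gap are skipped (pairwise deletion).
    Illustrative helper only; not the method of any particular program.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == gap or y == gap:
            continue  # ignore gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned nucleotide fragments differing at one of eight sites.
identity = pairwise_identity("ATGCATGC", "ATGCATGA")
```

The same function works for aligned amino acid sequences, since it only compares characters column by column.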


2019 ◽  
Author(s):  
Ranjani Murali ◽  
James Hemp ◽  
Victoria Orphan ◽  
Yonatan Bisk

The ability to correctly predict the functional role of proteins from their amino acid sequences would significantly advance biological studies at the molecular level by improving our ability to understand the biochemical capability of biological organisms from their genomic sequence. Existing methods that are geared towards protein function prediction or annotation mostly use alignment-based approaches and probabilistic models such as Hidden Markov Models. In this work we introduce a deep learning architecture (Function Identification with Neural Descriptions, or FIND) which performs protein annotation from primary sequence. The accuracy of our methods matches state-of-the-art techniques, such as protein classifiers based on Hidden Markov Models. Further, our approach allows for model introspection via a neural attention mechanism, which weights parts of the amino acid sequence proportionally to their relevance for functional assignment. In this way, the attention weights automatically uncover structurally and functionally relevant features of the classified protein and find novel functional motifs in previously uncharacterized proteins. While this model is applicable to any database of proteins, we chose to apply this model to superfamilies of homologous proteins, with the aim of extracting features inherent to divergent protein families within a larger superfamily. This provided insight into the functional diversification of an enzyme superfamily and its adaptation to different physiological contexts. We tested our approach on three families (nitrogenases, cytochrome bd-type oxygen reductases and heme-copper oxygen reductases) and present a detailed analysis of the sequence characteristics identified in previously characterized proteins in the heme-copper oxygen reductase (HCO) superfamily. These are correlated with their catalytic relevance and evolutionary history.
FIND was then applied to discover features in previously uncharacterized members of the HCO superfamily, providing insight into their unique sequence features. This modeling approach demonstrates the power of neural networks to recognize patterns in large datasets and can be utilized to discover biochemically and structurally important features in proteins from their amino acid sequences.


1986 ◽  
Vol 235 (3) ◽  
pp. 895-898 ◽  
Author(s):  
M S López de Haro ◽  
A Nieto

An almost full-length cDNA coding for pre-uteroglobin from hare lung was cloned and sequenced. The derived amino acid sequence indicated that hare pre-uteroglobin contained 91 amino acids, including a signal peptide of 21 residues. Comparison of the nucleotide sequence of hare pre-uteroglobin cDNA with that previously reported for the rabbit gene indicated five silent point substitutions and six others leading to amino acid changes in the coding region. The untranslated regions of both pre-uteroglobin mRNAs were very similar. The amino acid changes observed are discussed in relation to the different progesterone-binding abilities of both homologous proteins.


1993 ◽  
Vol 4 (3) ◽  
pp. 287-292 ◽  
Author(s):  
D.L. Kauffman ◽  
P.J. Keller ◽  
A. Bennick ◽  
M. Blum

Human proline-rich proteins (PRPs) constitute a complex family of salivary proteins that are encoded by a small number of genes. The primary gene product is cleaved by proteases, thereby giving rise to about 20 secreted proteins. To determine the genes for the secreted PRPs, therefore, it is necessary to obtain sequences of both the secreted proteins and the DNA encoding these proteins. We have sequenced most PRPs from one donor (D.K.) and aligned the protein sequences with available DNA sequences from unrelated individuals. Partial sequence data have now been obtained for an additional PRP from D.K. named II-1. This protein was purified from parotid saliva by gel filtration and ion-exchange chromatography. Peptides were obtained by cleavage with trypsin, clostripain, and N-bromosuccinimide, followed by column chromatography. The peptides were sequenced on a gas-phase protein sequenator. Overlapping peptide sequences were obtained for most of II-1 and aligned with translated DNA sequences. The best fit was obtained with clones containing sequences for the allele PRB4" (Lyons et al., 1988). However, there was not complete identity of the protein amino acid sequence and the DNA-derived sequences, indicating that II-1 is not encoded by PRB4". Other PRPs isolated from D.K. also fail to conform to any DNA structure so far reported. This shows the need to obtain amino acid sequences and corresponding DNA sequences from the same person to assign genes for the PRPs and to determine the location of the postribosomal cleavage points in the primary translation product.


Diversity ◽  
2020 ◽  
Vol 12 (2) ◽  
pp. 70 ◽  
Author(s):  
Juan C. Garcia-R ◽  
Emily Moriarty Lemmon ◽  
Alan R. Lemmon ◽  
Nigel French

The integration of state-of-the-art molecular techniques and analyses, together with a broad taxonomic sampling, can provide new insights into bird interrelationships and divergence. Despite their evolutionary significance, the relationships among several rail lineages remain unresolved as does the general timescale of rail evolution. Here, we disentangle the deep phylogenetic structure of rails using anchored phylogenomics. We analysed a set of 393 loci from 63 species, representing approximately 40% of the extant familial diversity. Our phylogenomic analyses reconstruct the phylogeny of rails and robustly infer several previously contentious relationships. Concatenated maximum likelihood and coalescent species-tree approaches recover identical topologies with strong node support. The results are concordant with previous phylogenetic studies using small DNA datasets, but they also supply an additional resolution. Our dating analysis provides contrasting divergence times using fossils and Bayesian and non-Bayesian approaches. Our study refines the evolutionary history of rails, offering a foundation for future evolutionary studies of birds.


2021 ◽  
Vol 12 ◽  
Author(s):  
Olga Tarasova ◽  
Anastasia Rudik ◽  
Dmitry Kireev ◽  
Vladimir Poroikov

Human immunodeficiency virus (HIV) infection remains one of the most severe problems for humanity, particularly due to the development of HIV resistance. To evaluate an association between viral sequence data and drug combinations and to estimate an effect of a particular drug combination on the treatment results, collection of the most representative drug combinations used to cure HIV and the biological data on amino acid sequences of HIV proteins is essential. We have created a new, freely available web database containing 1,651 amino acid sequences of HIV structural proteins [reverse transcriptase (RT), protease (PR), integrase (IN), and envelope protein (ENV)], treatment history information, and CD4+ cell count and viral load data available by the user’s query. Additionally, the biological data on new HIV sequences and treatment data can be stored in the database by any user followed by an expert’s verification. The database is available on the web at http://www.way2drug.com/rhivdb.


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Tanapan Sukee ◽  
Ian Beveridge ◽  
Anson V. Koehler ◽  
Ross Hall ◽  
Robin B. Gasser ◽  
...  

Background: The subfamily Phascolostrongylinae (Superfamily Strongyloidea) comprises nematodes that are parasitic in the gastrointestinal tracts of macropodid (Family Macropodidae) and vombatid (Family Vombatidae) marsupials. Currently, nine genera and 20 species have been attributed to the subfamily Phascolostrongylinae. Previous studies using sequence data sets for the internal transcribed spacers (ITS) of nuclear ribosomal DNA showed conflicting topologies between the Phascolostrongylinae and related subfamilies. Therefore, the aim of this study was to validate the phylogenetic relationships within the Phascolostrongylinae and its relationship with the families Chabertiidae and Strongylidae using mitochondrial amino acid sequences. Methods: The sequences of all 12 mitochondrial protein-coding genes were obtained by next-generation sequencing of individual adult nematodes (n = 8) representing members of the Phascolostrongylinae. These sequences were conceptually translated and the phylogenetic relationships within the Phascolostrongylinae and its relationship with the families Chabertiidae and Strongylidae were inferred from aligned, concatenated amino acid sequence data sets. Results: Within the Phascolostrongylinae, the wombat-specific genera grouped separately from the genera occurring in macropods. Two of the phascolostrongyline tribes, Phascolostrongylinea and Hypodontinea, were monophyletic, whereas the tribe Macropostrongyloidinea was paraphyletic. The tribe Phascolostrongylinea occurring in wombats was closely related to Oesophagostomum spp., also from the family Chabertiidae, which formed a sister relationship with the Phascolostrongylinae. Conclusion: The current phylogenetic relationship within the subfamily Phascolostrongylinae supports findings from a previous study based on ITS sequence data. This study also contributes to the understanding of the phylogenetic position of the subfamily Phascolostrongylinae within the Chabertiidae. Future studies investigating the relationships between the Phascolostrongylinae and Cloacininae from macropodid marsupials may advance our knowledge of the phylogeny of strongyloid nematodes in marsupials.


Author(s):  
Felix Teufel ◽  
José Juan Almagro Armenteros ◽  
Alexander Rosenberg Johansen ◽  
Magnús Halldór Gíslason ◽  
Silas Irby Pihl ◽  
...  

Signal peptides (SPs) are short amino acid sequences that control protein secretion and translocation in all living organisms. SPs can be predicted from sequence data, but existing algorithms are unable to detect all known types of SPs. We introduce SignalP 6.0, a machine learning model that detects all five SP types and is applicable to metagenomic data.


1998 ◽  
Vol 64 (5) ◽  
pp. 1919-1923 ◽  
Author(s):  
Kwi S. Kim ◽  
Timothy G. Lilburn ◽  
Michael J. Renner ◽  
John A. Breznak

arfI encoded the 57.7-kDa subunit of Cytophaga xylanolytica arabinofuranosidase I (ArfI). arfII encoded a 59.2-kDa subunit of ArfII. Products of both cloned genes liberated arabinose from arabinan and arabinoxylan. The deduced amino acid sequences of ArfI and ArfII revealed numerous regions that were identical to each other and to regions of homologous proteins from Bacteroides ovatus, Bacillus subtilis, and Clostridium stercorarium.

