A Genome Tree Made from Information on Protein Domain Organizations

2008 ◽  
Vol 48 (4) ◽  
pp. 243-245
Author(s):  
Kaoru FUKAMI-KOBAYASHI ◽  
Ken NISHIKAWA
Keyword(s):  
A Genome ◽  
2002 ◽  
Vol 184 (8) ◽  
pp. 2072-2080 ◽  
Author(s):  
G. D. Paul Clarke ◽  
Robert G. Beiko ◽  
Mark A. Ragan ◽  
Robert L. Charlebois

ABSTRACT Darwin's paradigm holds that the diversity of present-day organisms has arisen via a process of genetic descent with modification, as on a bifurcating tree. Evidence is accumulating that genes are sometimes transferred not along lineages but rather across lineages. To the extent that this is so, Darwin's paradigm can apply only imperfectly to genomes, potentially complicating or perhaps undermining attempts to reconstruct historical relationships among genomes (i.e., a genome tree). Whether most genes in a genome have arisen via treelike (vertical) descent or by lateral transfer across lineages can be tested if enough complete genome sequences are used. We define a phylogenetically discordant sequence (PDS) as an open reading frame (ORF) that exhibits patterns of similarity relationships statistically distinguishable from those of most other ORFs in the same genome. PDSs represent between 6.0 and 16.8% (mean, 10.8%) of the analyzable ORFs in the genomes of 28 bacteria, eight archaea, and one eukaryote (Saccharomyces cerevisiae). In this study we developed and assessed a distance-based approach, based on mean pairwise sequence similarity, for generating genome trees. Exclusion of PDSs improved bootstrap support for basal nodes but altered few topological features, indicating that there is little systematic bias among PDSs. Many but not all features of the genome tree from which PDSs were excluded are consistent with the 16S rRNA tree.
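The distance-based approach described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes ORF-level similarity scores for each genome pair are precomputed and scaled to 0..1, with PDS-flagged ORFs already removed if desired. Distance is taken as 1 minus the mean similarity, and UPGMA is used only as a simple stand-in clustering method. The function names `genome_distances` and `upgma` are hypothetical.

```python
from itertools import combinations
from statistics import mean


def genome_distances(pair_scores):
    """Convert per-pair ORF similarity scores (0..1) into genome distances.

    pair_scores maps frozenset({genome_a, genome_b}) -> list of scores.
    Distance is assumed to be 1 - mean similarity (an illustrative choice).
    """
    return {pair: 1.0 - mean(scores) for pair, scores in pair_scores.items()}


def upgma(names, dist):
    """Cluster genomes with UPGMA and return a Newick string (no lengths)."""
    sizes = {name: 1 for name in names}  # cluster label -> number of genomes
    d = dict(dist)  # working copy: frozenset({label1, label2}) -> distance
    while len(sizes) > 1:
        # Merge the closest pair of current clusters
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Distance from the merged cluster to every other one: size-weighted average
        for c in sizes:
            d[frozenset((merged, c))] = (
                d[frozenset((a, c))] * size_a + d[frozenset((b, c))] * size_b
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes)) + ";"


# Toy example: genomes A and B are much more similar to each other than to C
scores = {
    frozenset({"A", "B"}): [0.9, 0.8],
    frozenset({"A", "C"}): [0.3, 0.4],
    frozenset({"B", "C"}): [0.35, 0.35],
}
print(upgma(["A", "B", "C"], genome_distances(scores)))  # (C,(A,B));
```

Any other distance-based method, such as neighbor-joining, could replace UPGMA here. UPGMA is shown only because it is short to write out.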


Author(s):  
Lu Jin ◽  
Jia-Jia Liu ◽  
Tian-Wen Xiao ◽  
Qiao-Ming Li ◽  
Luxiang Lin ◽  
...  

Phylogenetic trees have been extensively used in community ecology. However, how the phylogenetic reconstruction affects ecological inferences is poorly understood. In this study, we reconstructed three different types of phylogenetic trees that represented an increasing level of phylogenetic resolution among 580 woody plant species from six dynamic plots in subtropical evergreen broadleaved forests of China. These were a synthetic-tree generated using V.PhyloMaker, a barcode-tree generated using rbcL+matK+trnH-psbA, and a genome-tree generated from plastid genomes. We then evaluated the performance of each phylogeny in estimating community phylogenetic structure, phylogenetic turnover, and phylogenetic signal in functional traits. As expected, the genome-tree was the best resolved and best supported for relationships among species. For local phylogenetic structure, the three trees gave consistent results with Faith’s PD and MPD; however, only the synthetic-tree produced significant clustering patterns using MNTD for some plots. For phylogenetic turnover, the molecular trees and the synthetic-tree gave contrasting results only with nearest neighbor distance. For both phylogenetic structure and turnover, the barcode-tree agreed more closely with the genome-tree than the synthetic-tree did. For functional traits, both the barcode-tree and the genome-tree detected phylogenetic signal in maximum height, but only the genome-tree detected signal in leaf width. This is the first study to use plastid genomes in large-scale community phylogenetics. Our results show that genome-trees outperformed barcode-trees and synthetic-trees for the analyses studied here. They also point to the possibility of Type I and Type II errors when synthetic-trees are used, both in estimating phylogenetic structure and turnover and in detecting phylogenetic signal.
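Two of the community metrics compared above, MPD and MNTD, can be computed directly from the pairwise phylogenetic distances among the species present in a plot. Faith's PD needs the tree's branch lengths and is omitted. The sketch below is a minimal presence/absence version. It assumes the distances have already been extracted from whichever tree is being tested. The null-model randomizations used to call clustering significant are not shown, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def mpd(community, dist):
    """Mean pairwise distance among all species present (needs >= 2 species).

    dist maps frozenset({sp1, sp2}) -> phylogenetic (patristic) distance.
    """
    return mean(dist[frozenset(p)] for p in combinations(sorted(community), 2))


def mntd(community, dist):
    """Mean distance from each species to its nearest relative in the plot."""
    species = sorted(community)
    return mean(
        min(dist[frozenset((s, t))] for t in species if t != s)
        for s in species
    )


# Toy plot with three species: sp1 and sp2 are close relatives
dist = {
    frozenset({"sp1", "sp2"}): 2.0,
    frozenset({"sp1", "sp3"}): 6.0,
    frozenset({"sp2", "sp3"}): 6.0,
}
plot = {"sp1", "sp2", "sp3"}
print(round(mpd(plot, dist), 3))   # 4.667
print(round(mntd(plot, dist), 3))  # 3.333
```

MNTD looks only at each species' closest relative, so it depends on how finely the tips of the tree are resolved. MPD, by contrast, averages over all pairs of species.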


2019 ◽  
Author(s):  
JaeJin Choi ◽  
Sung-Hou Kim

Abstract An organism Tree of Life (organism ToL) is a conceptual and metaphorical tree that captures a simplified narrative of the evolutionary course and kinship among the extant organisms of today. Such a tree cannot be experimentally validated, but it may be reconstructed from characteristics of the extant organisms. At present, the whole genome sequence of an organism is the most comprehensive descriptor of that organism, so a genome ToL can serve as an empirically derivable surrogate for the organism ToL. However, a genome ToL was long impossible to construct for a practical reason: experimentally determining the whole genome sequences of a large number of diverse organisms was technically infeasible. Thus, for several decades, gene ToLs based on selected genes have commonly been used as a surrogate for the organism ToL. This situation has changed dramatically owing to rapid advances in DNA sequencing technology. Here we describe the main features of a genome ToL that differ from those of the broadly accepted gene ToLs: (a) the first two organism groups to emerge are the founders of prokarya and eukarya; (b) they diversify into six large groups, and all the founders of these groups emerged in a “Deep Burst” at the very beginning of the emergence of Life on Earth; and (c) other notable differences appear in the order of emergence of smaller groups.
Significance Statement Tree of Life is a conceptual and metaphorical tree that captures a simplified narrative of the evolutionary course and kinship among all living organisms of today. At present, the whole genome sequence information of an organism is the most comprehensive description of that organism. We therefore reconstructed a Genome Tree of Life using the proteome information from the whole genomes of over 4000 different living organisms on Earth. The resulting tree suggests that (a) the first two primitive organism groups to emerge are the founders of prokarya and eukarya, (b) they diversify into six large groups, and (c) all the founders of these groups emerged in a “Deep Burst” at the very beginning of the emergence of Life on Earth.


2017 ◽  
Vol 114 (35) ◽  
pp. 9391-9396 ◽  
Author(s):  
JaeJin Choi ◽  
Sung-Hou Kim

Fungi belong to one of the largest and most diverse kingdoms of living organisms. The evolutionary kinship within a fungal population has so far been inferred mostly from gene-information–based trees (“gene trees”). These are commonly constructed from the degree of difference between the protein or DNA sequences of a small number of highly conserved genes common to the population, compared by a multiple sequence alignment (MSA) method. Each gene evolves under a different evolutionary pressure and time scale, so it is known that one gene tree for a population may differ from other gene trees for the same population, depending on the subjective selection of genes. Within the last decade, a large number of whole-genome sequences of fungi have become publicly available. At present, these represent the most fundamental and complete information about each fungal organism. This presents an opportunity to infer kinship among fungi using a whole-genome information-based tree (“genome tree”). The method we used allows comparison of whole-genome information without MSA. It is a variation of a computational algorithm developed to find semantic similarities or plagiarism in two books, and it represents the whole-genome information of an organism as a book of words without spaces. The genome tree reveals several significant and notable differences from the gene trees, and these differences invite new discussions about alternative narratives for the evolution of some of the currently accepted fungal groups.
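The "book of words without spaces" idea can be illustrated with an alignment-free comparison. Each proteome string is split into overlapping k-letter words, the word counts are turned into a frequency profile, and two profiles are compared with a divergence measure. The sketch below assumes a feature-frequency-profile style approach compared with the Jensen-Shannon divergence. The word length k, the toy sequences, and the function names are illustrative assumptions rather than the settings used in the study.

```python
import math
from collections import Counter


def feature_profile(sequence, k):
    """Relative frequencies of all overlapping k-letter words in a sequence."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (log base 2, so 0..1) between two profiles."""
    words = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in words}

    def kl_to_mixture(a):
        # Kullback-Leibler divergence of profile a from the mixture m
        return sum(f * math.log2(f / m[w]) for w, f in a.items() if f > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Toy "proteomes" written as words without spaces (hypothetical sequences)
a = feature_profile("MKVLAAGIVALLLAAGCSS", k=3)
b = feature_profile("MKVLAAGIVGLLLAAGCSS", k=3)
print(jensen_shannon(a, a))  # 0.0 for identical profiles
print(jensen_shannon(a, b))  # small positive value for near-identical ones
```

A matrix of such pairwise divergences across genomes could then be passed to a distance-based tree-building method such as neighbor-joining.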


2017 ◽  
Author(s):  
Meru J. Sadhu ◽  
Joshua S. Bloom ◽  
Laura Day ◽  
Jake J. Siegel ◽  
Sriram Kosuri ◽  
...  

Abstract Direct measurement of functional effects of DNA sequence variants throughout a genome is a major challenge. We developed a method that uses CRISPR/Cas9 to engineer many specific variants of interest in parallel in the budding yeast Saccharomyces cerevisiae, and to screen them for functional effects. We used the method to examine the functional consequences of premature termination codons (PTCs) at different locations within all annotated essential genes in yeast. We found that most PTCs were highly deleterious unless they occurred close to the C-terminal end and did not interrupt an annotated protein domain. Surprisingly, we discovered that some putatively essential genes are dispensable, while others have large dispensable regions. This approach can be used to profile the effects of large classes of variants in a high-throughput manner.


Author(s):  
Poornima Ramesh ◽  
Jayashree Honnebailu Nagendrappa ◽  
Santosh Kumar Hulikal Shivashankara

Abstract Background Drug target identification is a fast-growing field of research for many human diseases. Many strategies have been devised in the post-genomic era to identify new drug targets for infectious diseases. Analysis of protein sequences from different organisms often reveals cases of exon/ORF shuffling in a genome. This shuffling results in the fusion of proteins or domains, either in the same genome or in that of another organism, and such fused sequences are termed Rosetta stone sequences. They help link disparate proteins together, describing local and global relationships among proteomes. The functional role of proteins is determined mainly by domain-domain interactions, which lead to the corresponding signaling mechanisms. Putative proteins can be identified as drug targets by re-annotating their functional role through domain-based strategies. Results This study used a bioinformatics approach to identify putative proteins that are ideal drug targets for pneumonia infection by re-annotating the proteins through position-specific iterations. The putative proteomes of two pneumonia-causing pathogens were analyzed to determine protein domain abundance and versatility in each. Common domains found in both pathogens were identified, and putative proteins containing these domains were re-annotated. Among many druggable protein targets, the re-annotation of EJJ83173 (which contains the GFO_IDH_MocA domain) showed that its probable function is glucose-fructose oxidoreduction. This protein was found to have sufficient interactor proteins and homologs in both pathogens but no homolog in the host (human), indicating that it is an ideal drug target. 3D modeling of the protein showed promising model parameters. The model was used for virtual screening, which revealed several ligands with inhibitory activity. These ligands included molecules documented in traditional Chinese medicine and currently marketed drugs.
Conclusions This novel strategy of drug target identification through domain-based re-annotation of putative proteins offers a way to validate the proposed drug target. It could also establish the protein's utility as a target in both pneumonia-causing species studied here.
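The domain abundance and versatility comparison described above can be sketched as follows. Abundance is counted here as the number of proteins containing a domain. Versatility is counted as the number of distinct domains found immediately adjacent to it in any architecture. Versatility has several definitions in the literature, so this choice is an assumption. Apart from GFO_IDH_MocA, the domain and protein names are hypothetical placeholders. The position-specific iterative re-annotation step (e.g. PSI-BLAST searches) is not shown.

```python
from collections import defaultdict


def domain_stats(architectures):
    """Return {domain: (abundance, versatility)} for one proteome.

    architectures maps protein ID -> ordered list of domains (N- to C-term).
    abundance   = number of proteins containing the domain
    versatility = number of distinct domains adjacent to it in any protein
                  (an assumed definition; others exist)
    """
    abundance = defaultdict(int)
    neighbours = defaultdict(set)
    for domains in architectures.values():
        for dom in set(domains):
            abundance[dom] += 1
        for left, right in zip(domains, domains[1:]):
            if left != right:  # ignore tandem repeats of one domain
                neighbours[left].add(right)
                neighbours[right].add(left)
    return {dom: (abundance[dom], len(neighbours[dom])) for dom in abundance}


# Hypothetical architectures for two pathogens (placeholder IDs and domains)
pathogen_a = {
    "p1": ["GFO_IDH_MocA", "DomX"],
    "p2": ["DomY"],
    "p3": ["DomX", "DomY"],
}
pathogen_b = {"q1": ["GFO_IDH_MocA"], "q2": ["DomZ", "DomX"]}

stats_a, stats_b = domain_stats(pathogen_a), domain_stats(pathogen_b)
common = sorted(set(stats_a) & set(stats_b))  # domains shared by both
print(common)           # ['DomX', 'GFO_IDH_MocA']
print(stats_a["DomX"])  # (2, 2)
```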


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6765 ◽  
Author(s):  
Shuangcheng Ding ◽  
Xin Feng ◽  
Hewei Du ◽  
Hongwei Wang

Background Worldwide cultivation of maize is often impacted negatively by drought stress. Hyperosmolality-gated calcium-permeable channels (OSCA) have been characterized as osmosensors in Arabidopsis. However, the involvement of members of the maize OSCA (ZmOSCA) gene family in the response to drought stress is unknown. It is also unclear which ZmOSCA gene plays a major role in the genetic improvement of drought tolerance in maize. Methods We predicted protein domain structures and transmembrane regions using the NCBI Conserved Domain Database and the TMHMM server, respectively. The phylogenetic tree was built with MEGA7. We used the mixed linear model in TASSEL to perform the family-based association analysis. Results In this report, a genome-wide survey uncovered 12 ZmOSCA genes in the maize genome. These genes were analyzed systematically to reveal their synteny and phylogenetic relationship with the genomes of rice, maize, and sorghum. These analyses indicated a relatively conserved evolutionary history of the ZmOSCA gene family. Protein domain and transmembrane analysis indicated that most of the 12 ZmOSCAs share similar structures with their homologs. Differential expression analysis under drought at various stages, together with expression profiles in 15 tissues, revealed a functional divergence of ZmOSCA genes. Notably, ZmOSCA4.1 was up-regulated in both seedlings and adult leaves. We also tested for association between genetic variation in these genes and drought tolerance. Significant associations between genetic variation in ZmOSCA4.1 and drought tolerance were found at the seedling stage. Our report provides a detailed analysis of the ZmOSCAs in the maize genome. These findings will contribute to future studies on the functional characterization of ZmOSCA proteins in response to water-deficit stress. They will also help explain how genetic variation affects drought tolerance in maize.


2020 ◽  
Vol 10 (12) ◽  
pp. 4489-4503
Author(s):  
Ching-Ting Huang ◽  
Kathy Esvelt Klos ◽  
Yung-Fen Huang

Seed vigor is crucial for early crop establishment in the field and is particularly important for forage crop production. Oat (Avena sativa L.) is a nutritious food crop and also a valuable forage crop. However, little is known about the genetics of seed vigor in oats. To investigate seed vigor-related traits and their genetic architecture in oats, we developed an easy-to-implement image-based phenotyping pipeline and applied it to 650 elite oat lines from the Collaborative Oat Research Enterprise (CORE). Root number, root surface area, and shoot length were measured in two replicates, and derived variables such as growth rate were calculated. Using a genome-wide association (GWA) approach, we identified 34 and 16 unique loci associated with root traits and shoot traits, respectively. These loci corresponded to 41 and 16 unique SNPs at a false discovery rate < 0.1. Nine root-associated loci were organized into four sets of homeologous regions, while nine shoot-associated loci were organized into three sets of homeologous regions. The context sequences of five trait-associated markers matched sequences of rice, Brachypodium, and maize (E-value < 10⁻¹⁰). Three of these markers matched known gene models with potential involvement in seed vigor: a glucuronosyltransferase, a mitochondrial carrier protein domain-containing protein, and an iron-sulfur cluster protein. This is the first GWA study on oat seed vigor, and its data can provide guidelines and a foundation for further investigations.
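The abstract does not say which false discovery rate procedure was used. Assuming the common Benjamini-Hochberg procedure, the FDR < 0.1 cut-off amounts to adjusting each SNP's p-value for the number of tests and keeping those below 0.1. A minimal sketch follows; the function names and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_at_fdr(p_values, fdr=0.1):
    """Indices of tests whose adjusted p-value is below the FDR level."""
    return [i for i, q in enumerate(benjamini_hochberg(p_values)) if q < fdr]


# Hypothetical p-values for five SNP-trait tests
p = [0.001, 0.02, 0.04, 0.3, 0.9]
print(significant_at_fdr(p))  # [0, 1, 2]
```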


2017 ◽  
Vol 107 (1) ◽  
pp. 18-28 ◽  
Author(s):  
Boris A. Vinatzer ◽  
Alexandra J. Weisberg ◽  
Caroline L. Monteil ◽  
Haitham A. Elmarakeby ◽  
Samuel K. Sheppard ◽  
...  

Taxonomy of plant pathogenic bacteria is challenging because pathogens of different crops often belong to the same named species, yet current taxonomy does not provide names for bacteria below the subspecies level. The host range-based pathovar system, introduced in the 1980s, provided a temporary solution to this problem but has many limitations. The affordability of genome sequencing now provides the opportunity to develop a new genome-based taxonomic framework. We previously proposed naming individual bacterial isolates based on pairwise genome similarity. Here, we expand on this idea and propose using genome similarity-based codes, which we now call life identification numbers (LINs), to describe and name bacterial taxa. Using 93 genomes of Pseudomonas syringae sensu lato, we compared LINs with a P. syringae genome tree and found that the assigned LINs were informative of most phylogenetic relationships. LINs also reflected host range and outbreak association for strains of P. syringae pathovar actinidiae, a pathovar for which many genome sequences are available. We conclude that LINs could provide the basis for a new taxonomic framework to address the shortcomings of the current pathovar system and to complement the current taxonomic system of bacteria in general.
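The LIN idea can be sketched as a prefix code in which each position is tied to a similarity threshold. A new genome inherits the positions it shares with its most similar already-coded genome. It then receives the next unused number at the first position where it falls below the threshold. This is a simplified illustration that assumes ascending thresholds on average nucleotide identity (ANI). The thresholds, genome names, and `assign_lin` function are hypothetical, and tie-breaking between equally similar genomes is ignored.

```python
def assign_lin(similarities, existing, thresholds):
    """Assign a simplified life identification number (LIN) to a new genome.

    similarities: genome ID -> similarity (e.g. ANI in %) of the new genome
                  to each genome that already has a LIN
    existing:     genome ID -> LIN, a tuple with one integer per threshold
    thresholds:   ascending similarity cut-offs, one per LIN position
    """
    n = len(thresholds)
    if not existing:
        return (0,) * n  # the first genome gets all zeros
    best = max(similarities, key=similarities.get)  # closest coded genome
    shared = sum(1 for t in thresholds if similarities[best] >= t)
    if shared == n:
        return existing[best]  # meets every threshold: identical LIN
    prefix = existing[best][:shared]
    # Next unused number at the first position where the genomes differ
    used = [lin[shared] for lin in existing.values() if lin[:shared] == prefix]
    return prefix + (max(used) + 1,) + (0,) * (n - shared - 1)


thresholds = (70.0, 95.0, 99.0, 99.9)  # hypothetical ANI cut-offs (%)
lins = {"g1": assign_lin({}, {}, thresholds)}
lins["g2"] = assign_lin({"g1": 96.0}, lins, thresholds)
lins["g3"] = assign_lin({"g1": 80.0, "g2": 85.0}, lins, thresholds)
print(lins)  # {'g1': (0, 0, 0, 0), 'g2': (0, 0, 1, 0), 'g3': (0, 1, 0, 0)}
```

Because genomes that share a LIN prefix met the same series of thresholds, the prefix length acts as a rough measure of relatedness.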


2018 ◽  
Vol 35 (2) ◽  
pp. 177-180
Author(s):  
Mari Narusaka ◽  
Harunobu Yunokawa ◽  
Yoshihiro Narusaka