A Genome Tree Made from Information on Protein Domain Organizations

Kaoru FUKAMI-KOBAYASHI; Ken NISHIKAWA

doi:10.2142/biophys.48.243

Inferring Genome Trees by Using a Filter To Eliminate Phylogenetically Discordant Sequences and a Distance Matrix Based on Mean Normalized BLASTP Scores

Journal of Bacteriology ◽

10.1128/jb.184.8.2072-2080.2002 ◽

2002 ◽

Vol 184 (8) ◽

pp. 2072-2080 ◽

Cited By ~ 75

Author(s):

G. D. Paul Clarke ◽

Robert G. Beiko ◽

Mark A. Ragan ◽

Robert L. Charlebois

Keyword(s):

Sequence Similarity ◽

Distance Matrix ◽

Bootstrap Support ◽

Systematic Bias ◽

Reading Frame ◽

Topological Features ◽

A Genome ◽

Pairwise Sequence Similarity ◽

Similarity Relationships ◽

Genome Tree

ABSTRACT Darwin's paradigm holds that the diversity of present-day organisms has arisen via a process of genetic descent with modification, as on a bifurcating tree. Evidence is accumulating that genes are sometimes transferred not along lineages but rather across lineages. To the extent that this is so, Darwin's paradigm can apply only imperfectly to genomes, potentially complicating or perhaps undermining attempts to reconstruct historical relationships among genomes (i.e., a genome tree). Whether most genes in a genome have arisen via treelike (vertical) descent or by lateral transfer across lineages can be tested if enough complete genome sequences are used. We define a phylogenetically discordant sequence (PDS) as an open reading frame (ORF) that exhibits patterns of similarity relationships statistically distinguishable from those of most other ORFs in the same genome. PDSs represent between 6.0 and 16.8% (mean, 10.8%) of the analyzable ORFs in the genomes of 28 bacteria, eight archaea, and one eukaryote (Saccharomyces cerevisiae). In this study we developed and assessed a distance-based approach, based on mean pairwise sequence similarity, for generating genome trees. Exclusion of PDSs improved bootstrap support for basal nodes but altered few topological features, indicating that there is little systematic bias among PDSs. Many but not all features of the genome tree from which PDSs were excluded are consistent with the 16S rRNA tree.
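The distance-based approach described in this abstract can be illustrated with a small sketch. The abstract does not give the normalization, so the code below assumes one plausible scheme: each ORF's best BLASTP score against the other genome is divided by its self-hit score, the ratios are averaged, and the distance is one minus the symmetrized mean. The function names and input dictionaries are hypothetical and are not taken from the paper.

```python
from statistics import mean

def mean_normalized_similarity(hits, self_scores):
    """Average of (best score vs. other genome) / (self-hit score) over ORFs.

    hits:        dict mapping ORF id -> best BLASTP score against the other genome
    self_scores: dict mapping ORF id -> score of the ORF aligned to itself
    Both inputs are hypothetical; the normalization is an assumption.
    """
    ratios = [
        min(hits[orf] / self_scores[orf], 1.0)  # cap at 1.0 in case a hit exceeds the self-score
        for orf in hits
        if self_scores.get(orf, 0) > 0
    ]
    return mean(ratios) if ratios else 0.0

def genome_distance(hits_ab, self_a, hits_ba, self_b):
    """Symmetric distance: 1 minus the average of the two directional similarities."""
    sim = 0.5 * (
        mean_normalized_similarity(hits_ab, self_a)
        + mean_normalized_similarity(hits_ba, self_b)
    )
    return 1.0 - sim
```

A matrix of such distances over all genome pairs could then be passed to any distance-based tree-building method; excluding PDSs would amount to dropping those ORF ids from the input dictionaries beforehand.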


Community phylogenetics require phylogenies reconstructed from plastid genomes

10.22541/au.161834751.14170237/v1 ◽

2021 ◽

Author(s):

Lu Jin ◽

Jia-Jia Liu ◽

Tian-Wen Xiao ◽

Qiao-Ming Li ◽

Luxiang Lin ◽

...

Keyword(s):

Functional Traits ◽

Phylogenetic Trees ◽

Phylogenetic Signal ◽

Maximum Height ◽

Type I ◽

Community Phylogenetics ◽

Phylogenetic Structure ◽

Plastid Genomes ◽

A Genome ◽

Genome Tree

Phylogenetic trees have been extensively used in community ecology. However, how the phylogenetic reconstruction affects ecological inferences is poorly understood. In this study, we reconstructed three different types of phylogenetic trees (a synthetic-tree generated using VPhylomaker, a barcode-tree generated using rbcL+matK+trnH-psbA and a genome-tree generated from plastid genomes) that represented an increasing level of phylogenetic resolution among 580 woody plant species from six dynamic plots in subtropical evergreen broadleaved forests of China. We then evaluated the performance of each phylogeny in estimations of community phylogenetic structure, turnover and phylogenetic signal in functional traits. As expected, the genome-tree was most resolved and most supported for relationships among species. For local phylogenetic structure, the three trees showed consistent results with Faith’s PD and MPD; however, only the synthetic-tree produced significant clustering patterns using MNTD for some plots. For phylogenetic turnover, contrasting results between the molecular trees and the synthetic-tree occurred only with nearest neighbor distance. The barcode-tree agreed more with the genome-tree than the synthetic-tree for both phylogenetic structure and turnover. For functional traits, both the barcode-tree and genome-tree detected phylogenetic signal in maximum height, but only the genome-tree detected signal in leaf width. This is the first study that uses plastid genomes in large-scale community phylogenetics. Our results highlight the outperformance of genome-trees over barcode-trees and synthetic-trees for the analyses studied here. Our results also point to the possibility of Type I and II errors in estimation of phylogenetic structure and turnover and detection of phylogenetic signal when using synthetic-trees.
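Two of the structure metrics named in this abstract, MPD (mean pairwise distance) and MNTD (mean nearest taxon distance), follow directly from a matrix of pairwise phylogenetic distances. The sketch below uses their standard definitions on a nested dictionary of distances; the data layout and function names are illustrative assumptions, and the null-model standardization that such analyses typically apply is not shown.

```python
from itertools import combinations

def mpd(dist, species):
    """Mean pairwise phylogenetic distance among the species in a community."""
    pairs = list(combinations(species, 2))
    return sum(dist[a][b] for a, b in pairs) / len(pairs)

def mntd(dist, species):
    """Mean distance from each species to its nearest relative in the community."""
    nearest = [min(dist[a][b] for b in species if b != a) for a in species]
    return sum(nearest) / len(nearest)
```

Both functions need at least two species. Because MNTD depends only on the closest relatives, it is sensitive to resolution near the tips of the tree, which is where the three kinds of tree compared in the study would be expected to differ most.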


Genome Tree of Life: Deep Burst of Organism Diversity

10.1101/756155 ◽

2019 ◽

Cited By ~ 2

Author(s):

JaeJin Choi ◽

Sung-Hou Kim

Keyword(s):

Genome Sequence ◽

Tree Of Life ◽

Whole Genome Sequence ◽

Whole Genome ◽

A Genome ◽

Living Organisms ◽

Life On Earth ◽

Large Groups ◽

Emergence Of Life ◽

Genome Tree

Abstract An organism Tree of Life (organism ToL) is a conceptual and metaphorical tree to capture a simplified narrative of the evolutionary course and kinship among the extant organisms of today. Such a tree cannot be experimentally validated but may be reconstructed based on characteristics associated with the extant organisms. Since the whole genome sequence of an organism is, at present, the most comprehensive descriptor of the organism, a genome ToL can be an empirically derivable surrogate for the organism ToL. However, a genome ToL has been impossible to construct for a practical reason: experimentally determining the whole genome sequences of a large number of diverse organisms was technically impossible. Thus, for several decades, gene ToLs, based on selected genes, have been commonly used as a surrogate for the organism ToL. This situation changed dramatically during the last several decades due to rapid advances in DNA sequencing technology. Here we describe the main features of a genome ToL that are different from those of the broadly accepted gene ToLs: (a) the first two organism groups to emerge are the founders of prokarya and eukarya, (b) they diversify into six large groups and all the founders of the groups have emerged in a “Deep Burst” at the very beginning period of the emergence of Life on Earth and (c) other differences are notable in the order of emergence of smaller groups.
Significance Statement Tree of Life is a conceptual and metaphorical tree that captures a simplified narrative of the evolutionary course and kinship among all living organisms of today. Since the whole genome sequence information of an organism is, at present, the most comprehensive description of the organism, we reconstructed a Genome Tree of Life using the proteome information from the whole genomes of over 4000 different living organisms on Earth. It suggests that (a) the first two primitive organism groups to emerge are the founders of prokarya and eukarya, (b) they diversify into six large groups, and (c) all the founders of the groups have emerged in a “Deep Burst” at the very beginning period of the emergence of Life on Earth.


A genome Tree of Life for the Fungi kingdom

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1711939114 ◽

2017 ◽

Vol 114 (35) ◽

pp. 9391-9396 ◽

Cited By ~ 53

Author(s):

JaeJin Choi ◽

Sung-Hou Kim

Keyword(s):

Dna Sequences ◽

Gene Tree ◽

Complete Information ◽

Whole Genome ◽

Gene Trees ◽

Multiple Sequence ◽

A Genome ◽

Conserved Genes ◽

Genome Information ◽

Genome Tree

Fungi belong to one of the largest and most diverse kingdoms of living organisms. The evolutionary kinship within a fungal population has so far been inferred mostly from the gene-information–based trees (“gene trees”), constructed commonly based on the degree of differences of proteins or DNA sequences of a small number of highly conserved genes common among the population by a multiple sequence alignment (MSA) method. Since each gene evolves under different evolutionary pressure and time scale, it has been known that one gene tree for a population may differ from other gene trees for the same population depending on the subjective selection of the genes. Within the last decade, a large number of whole-genome sequences of fungi have become publicly available, which represent, at present, the most fundamental and complete information about each fungal organism. This presents an opportunity to infer kinship among fungi using a whole-genome information-based tree (“genome tree”). The method we used allows comparison of whole-genome information without MSA, and is a variation of a computational algorithm developed to find semantic similarities or plagiarism in two books, where we represent whole-genomic information of an organism as a book of words without spaces. The genome tree reveals several significant and notable differences from the gene trees, and these differences invoke new discussions about alternative narratives for the evolution of some of the currently accepted fungal groups.
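The idea of treating a genome as "a book of words without spaces" can be sketched as an alignment-free comparison of word frequencies. The abstract does not specify the word length or the dissimilarity measure, so the code below assumes overlapping k-mers as the words and the Jensen-Shannon divergence between the two frequency profiles as the distance; the function names and the choice of `k` are illustrative only.

```python
from collections import Counter
from math import log2

def kmer_profile(seq, k):
    """Relative frequencies of all overlapping words of length k in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two word-frequency profiles.

    Returns 0.0 for identical profiles and 1.0 for profiles sharing no words.
    """
    div = 0.0
    for word in set(p) | set(q):
        pw, qw = p.get(word, 0.0), q.get(word, 0.0)
        m = 0.5 * (pw + qw)  # frequency of the word in the averaged profile
        if pw > 0:
            div += 0.5 * pw * log2(pw / m)
        if qw > 0:
            div += 0.5 * qw * log2(qw / m)
    return div
```

Computing this divergence for every pair of genomes gives a distance matrix that needs no multiple sequence alignment, which is the property the abstract emphasizes.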


Highly parallel genome variant engineering with CRISPR/Cas9 in eukaryotic cells

10.1101/147637 ◽

2017 ◽

Cited By ~ 3

Author(s):

Meru J. Sadhu ◽

Joshua S. Bloom ◽

Laura Day ◽

Jake J. Siegel ◽

Sriram Kosuri ◽

...

Keyword(s):

Essential Genes ◽

Protein Domain ◽

Eukaryotic Cells ◽

Sequence Variants ◽

Large Classes ◽

Premature Termination Codons ◽

A Genome ◽

Functional Consequences ◽

Dna Sequence Variants ◽

High Throughput Manner

Abstract Direct measurement of functional effects of DNA sequence variants throughout a genome is a major challenge. We developed a method that uses CRISPR/Cas9 to engineer many specific variants of interest in parallel in the budding yeast Saccharomyces cerevisiae, and to screen them for functional effects. We used the method to examine the functional consequences of premature termination codons (PTCs) at different locations within all annotated essential genes in yeast. We found that most PTCs were highly deleterious unless they occurred close to the C-terminal end and did not interrupt an annotated protein domain. Surprisingly, we discovered that some putatively essential genes are dispensable, while others have large dispensable regions. This approach can be used to profile the effects of large classes of variants in a high-throughput manner.


Comparative analysis of Rosetta stone events in Klebsiella pneumoniae and Streptococcus pneumoniae for drug target identification

Beni-Suef University Journal of Basic and Applied Sciences ◽

10.1186/s43088-021-00126-7 ◽

2021 ◽

Vol 10 (1) ◽

Author(s):

Poornima Ramesh ◽

Jayashree Honnebailu Nagendrappa ◽

Santosh Kumar Hulikal Shivashankara

Keyword(s):

Drug Target ◽

Drug Targets ◽

Functional Role ◽

Target Identification ◽

Protein Domain ◽

Model Parameters ◽

Signaling Mechanism ◽

Rosetta Stone ◽

A Genome ◽

Drug Target Identification

Abstract Background Drug target identification is a fast-growing field of research in many human diseases. Many strategies have been devised in the post-genomic era to identify new drug targets for infectious diseases. Analysis of protein sequences from different organisms often reveals cases of exon/ORF shuffling in a genome. This results in the fusion of proteins/domains, either in the same genome or that of some other organism; such fused sequences are termed Rosetta stone sequences. They help link disparate proteins together, describing local and global relationships among proteomes. The functional role of proteins is determined mainly by domain-domain interactions, which lead to the corresponding signaling mechanism. Putative proteins can be identified as drug targets by re-annotating their functional role through domain-based strategies. Results This study has utilized a bioinformatics approach to identify the putative proteins that are ideal drug targets for pneumonia infection by re-annotating the proteins through position-specific iterations. The putative proteome of two pneumonia-causing pathogens was analyzed to identify protein domain abundance and versatility among them. Common domains found in both pathogens were identified, and putative proteins containing these domains were re-annotated. Among many druggable protein targets, the re-annotation of EJJ83173 (which contains the GFO_IDH_MocA domain) showed that its probable function is glucose-fructose oxidoreduction. This protein was found to have sufficient interactor proteins and homologs in both pathogens but no homolog in the host (human), indicating that it is an ideal drug target. 3D modeling of the protein showed promising model parameters. The model was utilized for virtual screening, which revealed several ligands with inhibitory activity. These ligands included molecules documented in traditional Chinese medicine and currently marketed drugs. 
Conclusions This novel strategy of drug target identification through domain-based putative protein re-annotation presents a prospect to validate the proposed drug target to confer its utility as a typical protein targeting both pneumonia-causing species studied herewith.


Genome-wide analysis of maize OSCA family members and their involvement in drought stress

PeerJ ◽

10.7717/peerj.6765 ◽

2019 ◽

Vol 7 ◽

pp. e6765 ◽

Cited By ~ 5

Author(s):

Shuangcheng Ding ◽

Xin Feng ◽

Hewei Du ◽

Hongwei Wang

Keyword(s):

Genetic Variation ◽

Drought Stress ◽

Drought Tolerance ◽

Gene Family ◽

Association Analysis ◽

Expression Profiles ◽

Protein Domain ◽

Maize Genome ◽

Genome Wide ◽

A Genome

Background Worldwide cultivation of maize is often impacted negatively by drought stress. Hyperosmolality-gated calcium-permeable channels (OSCA) have been characterized as osmosensors in Arabidopsis. However, the involvement of members of the maize OSCA (ZmOSCA) gene family in response to drought stress is unknown. It is furthermore unclear which ZmOSCA gene plays a major role in genetic improvement of drought tolerance in maize. Methods We predicted the protein domain structure and transmembrane regions by using the NCBI Conserved Domain Database and the TMHMM server, respectively. The phylogenetic tree was built with MEGA7. We used the mixed linear model in TASSEL to perform the family-based association analysis. Results In this report, 12 ZmOSCA genes were uncovered in the maize genome by a genome-wide survey and analyzed systematically to reveal their synteny and phylogenetic relationship with the genomes of rice, maize, and sorghum. These analyses indicated a relatively conserved evolutionary history of the ZmOSCA gene family. Protein domain and transmembrane analysis indicated that most of the 12 ZmOSCAs shared similar structures with their homologs. The result of differential expression analysis under drought at various stages, as well as the expression profiles in 15 tissues, revealed a functional divergence of ZmOSCA genes. Notably, the expression level of ZmOSCA4.1 was up-regulated in both seedlings and adult leaves. An association analysis between genetic variation in these genes and drought tolerance was also carried out. Significant associations between genetic variation in ZmOSCA4.1 and drought tolerance were found at the seedling stage. Our report provides a detailed analysis of the ZmOSCAs in the maize genome. These findings will contribute to future studies on the functional characterization of ZmOSCA proteins in response to water deficit stress, as well as understanding the mechanism of genetic variation in drought tolerance in maize.


Genome-Wide Association Study Reveals the Genetic Architecture of Seed Vigor in Oats

G3 Genes|Genome|Genetics ◽

10.1534/g3.120.401602 ◽

2020 ◽

Vol 10 (12) ◽

pp. 4489-4503

Author(s):

Ching-Ting Huang ◽

Kathy Esvelt Klos ◽

Yung-Fen Huang

Keyword(s):

Crop Production ◽

Genetic Architecture ◽

Genome Wide Association Study ◽

Seed Vigor ◽

Protein Domain ◽

Genome Wide Association ◽

Root Surface Area ◽

Forage Crop ◽

Genome Wide ◽

A Genome

Seed vigor is crucial for crop early establishment in the field and is particularly important for forage crop production. Oat (Avena sativa L.) is a nutritious food crop and also a valuable forage crop. However, little is known about the genetics of seed vigor in oats. To investigate seed vigor-related traits and their genetic architecture in oats, we developed an easy-to-implement image-based phenotyping pipeline and applied it to 650 elite oat lines from the Collaborative Oat Research Enterprise (CORE). Root number, root surface area, and shoot length were measured in two replicates. Variables such as growth rate were derived. Using a genome-wide association (GWA) approach, we identified 34 and 16 unique loci associated with root traits and shoot traits, respectively, which corresponded to 41 and 16 unique SNPs at a false discovery rate < 0.1. Nine root-associated loci were organized into four sets of homeologous regions, while nine shoot-associated loci were organized into three sets of homeologous regions. The context sequences of five trait-associated markers matched to the sequences of rice, Brachypodium and maize (E-value < 10^-10), including three markers matched to known gene models with potential involvement in seed vigor. These were a glucuronosyltransferase, a mitochondrial carrier protein domain containing protein, and an iron-sulfur cluster protein. This study presents the first GWA study on oat seed vigor and data of this study can provide guidelines and foundation for further investigations.
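The false discovery rate threshold of 0.1 mentioned in this abstract can be illustrated with the Benjamini-Hochberg step-up procedure. The abstract does not say which FDR method the authors used, so this is only a sketch of one common choice, with a hypothetical function name and a plain list of p-values as input.

```python
def benjamini_hochberg(pvalues, alpha=0.1):
    """Return a list of booleans marking which tests pass at FDR level alpha.

    Benjamini-Hochberg step-up rule: find the largest rank r such that the
    r-th smallest p-value is <= alpha * r / m, then reject all tests whose
    p-value rank is <= r.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected
```

In a GWA setting the input would be one p-value per tested marker, and the markers flagged `True` would be the ones reported as significant at the chosen FDR.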


A Proposal for a Genome Similarity-Based Taxonomy for Plant-Pathogenic Bacteria that Is Sufficiently Precise to Reflect Phylogeny, Host Range, and Outbreak Affiliation Applied to Pseudomonas syringae sensu lato as a Proof of Concept

Phytopathology ◽

10.1094/phyto-07-16-0252-r ◽

2017 ◽

Vol 107 (1) ◽

pp. 18-28 ◽

Cited By ~ 10

Author(s):

Boris A. Vinatzer ◽

Alexandra J. Weisberg ◽

Caroline L. Monteil ◽

Haitham A. Elmarakeby ◽

Samuel K. Sheppard ◽

...

Keyword(s):

Host Range ◽

Pseudomonas Syringae ◽

Pathogenic Bacteria ◽

Plant Pathogenic Bacteria ◽

Subspecies Level ◽

A Genome ◽

Temporary Solution ◽

Taxonomic Framework ◽

Genome Tree ◽

Taxonomic System

Taxonomy of plant pathogenic bacteria is challenging because pathogens of different crops often belong to the same named species but current taxonomy does not provide names for bacteria below the subspecies level. The introduction of the host range-based pathovar system in the 1980s provided a temporary solution to this problem but has many limitations. The affordability of genome sequencing now provides the opportunity for developing a new genome-based taxonomic framework. We already proposed to name individual bacterial isolates based on pairwise genome similarity. Here, we expand on this idea and propose to use genome similarity-based codes, which we now call life identification numbers (LINs), to describe and name bacterial taxa. Using 93 genomes of Pseudomonas syringae sensu lato, LINs were compared with a P. syringae genome tree whereby the assigned LINs were found to be informative of a majority of phylogenetic relationships. LINs also reflected host range and outbreak association for strains of P. syringae pathovar actinidiae, a pathovar for which many genome sequences are available. We conclude that LINs could provide the basis for a new taxonomic framework to address the shortcomings of the current pathovar system and to complement the current taxonomic system of bacteria in general.


Efficient identification of NLR by using a genome-wide protein domain and motif survey program, Ex-DOMAIN

Plant Biotechnology ◽

10.5511/plantbiotechnology.18.0418a ◽

2018 ◽

Vol 35 (2) ◽

pp. 177-180

Author(s):

Mari Narusaka ◽

Harunobu Yunokawa ◽

Yoshihiro Narusaka

Keyword(s):

Protein Domain ◽

Genome Wide ◽

A Genome
