In situ dissecting the evolution of gene duplication with different histone modification patterns based on high-throughput data analysis in Arabidopsis thaliana

PeerJ, 2021, Vol 9, pp. e10426
Author(s): Jingjing Wang, Yuriy L. Orlov, Xue Li, Yincong Zhou, Yongjing Liu, ...

Background: Genetic regulation is known to contribute to the divergent expression of duplicate genes; however, little is known about how epigenetic modifications regulate the expression of duplicate genes in plants. Methods: The histone modification (HM) profile patterns of different modes of gene duplication, including whole-genome duplication, proximal duplication, tandem duplication and transposed duplication, were characterized based on ChIP-chip or ChIP-seq datasets. In this study, 10 distinct HM marks (H2Bub, H3K4me1, H3K4me2, H3K4me3, H3K9ac, H3K9me2, H3K27me1, H3K27me3, H3K36me3 and H3K14ac) were analyzed. Moreover, the features of gene duplication with different HM patterns were characterized based on 88 RNA-seq datasets of Arabidopsis thaliana. Results: This study showed that duplicate genes in Arabidopsis have more similar HM patterns than single-copy genes in both their promoters and protein-coding regions. The evolution of HM marks is coupled with coding sequence divergence and expression divergence after gene duplication. We found that function-related selective constraints may act on epigenetic evolution after gene duplication. Furthermore, duplicate genes with distinct functions show more divergence in histone modification than those with the same function, and expression divergence is higher in plants carrying mutations in chromatin modifiers. Based on sequencing data, this study shows the role of epigenetic marks in regulating gene expression and functional divergence after gene duplication in plants.

2016
Author(s): Kousuke Hanada, Ayumi Tezuka, Masafumi Nozawa, Yutaka Suzuki, Sumio Sugano, ...

Abstract: Lineage-specifically duplicated genes likely contribute to the phenotypic divergence of closely related species. However, neither the frequency of duplication events nor the strength of selective pressure immediately after gene duplication is clear in the speciation process. Plants have substantially higher gene duplication rates than most other eukaryotes. Here, using Illumina short reads from Arabidopsis halleri, whose close relatives (Brassica rapa, A. thaliana and A. lyrata) have high-quality genome assemblies, we generated orthologous gene groups among B. rapa, A. thaliana, A. lyrata and A. halleri. The frequency of duplication events in the Arabidopsis lineage was approximately 10 times higher than the frequency inferred by comparative genomics of Arabidopsis, poplar, rice and moss. Of the genes currently retained in A. halleri, 11–24% had undergone gene duplication in the Arabidopsis lineage. To examine the selective pressure on duplicated genes, we calculated the ratio of nonsynonymous to synonymous substitution rates (KA/KS) in the A. halleri-lyrata and A. halleri lineages. Using a maximum-likelihood framework, we tested for positive (KA/KS > 1) and purifying (KA/KS < 1) selection at a significance level of P < 0.01. Duplicate genes showed positive selection in a higher proportion of cases than non-duplicated genes. More interestingly, a higher proportion of duplicated genes showed accelerated functional divergence several million years after gene duplication than immediately after it.
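The abstract does not spell out how significance was assessed within its maximum-likelihood framework. One common approach is a likelihood-ratio test that compares a codon model with KA/KS estimated freely against a null model with KA/KS fixed at 1. The sketch below assumes both log-likelihoods have already been obtained from some codon-model fit. The function names are hypothetical and this is not the authors' code.

```python
import math


def lrt_p_value_df1(lnl_free: float, lnl_null: float) -> float:
    """P-value of a likelihood-ratio test with 1 degree of freedom.

    lnl_free: log-likelihood with KA/KS estimated freely (assumed input).
    lnl_null: log-likelihood with KA/KS fixed at 1 (assumed input).
    """
    # Test statistic 2 * (lnL1 - lnL0); clamp tiny negative values from numerical noise.
    stat = max(0.0, 2.0 * (lnl_free - lnl_null))
    # Chi-square (1 df) survival function: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def classify_selection(ka_ks: float, lnl_free: float, lnl_null: float,
                       alpha: float = 0.01) -> str:
    """Label a gene as under positive or purifying selection if KA/KS differs
    significantly from 1 at level alpha (0.01, as in the abstract)."""
    p = lrt_p_value_df1(lnl_free, lnl_null)
    if p >= alpha:
        return "not significant"
    # Direction of the departure from neutrality comes from the free estimate.
    return "positive" if ka_ks > 1 else "purifying"


print(classify_selection(2.5, -100.0, -110.0))  # large likelihood gain, KA/KS > 1
print(classify_selection(0.2, -100.0, -100.5))  # small gain, not significant at 0.01
```

The direction of selection comes from the freely estimated ratio, and the test only decides whether that ratio departs significantly from 1.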


2020
Author(s): Michael DeGiorgio, Raquel Assis

Abstract: Learning about the roles that duplicate genes play in the origins of novel phenotypes requires an understanding of how their functions evolve. To date, only one method, CDROM, has been developed with this goal in mind. In particular, CDROM employs gene expression distances as proxies for functional divergence, and then classifies the evolutionary mechanisms retaining duplicate genes from comparisons of these distances in a decision tree framework. However, CDROM does not account for stochastic shifts in gene expression or leverage advances in contemporary statistical learning for performing classification, nor is it capable of predicting the underlying parameters of duplicate gene evolution. Thus, here we develop CLOUD, a multi-layer neural network built upon a model of gene expression evolution that can both classify duplicate gene retention mechanisms and predict their underlying evolutionary parameters. We show that not only is the CLOUD classifier substantially more powerful and accurate than CDROM, but that it also yields accurate parameter predictions, enabling a better understanding of the specific forces driving the evolution and long-term retention of duplicate genes. Further, application of the CLOUD classifier and predictor to empirical data from Drosophila recapitulates many previous findings about gene duplication in this lineage, showing that new functions often emerge rapidly and asymmetrically in younger duplicate gene copies, and that functional divergence is driven by strong natural selection. Hence, CLOUD represents the best available method for classifying retention mechanisms and predicting evolutionary parameters of duplicate genes, thereby also highlighting the utility of incorporating sophisticated statistical learning techniques to address long-standing questions about evolution after gene duplication.


2018
Author(s): Xueyuan Jiang, Raquel Assis

Abstract: Gene duplication has played an important role in the evolution and domestication of flowering plants. Yet little is known about how plant duplicate genes evolve and are retained over long timescales, particularly those arising from small-scale duplication (SSD) rather than whole-genome duplication (WGD) events. Here we address this question in the Poaceae (grass) family by analyzing gene expression data from nine tissues of Brachypodium distachyon, Oryza sativa japonica (rice), and Sorghum bicolor (sorghum). Consistent with theoretical predictions, expression profiles of most grass genes are conserved after SSD, suggesting that functional conservation is the primary outcome of SSD in grasses. However, we also uncover support for widespread functional divergence, much of which occurs asymmetrically via the process of neofunctionalization. Moreover, neofunctionalization preferentially targets younger (child) duplicate gene copies, is associated with RNA-mediated duplication, and occurs quickly after duplication. Further analysis reveals that functional divergence of SSD-derived genes is positively correlated with both sequence divergence and tissue specificity in all three grass species, and particularly with anther expression in B. distachyon. Therefore, as found in many animal species, SSD-derived grass genes often undergo rapid functional divergence that may be driven by natural selection on male-specific phenotypes.


2020, Vol 37 (8), pp. 2322-2331
Author(s): Carl J Dyson, Michael A D Goodisman

Abstract: Gene duplication serves a critical role in evolutionary adaptation by providing genetic raw material to the genome. The evolution of duplicated genes may be influenced by epigenetic processes such as DNA methylation, which affects gene function in some taxa. However, the manner in which DNA methylation affects duplicated genes is not well understood. We studied duplicated genes in the honeybee Apis mellifera, an insect with a highly sophisticated social structure, to investigate whether DNA methylation was associated with gene duplication and genic evolution. We found that levels of gene body methylation were significantly lower in duplicate genes than in single-copy genes, implicating a possible role of DNA methylation in postduplication gene maintenance. Additionally, we discovered associations of gene body methylation with the location, length, and time since divergence of paralogous genes. We also found that divergence in DNA methylation was associated with divergence in gene expression in paralogs, although the relationship was not completely consistent with a direct link between DNA methylation and gene expression. Overall, our results provide further insight into genic methylation and how its association with duplicate genes might facilitate evolutionary processes and adaptation.


2018
Author(s): Juan Miguel Escorcia-Rodríguez, Mario Esposito, Julio Augusto Freyre-González, Gabriel Moreno-Hagelsieb

Background: Orthologs diverge after speciation events and paralogs after gene duplication. It is thus expected that orthologs would tend to keep their functions, while paralogs could be a source of new functions. Because protein functional divergence follows from non-synonymous substitutions, we performed an analysis based on the ratio of non-synonymous to synonymous substitutions (dN/dS) as a proxy for functional divergence. We used four working definitions of orthology, including reciprocal best hits (RBH), among other definitions based on network analyses and clustering. Results: Orthologs, by all definitions tested, had noticeably lower dN/dS values than paralogs, suggesting not only that orthologs keep their functions better, but also that paralogs are a ready source of functional novelty. The differences in dN/dS ratios continued to favour the functional stability of orthologs after eliminating gene comparisons with potential problems, such as genes with high codon usage bias, low coverage of either of the aligned sequences, or sequences with very high similarity. The dN/dS ratios continued to suggest better functional stability of orthologs regardless of overall sequence divergence. When orthologs and paralogs were separated into groups with similar overall substitution rates, the dN/dS differences still favoured the functional stability of orthologs over that of paralogs. Availability: A couple of programs for obtaining orthologs and dN/dS values as tested in this manuscript are available on GitHub: https://github.com/Computational-conSequences/SequenceTools.


2020
Author(s): Jeremy E. Coate, Andrew D. Farmer, John Schiefelbein, Jeff J. Doyle

Abstract: Gene duplication is a key evolutionary phenomenon, prevalent in all organisms but particularly so in plants, where whole genome duplication (WGD; polyploidy) is a major force in genome evolution. Much effort has been expended in attempting to understand the evolution of duplicate genes, addressing such questions as why some paralogue pairs rapidly return to single-copy status whereas, in other pairs, paralogues are retained and may (or may not) diverge in expression pattern or function. The effect of a gene (its site of expression and thus the initial locus of its function) occurs at the level of a cell, which belongs to a single cell type at a given state of that cell's development. Thus, it is critical to understand the expression of duplicated gene pairs at a cellular level of resolution. Using Arabidopsis thaliana root single-cell transcriptomic data, we identify 36 cell clusters, each representing a cell type at a particular developmental state, and analyze expression patterns of over 11,000 duplicate gene pairs produced by three cycles of polyploidy as well as by various types of single gene duplication mechanisms. We categorize paralogue pairs by their patterns of expression, identifying pairs showing strongly biased paralogue/homoeologue expression in different cell clusters. Notably, the precision of cell-level expression data permits the identification of pairs showing alternate bias, with each paralogue comprising 90% or more of the pair's expression in different cell clusters, consistent with subfunctionalization at the cell type or cell state level and, in some cases, at the level of individual cells. We identify a set of over 7,000 genes whose expression in all 36 cell clusters suggests that the single-copy ancestor of each was also expressed in all root cells. With this cell-level expression information, we hypothesize that there have been major shifts in expression for the majority of duplicated genes, to different degrees depending, as expected, on gene function and duplication type, but also on the particular cell type and state.
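The alternate-bias criterion in the abstract above can be sketched as follows. The sketch assumes that per-cluster expression values for the two paralogues are already available (for example, summed counts per cell cluster). The function name, the input format and the decision to skip clusters in which neither copy is expressed are illustrative assumptions, not the authors' pipeline.

```python
from typing import Sequence


def shows_alternate_bias(expr_a: Sequence[float], expr_b: Sequence[float],
                         threshold: float = 0.9) -> bool:
    """Hypothetical helper: True if paralogue A makes up >= threshold of the
    pair's expression in at least one cell cluster and paralogue B does so in
    another cluster.

    expr_a, expr_b: expression of each paralogue, one value per cell cluster.
    threshold: 0.9 mirrors the "90% or greater" criterion in the abstract.
    """
    if len(expr_a) != len(expr_b):
        raise ValueError("both paralogues must be measured in the same clusters")
    if not 0.5 < threshold <= 1.0:
        raise ValueError("threshold must be in (0.5, 1] so only one copy can dominate a cluster")

    a_dominant = b_dominant = False
    for a, b in zip(expr_a, expr_b):
        total = a + b
        if total <= 0:
            continue  # assumption: clusters with no expression of either copy are ignored
        if a / total >= threshold:
            a_dominant = True
        elif b / total >= threshold:
            b_dominant = True
    # Alternate bias requires each copy to dominate somewhere.
    return a_dominant and b_dominant


# Copy A dominates cluster 1 (95%), copy B dominates cluster 2 (96%).
print(shows_alternate_bias([95, 2, 40], [5, 48, 60]))
```

Because the threshold is above 0.5, a single cluster can be dominated by at most one copy, so the two "dominant" flags are necessarily set by different clusters.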


Author(s): Michael DeGiorgio, Raquel Assis

Abstract: Learning about the roles that duplicate genes play in the origins of novel phenotypes requires an understanding of how their functions evolve. A previous method for achieving this goal, CDROM, employs gene expression distances as proxies for functional divergence and then classifies the evolutionary mechanisms retaining duplicate genes from comparisons of these distances in a decision tree framework. However, CDROM does not account for stochastic shifts in gene expression or leverage advances in contemporary statistical learning for performing classification, nor is it capable of predicting the parameters driving duplicate gene evolution. Thus, here we develop CLOUD, a multi-layer neural network built on a model of gene expression evolution that can both classify duplicate gene retention mechanisms and predict their underlying evolutionary parameters. We show that not only is the CLOUD classifier substantially more powerful and accurate than CDROM, but that it also yields accurate parameter predictions, enabling a better understanding of the specific forces driving the evolution and long-term retention of duplicate genes. Further, application of the CLOUD classifier and predictor to empirical data from Drosophila recapitulates many previous findings about gene duplication in this lineage, showing that new functions often emerge rapidly and asymmetrically in younger duplicate gene copies, and that functional divergence is driven by strong natural selection. Hence, CLOUD represents a major advancement in classifying retention mechanisms and predicting evolutionary parameters of duplicate genes, thereby highlighting the utility of incorporating sophisticated statistical learning techniques to address long-standing questions about evolution after gene duplication.


Genetics, 1997, Vol 145 (1), pp. 197-205
Author(s): Alan B Rose, Jiayang Li, Robert L Last

Nine blue fluorescent mutants of the flowering plant Arabidopsis thaliana were isolated by genetic selections and fluorescence screens. Each was shown to contain a recessive allele of trp1, a previously described locus that encodes the tryptophan biosynthetic enzyme phosphoribosylanthranilate transferase (PAT, called trpD in bacteria). The trp1 mutants consist of two groups, tryptophan auxotrophs and prototrophs, that differ significantly in growth rate, morphology, and fertility. The trp1 alleles cause plants to accumulate varying amounts of blue fluorescent anthranilate compounds, and only the two least severely affected prototrophs have any detectable PAT enzyme activity. All four of the trp1 mutations that were sequenced are G to A or C to T transitions that cause an amino acid change, but in only three of these is the affected residue phylogenetically conserved. The single-copy gene encoding PAT shows an unusually high degree of sequence divergence between the wild-type Columbia and Landsberg erecta ecotypes of Arabidopsis.


DNA Research, 2018, Vol 25 (3), pp. 327-339
Author(s): Kousuke Hanada, Ayumi Tezuka, Masafumi Nozawa, Yutaka Suzuki, Sumio Sugano, ...
