Tissue-specific evolution of protein coding genes in human and mouse

2014 ◽  
Author(s):  
Nadezda Kryuchkova-Mostacci ◽  
Marc Robinson-Rechavi

Protein-coding genes evolve at different rates, and the influence of different parameters, from gene size to expression level, has been extensively studied. While in yeast gene expression level is the major causal factor of gene evolutionary rate, the situation is more complex in animals. Here we investigate these relations further, especially taking into account gene expression in different organs as well as indirect correlations between parameters. We used RNA-seq data from two large datasets, covering 22 mouse tissues and 27 human tissues. Over all tissues, evolutionary rate only correlates weakly with levels and breadth of expression. The strongest explanatory factors of strong purifying selection are GC content, expression in many developmental stages, and expression in brain tissues. While the main component of evolutionary rate is purifying selection, we also find tissue-specific patterns for sites under neutral evolution and for positive selection. We observe fast evolution of genes expressed in testis, but also in other tissues, notably liver, which is explained by weak purifying selection rather than by positive selection.

2017 ◽  
Author(s):  
Jorge Ruiz-Orera ◽  
José Luis Villanueva-Cañas ◽  
William Blevins ◽  
M.Mar Albà

Recent years have witnessed the discovery of protein-coding genes which appear to have evolved de novo from previously non-coding sequences. This has changed the long-standing view that coding sequences can only evolve from other coding sequences. However, there are still many open questions regarding how new protein-coding sequences can arise from non-genic DNA. Two prerequisites for the birth of a new functional protein-coding gene are that the corresponding DNA fragment is transcribed and that it is also translated. Transcription is known to be pervasive in the genome, producing a large number of transcripts that do not correspond to conserved protein-coding genes, and which are usually annotated as long non-coding RNAs (lncRNA). Recently, sequencing of ribosome protected fragments (Ribo-Seq) has provided evidence that many of these transcripts actually translate small proteins. We have used mouse non-synonymous and synonymous variation data to estimate the strength of purifying selection acting on the translated open reading frames (ORFs). Whereas a subset of the lncRNAs are likely to actually be true protein-coding genes (and thus previously misclassified), the bulk of lncRNAs code for proteins which show variation patterns consistent with neutral evolution. We also show that the ORFs that have a more favorable, coding-like, sequence composition are more likely to be translated than other ORFs in lncRNAs. This study provides strong evidence that there is a large and ever-changing reservoir of low-abundance proteins; some of these peptides may become useful and act as seeds for de novo gene evolution.


2021 ◽  
Vol 22 (4) ◽  
pp. 1876
Author(s):  
Frida Belinky ◽  
Ishan Ganguly ◽  
Eugenia Poliakov ◽  
Vyacheslav Yurchenko ◽  
Igor B. Rogozin

Nonsense mutations turn a coding (sense) codon into an in-frame stop codon that is assumed to result in a truncated protein product. Thus, nonsense substitutions are the hallmark of pseudogenes and are used to identify them. Here we show that in-frame stop codons within bacterial protein-coding genes are widespread. Their evolutionary conservation suggests that many of them are not pseudogenes, since they maintain dN/dS values (ratios of substitution rates at non-synonymous and synonymous sites) significantly lower than 1 (this is a signature of purifying selection in protein-coding regions). We also found that double substitutions in codons—where an intermediate step is a nonsense substitution—show a higher rate of evolution compared to null models, indicating that a stop codon was introduced and then changed back to sense via positive selection. This further supports the notion that nonsense substitutions in bacteria are relatively common and do not necessarily cause pseudogenization. In-frame stop codons may be an important mechanism of regulation: Such codons are likely to cause a substantial decrease of protein expression levels.
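
The dN/dS criterion mentioned in this abstract can be illustrated with a minimal sketch. The function name `classify_selection` and the tolerance `tol` are hypothetical and not taken from the paper; the abstract refers to values *significantly* lower than 1, which would require a statistical test (for example a likelihood-ratio test), whereas this sketch only applies a fixed tolerance around 1.

```python
from typing import Tuple


def classify_selection(dn: float, ds: float, tol: float = 0.05) -> Tuple[float, str]:
    """Return omega = dN/dS and a coarse, illustrative selection label.

    dn: substitution rate at non-synonymous sites
    ds: substitution rate at synonymous sites
    tol: hypothetical tolerance around 1 (an assumption, not a statistical test)
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    omega = dn / ds
    if omega < 1 - tol:
        label = "purifying"      # fewer non-synonymous changes than expected
    elif omega > 1 + tol:
        label = "positive"       # excess of non-synonymous changes
    else:
        label = "near-neutral"   # indistinguishable from 1 at this tolerance
    return omega, label


if __name__ == "__main__":
    # Made-up rates for illustration only.
    print(classify_selection(0.1, 0.5))
```

Under this reading, a gene carrying an in-frame stop codon but showing a ratio well below 1 would still look constrained, which is the argument the abstract makes against treating such genes as pseudogenes.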


Genome ◽  
2013 ◽  
Vol 56 (7) ◽  
pp. 415-423 ◽  
Author(s):  
Jingjing Zhao ◽  
Hongbo Shi ◽  
Nadav Ahituv

Tissue-specific gene expression is thought to be one of the major forces shaping mammalian gene order. A recent study that used whole-genome chromosome conformation assays has shown that the mammalian genome is divided into specific topological domains that are shared between different tissues and organisms. Here, we wanted to assess whether gene expression and regulation are involved in shaping these domains and can be used to classify them. We analyzed gene expression and regulation levels in these domains by using RNA-seq and enhancer-associated ChIP-seq datasets for 17 different mouse tissues. We found 162 domains that are active (high gene expression and regulation) in all 17 tissues. These domains are significantly shorter, contain fewer repeats, and have more housekeeping genes. In contrast, we found 29 domains that are inactive (low gene expression and regulation) in all analyzed tissues and are significantly longer and have more repeats and gene deserts. Tissue-specific active domains showed some correlation with tissue-type and gene ontology. Domain temporal gene regulation and expression differences also displayed some gene ontology terms fitting their temporal function. Combined, our results provide a catalog of shared and tissue-specific topological domains and suggest that gene expression and regulation could have a role in shaping them.


2017 ◽  
Author(s):  
Cristina Cruz ◽  
Monica Della Rosa ◽  
Christel Krueger ◽  
Qian Gao ◽  
Lucy Field ◽  
...  

Abstract Transcription of protein coding genes is accompanied by recruitment of COMPASS to promoter-proximal chromatin, which deposits di- and tri-methylation on histone H3 lysine 4 (H3K4) to form H3K4me2 and H3K4me3. Here we determine the importance of COMPASS in maintaining gene expression across lifespan in budding yeast. We find that COMPASS mutations dramatically reduce replicative lifespan and cause widespread gene expression defects. Known repressive functions of H3K4me2 are progressively lost with age, while hundreds of genes become dependent on H3K4me3 for full expression. Induction of these H3K4me3 dependent genes is also impacted in young cells lacking COMPASS components including the H3K4me3-specific factor Spp1. Remarkably, the genome-wide occurrence of H3K4me3 is progressively reduced with age despite widespread transcriptional induction, minimising the normal positive correlation between promoter H3K4me3 and gene expression. Our results provide clear evidence that H3K4me3 is required to attain normal expression levels of many genes across organismal lifespan.


2020 ◽  
Vol 35 (5) ◽  
pp. 1230-1245 ◽  
Author(s):  
L C Poulsen ◽  
J A Bøtkjær ◽  
O Østrup ◽  
K B Petersen ◽  
C Yding Andersen ◽  
...  

Abstract
STUDY QUESTION How does the human granulosa cell (GC) transcriptome change during ovulation?
SUMMARY ANSWER Two transcriptional peaks were observed at 12 h and at 36 h after induction of ovulation, both dominated by genes and pathways known from the inflammatory system.
WHAT IS KNOWN ALREADY The crosstalk between GCs and the oocyte, which is essential for ovulation and oocyte maturation, can be assessed through transcriptomic profiling of GCs. Detailed transcriptional changes during ovulation have not previously been assessed in humans.
STUDY DESIGN, SIZE, DURATION This prospective cohort study comprised 50 women undergoing fertility treatment in a standard antagonist protocol at a university hospital-affiliated fertility clinic in 2016–2018.
PARTICIPANTS/MATERIALS, SETTING, METHODS From each woman, one sample of GCs was collected by transvaginal ultrasound-guided follicle aspiration either before or 12 h, 17 h or 32 h after ovulation induction (OI). A second sample was collected at oocyte retrieval, 36 h after OI. Total RNA was isolated from GCs and analyzed by microarray. Gene expression differences between the five time points were assessed by ANOVA with a random factor accounting for the pairing of samples, and seven clusters of protein-coding genes representing distinct expression profiles were identified. These were used as input for subsequent bioinformatic analyses to identify enriched pathways and suggest upstream regulators. Subsets of genes were assessed to explore specific ovulatory functions.
MAIN RESULTS AND THE ROLE OF CHANCE We identified 13 345 differentially expressed transcripts across the five time points (false discovery rate, <0.01) of which 58% were protein-coding genes. Two clusters of mainly downregulated genes represented cell cycle pathways and DNA repair. Upregulated genes showed one peak at 12 h that resembled the initiation of an inflammatory response, and one peak at 36 h that resembled the effector functions of inflammation such as vasodilation, angiogenesis, coagulation, chemotaxis and tissue remodelling. Genes involved in cell–matrix interactions as a part of cytoskeletal rearrangement and cell motility were also upregulated at 36 h. Predicted activated upstream regulators of ovulation included FSH, LH, transforming growth factor B1, tumour necrosis factor, nuclear factor kappa-light-chain-enhancer of activated B cells, coagulation factor 2, fibroblast growth factor 2, interleukin 1 and cortisol, among others. The results confirmed early regulation of several previously described factors in a cascade inducing meiotic resumption and suggested new factors involved in cumulus expansion and follicle rupture through co-regulation with previously described factors.
LARGE SCALE DATA The microarray data were deposited to the Gene Expression Omnibus (www.ncbi.nlm.nih.gov/gds/, accession number: GSE133868).
LIMITATIONS, REASONS FOR CAUTION The study included women undergoing ovarian stimulation and the findings may therefore differ from a natural cycle. However, the results confirm significant regulation of many well-established ovulatory genes from a series of previous studies such as amphiregulin, epiregulin, tumour necrosis factor alfa induced protein 6, tissue inhibitor of metallopeptidases 1 and plasminogen activator inhibitor 1, which support the relevance of the results.
WIDER IMPLICATIONS OF THE FINDINGS The study increases our understanding of human ovarian function during ovulation, and the publicly available dataset is a valuable resource for future investigations. Suggested upstream regulators and highly differentially expressed genes may be potential pharmaceutical targets in fertility treatment and gynaecology.
STUDY FUNDING/COMPETING INTEREST(S) The study was funded by EU Interreg ÔKS V through ReproUnion (www.reprounion.eu) and by a grant from the Region Zealand Research Foundation. None of the authors have any conflicts of interest to declare.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Mikhail Pomaznoy ◽  
Ashu Sethi ◽  
Jason Greenbaum ◽  
Bjoern Peters

Abstract RNA-seq methods are widely utilized for transcriptomic profiling of biological samples. However, there are known caveats of this technology which can skew the gene expression estimates. Specifically, if the library preparation protocol does not retain RNA strand information then some genes can be erroneously quantitated. Although strand-specific protocols have been established, a significant portion of RNA-seq data is generated in a non-strand-specific manner. We used a comprehensive stranded RNA-seq dataset of 15 blood cell types to identify genes for which expression would be erroneously estimated if strand information was not available. We found that about 10% of all genes and 2.5% of protein coding genes have a two-fold or higher difference in estimated expression when strand information of the reads was ignored. We used parameters of read alignments of these genes to construct a machine learning model that can identify which genes in an unstranded dataset might have incorrect expression estimates and which ones do not. We also show that differential expression analysis of genes with biased expression estimates in unstranded read data can be recovered by limiting the reads considered to those which span exonic boundaries. The resulting approach is implemented as a package available at https://github.com/mikpom/uslcount.
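
The two-fold criterion in this abstract can be sketched as a simple per-gene comparison of counts obtained with and without strand information. This is only an illustration under stated assumptions: the function `flag_strand_sensitive`, its `pseudocount` parameter, and the example counts are hypothetical, and the sketch does not use or reproduce the `uslcount` package or the authors' machine learning model.

```python
import math
from typing import Dict, List


def flag_strand_sensitive(
    stranded: Dict[str, float],
    unstranded: Dict[str, float],
    min_fold: float = 2.0,
    pseudocount: float = 1.0,
) -> List[str]:
    """Return genes whose estimate changes by at least `min_fold`
    when strand information is ignored.

    stranded / unstranded: per-gene expression estimates (hypothetical inputs)
    pseudocount: added to both values to avoid division by zero (an assumption)
    """
    threshold = math.log2(min_fold)
    flagged = []
    for gene, s in stranded.items():
        if gene not in unstranded:
            continue  # compare only genes quantified in both modes
        u = unstranded[gene]
        log2_fold = math.log2((u + pseudocount) / (s + pseudocount))
        if abs(log2_fold) >= threshold:
            flagged.append(gene)
    return sorted(flagged)


if __name__ == "__main__":
    # Made-up counts for illustration only.
    stranded = {"geneA": 99, "geneB": 100, "geneC": 9}
    unstranded = {"geneA": 299, "geneB": 120, "geneC": 9}
    print(flag_strand_sensitive(stranded, unstranded))
```

Working on a log2 scale treats inflation and deflation of the estimate symmetrically, so a gene is flagged whether ignoring strand doubles or halves its apparent expression.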


2019 ◽  
Vol 9 (12) ◽  
pp. 6821-6832 ◽  
Author(s):  
Jacob Njaramba Ngatia ◽  
Tian Ming Lan ◽  
Thi Dao Dinh ◽  
Le Zhang ◽  
Ahmed Khalid Ahmed ◽  
...  
