Araport11: a complete reannotation of the Arabidopsis thaliana reference genome

2016 ◽  
Author(s):  
Chia-Yi Cheng ◽  
Vivek Krishnakumar ◽  
Agnes Chan ◽  
Seth Schobel ◽  
Christopher D. Town

ABSTRACT The flowering plant Arabidopsis thaliana is a dicot model organism for research in many aspects of plant biology. A comprehensive annotation of its genome paves the way for understanding the functions and activities of all types of transcripts, including mRNA, noncoding RNA, and small RNA. The most recent annotation update (TAIR10), released more than five years ago, had a profound impact on Arabidopsis research. Maintaining the accuracy of the annotation continues to be a prerequisite for future progress. Using an integrative annotation pipeline, we assembled tissue-specific RNA-seq libraries from 113 datasets and constructed 48,359 transcript models of protein-coding genes in eleven tissues. In addition, we annotated various classes of noncoding RNA, including small RNAs, long intergenic RNAs, small nucleolar RNAs, natural antisense transcripts, small nuclear RNAs, and microRNAs, using published datasets and in-house analytic results. Altogether, we identified 738 novel protein-coding genes, 508 novel transcribed regions, 5,051 non-coding genes, and 35,846 small-RNA loci that formerly eluded annotation. Analysis of splicing events and RNA-seq-based expression profiles revealed the landscapes of gene structures, untranslated regions, and splicing activities to be more intricate than previously appreciated. Furthermore, we present 692 uniformly expressed housekeeping genes, 43% of whose human orthologs are also housekeeping genes. This updated Arabidopsis genome annotation, with a substantially increased resolution of gene models, will further our understanding of the biological processes not only of this plant model but also of other species.

2017 ◽  
Author(s):  
Seth Polydore ◽  
Michael J. Axtell

Summary Plant small RNAs (sRNAs) regulate key physiological mechanisms through post-transcriptional and transcriptional silencing of gene expression. sRNAs fall into two major categories: those that rely on RNA Dependent RNA Polymerases (RDRs) for biogenesis and those that do not. Known RDR-dependent sRNAs include phased and repeat-associated short interfering RNAs, while known RDR-independent sRNAs are primarily microRNAs and other hairpin-derived sRNAs. In this study, we produced and analyzed small RNA-seq libraries from rdr1/rdr2/rdr6 triple mutant plants. Only a small fraction of all sRNA loci were RDR1/RDR2/RDR6-independent; most of these were microRNA loci or associated with predicted hairpin precursors. We found 58 previously annotated microRNA loci that were reliant on RDR1, −2, or −6 function, casting doubt on their classification. We also found 38 RDR1/2/6-independent small RNA loci that are not MIRNAs or otherwise hairpin-derived, and that did not fit into other known paradigms for small RNA biogenesis. These 38 small RNA-producing loci have novel biogenesis mechanisms and are frequently located in the vicinity of protein-coding genes. Altogether, our analysis suggests that these 38 loci represent one or more new types of small RNAs in Arabidopsis thaliana.

Significance Statement Small RNAs regulate gene expression in plants and are produced through a variety of previously described mechanisms. Here, we examine a set of previously undiscovered small RNA-producing loci that are produced by novel mechanisms.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Étienne Fafard-Couture ◽  
Danny Bergeron ◽  
Sonia Couture ◽  
Sherif Abou-Elela ◽  
Michelle S. Scott

Abstract Background Small nucleolar RNAs (snoRNAs) are mid-size non-coding RNAs required for ribosomal RNA modification, implying a ubiquitous tissue distribution linked to ribosome synthesis. However, increasing numbers of studies identify extra-ribosomal roles of snoRNAs in modulating gene expression, suggesting more complex snoRNA abundance patterns. Therefore, there is a great need for mapping the snoRNome in different human tissues as the blueprint for snoRNA functions. Results We used a low structure bias RNA-Seq approach to accurately quantify snoRNAs and compare them to the entire transcriptome in seven healthy human tissues (breast, ovary, prostate, testis, skeletal muscle, liver, and brain). We identify 475 expressed snoRNAs categorized in two abundance classes that differ significantly in their function, conservation level, and correlation with their host gene: 390 snoRNAs are uniformly expressed and 85 are enriched in the brain or reproductive tissues. Most tissue-enriched snoRNAs are embedded in lncRNAs and display strong correlation of abundance with them, whereas uniformly expressed snoRNAs are mostly embedded in protein-coding host genes and are mainly non- or anticorrelated with them. Fifty-nine percent of the non-correlated or anticorrelated protein-coding host gene/snoRNA pairs feature dual-initiation promoters, compared to only 16% of the correlated non-coding host gene/snoRNA pairs. Conclusions Our results demonstrate that snoRNAs are not a single homogeneous group of housekeeping genes but include highly regulated tissue-enriched RNAs. Indeed, our work indicates that the architecture of snoRNA host genes varies to uncouple the host and snoRNA expressions in order to meet the different snoRNA abundance levels and functional needs of human tissues.


2021 ◽  
Author(s):  
Fangfang Huang ◽  
Yingru Jiang ◽  
Tiantian Chen ◽  
Haoran Li ◽  
Mengjia Fu ◽  
...  

Abstract As a major food crop and model organism, rice has been studied extensively and has the largest number of functionally characterized genes among all crops. We previously built the funRiceGenes database, which included ∼2800 functionally characterized rice genes and ∼5000 members of different gene families. Since its publication, the funRiceGenes database has been accessed by more than 49,000 users, with over 490,000 page views. The database has been continuously updated with newly cloned rice genes and newly published literature, following the progress of rice functional genomics studies. Up to Nov 2021, ≥4100 functionally characterized rice genes and ∼6000 members of different gene families had been collected in funRiceGenes, accounting for 22.3% of the 39,045 annotated protein-coding genes in the rice genome. Here, we summarize the updates to the funRiceGenes database, including the new data and new features added in the last five years.


2021 ◽  
Author(s):  
Aaron Wacholder ◽  
Omer Acar ◽  
Anne-Ruxandra Carvunis

Ribosome profiling experiments demonstrate widespread translation of eukaryotic genomes outside of annotated protein-coding genes. However, it is unclear how much of this "noncanonical" translation contributes biologically relevant microproteins rather than insignificant translational noise. Here, we developed an integrative computational framework (iRibo) that leverages hundreds of ribosome profiling experiments to detect signatures of translation with high sensitivity and specificity. We deployed iRibo to construct a reference translatome in the model organism S. cerevisiae. We identified ~19,000 noncanonical translated elements outside of the ~5,400 canonical yeast protein-coding genes. Most (65%) of these non-canonical translated elements were located on transcripts annotated as non-coding, or entirely unannotated, while the remainder were located on the 5' and 3' ends of mRNA transcripts. Only 14 non-canonical translated elements were evolutionarily conserved. In stark contrast with canonical protein-coding genes, the great majority of the yeast noncanonical translatome appeared evolutionarily transient and showed no signatures of selection. Yet, we uncovered phenotypes for 53% of a representative subset of evolutionarily transient translated elements. The iRibo framework and reference translatome described here provide a foundation for further investigation of a largely unexplored, but biologically significant, evolutionarily transient translatome.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Mikhail Pomaznoy ◽  
Ashu Sethi ◽  
Jason Greenbaum ◽  
Bjoern Peters

Abstract RNA-seq methods are widely utilized for transcriptomic profiling of biological samples. However, there are known caveats of this technology which can skew the gene expression estimates. Specifically, if the library preparation protocol does not retain RNA strand information, then some genes can be erroneously quantitated. Although strand-specific protocols have been established, a significant portion of RNA-seq data is generated in a non-strand-specific manner. We used a comprehensive stranded RNA-seq dataset of 15 blood cell types to identify genes for which expression would be erroneously estimated if strand information were not available. We found that about 10% of all genes and 2.5% of protein-coding genes have a two-fold or higher difference in estimated expression when strand information of the reads was ignored. We used parameters of read alignments of these genes to construct a machine learning model that can identify which genes in an unstranded dataset might have incorrect expression estimates and which ones do not. We also show that differential expression analysis of genes with biased expression estimates in unstranded read data can be recovered by limiting the reads considered to those which span exonic boundaries. The resulting approach is implemented as a package available at https://github.com/mikpom/uslcount.
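The two-fold criterion in this abstract can be illustrated with a minimal sketch. It is not the uslcount implementation: the function name, the pseudocount, and the example values are all hypothetical, and it assumes that per-gene expression estimates from the same library are available both with and without strand information.

```python
from typing import Dict, List


def flag_strand_biased_genes(
    stranded: Dict[str, float],
    unstranded: Dict[str, float],
    min_fold: float = 2.0,
    pseudocount: float = 1.0,
) -> List[str]:
    """Return genes whose estimate changes by at least min_fold in either
    direction when strand information is ignored.

    The pseudocount is an assumption of this sketch, added to avoid
    division by zero for unexpressed genes.
    """
    flagged = []
    for gene, stranded_value in stranded.items():
        unstranded_value = unstranded.get(gene)
        if unstranded_value is None:
            continue  # gene missing from the unstranded estimates
        ratio = (unstranded_value + pseudocount) / (stranded_value + pseudocount)
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            flagged.append(gene)
    return sorted(flagged)


# Hypothetical expression estimates for three genes.
stranded = {"geneA": 100.0, "geneB": 10.0, "geneC": 50.0}
unstranded = {"geneA": 105.0, "geneB": 45.0, "geneC": 55.0}
print(flag_strand_biased_genes(stranded, unstranded))
```

In this toy input only geneB is flagged, as would happen if reads from an overlapping antisense transcript were counted toward it once strand was ignored.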


2019 ◽  
Vol 116 (44) ◽  
pp. 22020-22029 ◽  
Author(s):  
Aritro Nath ◽  
Eunice Y. T. Lau ◽  
Adam M. Lee ◽  
Paul Geeleher ◽  
William C. S. Cho ◽  
...  

Large-scale cancer cell line screens have identified thousands of protein-coding genes (PCGs) as biomarkers of anticancer drug response. However, systematic evaluation of long noncoding RNAs (lncRNAs) as pharmacogenomic biomarkers has so far proven challenging. Here, we study the contribution of lncRNAs as drug response predictors beyond spurious associations driven by correlations with proximal PCGs, tissue lineage, or established biomarkers. We show that, as a whole, the lncRNA transcriptome is as potent as the PCG transcriptome at predicting response to hundreds of anticancer drugs. Analysis of individual lncRNA transcripts associated with drug response reveals that nearly half of the significant associations are in fact attributable to proximal cis-PCGs. However, adjusting for effects of cis-PCGs revealed significant lncRNAs that augment drug response predictions for most drugs, including those with well-established clinical biomarkers. In addition, we identify lncRNA-specific somatic alterations associated with drug response by adopting a statistical approach to determine lncRNAs carrying somatic mutations that undergo positive selection in cancer cells. Lastly, we experimentally demonstrate that 2 lncRNAs, EGFR-AS1 and MIR205HG, are functionally relevant predictors of anti-epidermal growth factor receptor (EGFR) drug response.


RNA Biology ◽  
2014 ◽  
Vol 11 (11) ◽  
pp. 1414-1429 ◽  
Author(s):  
Firoz Ahmed ◽  
Muthappa Senthil-Kumar ◽  
Seonghee Lee ◽  
Xinbin Dai ◽  
Kirankumar S Mysore ◽  
...  

RNA ◽  
2015 ◽  
Vol 21 (6) ◽  
pp. 1085-1095 ◽  
Author(s):  
Lorena Pantano ◽  
Meritxell Jodar ◽  
Mads Bak ◽  
Josep Lluís Ballescà ◽  
Niels Tommerup ◽  
...  

2018 ◽  
Vol 49 (6) ◽  
Author(s):  
Elsahookie et al.

The endosperm in cereals supplies nutrients to the developing kernel and seedling, and it is the primary tissue in which gene imprinting occurs. Developing maize (Zea mays L.) endosperms were analysed for allelic gene expression in both reciprocal crosses of inbreds B73 and Mo17. High-throughput transcriptome sequencing was performed on kernels of both reciprocal crosses at 0, 3 and up to 15 days after pollination (DAP), and revealed a gradual increase in paternal transcript expression in 3 and 5 DAP kernels. Meanwhile, in 7 DAP endosperm, most of the genes tested gave a maternal:paternal ratio of 2:1, suggesting that paternal genes are almost fully activated at 7 DAP. There were 300 PEGs and 499 MEGs identified across endosperm development stages. A total of 63 genes out of 116, 234 exhibiting parent-specific expression were identified at 7, 10 and 15 DAP. Most paternally expressed genes were found at 7 DAP, owing to deviation in paternal allele expression at this stage of development. For imprinted genes, the relative expression of maternal and paternal alleles differed at least five-fold in both crosses. A total of 179 (1.6%) protein-coding genes expressed in the endosperm were imprinted: 68 showed maternal preferential expression and 111 showed paternal expression. In addition, 38 long noncoding RNAs were found to be imprinted and were transcribed in either sense or antisense direction from intronic regions of normal protein-coding genes or from intergenic regions. Imprinted genes were clustered around the genome. A total of 21 imprinted genes in the maize hybrid endosperm had differentially methylated regions (DMRs). All DMRs were found to be hypomethylated in maternal alleles and hypermethylated in paternal alleles. These results confirm that a complex mechanism controls the maize endosperm, involving imprinting, auxin activity, and developmental regulation. Studying F2 kernels on F1 plants may shed new light on the control of kernel number and weight per unit area.

