iREAD: A Tool for Intron Retention Detection from RNA-seq Data

2017 ◽  
Author(s):  
Hong-Dong Li ◽  
Cory C. Funk ◽  
Nathan D. Price

Abstract Summary: Detecting intron retention (IR) events is emerging as a specialized need for RNA-seq data analysis. Here we present iREAD (intron REtention Analysis and Detector), a tool to detect IR events genome-wide from high-throughput RNA-seq data. The command line interface for iREAD is implemented in Python. iREAD takes as input an existing BAM file, representing the transcriptome, and a text file containing the intron coordinates of a genome. It then 1) counts all reads that overlap intron regions, 2) detects IR events by analyzing features of reads such as depth and distribution patterns, and 3) outputs a list of retained introns into a tab-delimited text file. The output can be directly used for further exploratory analysis such as differential intron expression and functional enrichment. iREAD provides a new and generic tool to interrogate poly-A enriched transcriptomic data of intron regions. Availability: www.libpls.net/[email protected]
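
The three steps described in the abstract above (count overlapping reads, filter on depth and distribution, list retained introns) can be sketched roughly as follows. This is a minimal illustration and not iREAD's actual implementation: the `Interval` type, the function names, the binning scheme, and the thresholds `min_reads` and `min_evenness` are all assumptions made for the example, and reads are plain intervals rather than records parsed from a BAM file.

```python
import math
from collections import namedtuple

# Hypothetical interval type: 0-based, half-open coordinates.
Interval = namedtuple("Interval", ["chrom", "start", "end"])


def overlaps(a, b):
    """True if two intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def count_intron_reads(introns, reads):
    """Step 1: count reads overlapping each intron (naive O(n*m) scan)."""
    return {
        intron: sum(1 for read in reads if overlaps(intron, read))
        for intron in introns
    }


def coverage_evenness(intron, reads, n_bins=4):
    """Normalized Shannon entropy (0..1) of overlapping reads across bins.

    Assumption: evenness across equal-width bins stands in for the
    'distribution patterns' mentioned in the abstract.
    """
    bins = [0] * n_bins
    length = intron.end - intron.start
    for read in reads:
        if overlaps(intron, read):
            # Midpoint of the part of the read that lies inside the intron.
            mid = (max(read.start, intron.start) + min(read.end, intron.end)) // 2
            idx = min((mid - intron.start) * n_bins // length, n_bins - 1)
            bins[idx] += 1
    total = sum(bins)
    if total == 0:
        return 0.0
    entropy = -sum((b / total) * math.log(b / total) for b in bins if b > 0)
    return entropy / math.log(n_bins)


def call_retained(introns, reads, min_reads=3, min_evenness=0.8):
    """Steps 2 and 3: keep introns passing both (assumed) thresholds."""
    counts = count_intron_reads(introns, reads)
    return [
        intron
        for intron in introns
        if counts[intron] >= min_reads
        and coverage_evenness(intron, reads) >= min_evenness
    ]


introns = [Interval("chr1", 100, 200), Interval("chr1", 500, 600)]
reads = [
    Interval("chr1", 90, 140),
    Interval("chr1", 110, 160),
    Interval("chr1", 130, 180),
    Interval("chr1", 160, 210),
    Interval("chr1", 590, 640),
]
retained = call_retained(introns, reads)
```

In this toy input the first intron is covered by four evenly spread reads and is kept, while the second has a single read at one edge and is dropped. A real pipeline would read alignments from the BAM file and use an interval index rather than the nested scan shown here.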

2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Ying Mao ◽  
Peng Huang ◽  
Yan Wang ◽  
Maiqiu Wang ◽  
Ming D. Li ◽  
...  

Abstract Background Smoking is a major causal risk factor for lung cancer, chronic obstructive pulmonary disease (COPD), and cardiovascular disease (CVD), and is the main preventable cause of death in the world. The components of cigarette smoke are involved in immune and inflammatory processes, which may increase the prevalence of cigarette smoke-related diseases. However, the underlying molecular mechanisms linking smoking and diseases have not been well explored. This study aimed to depict a global map of DNA methylation and gene expression changes induced by tobacco smoking and to explore the molecular mechanisms linking smoking and human diseases through whole-genome bisulfite sequencing (WGBS) and RNA-sequencing (RNA-seq). Results We performed WGBS on 72 samples (36 smokers and 36 nonsmokers), RNA-seq on 75 samples (38 smokers and 37 nonsmokers), and cytokine immunoassay on plasma from 22 males (9 smokers and 13 nonsmokers) who were recruited from the city of Jincheng in China. By comparing the data of the two groups, we discovered a genome-wide methylation landscape of differentially methylated regions (DMRs) associated with smoking. Functional enrichment analyses revealed that both smoking-related hyper-DMR genes (DMGs) and hypo-DMGs were related to synapse-related pathways, whereas the hypo-DMGs were specifically related to cancer and addiction. The differentially expressed genes (DEGs) revealed by RNA-seq analysis were significantly enriched in the “immunosuppression” pathway. Correlation analysis of DMRs with their corresponding gene expression showed that genes affected by tobacco smoking were mostly related to immune system diseases. Finally, by comparing cytokine concentrations between smokers and nonsmokers, we found that vascular endothelial growth factor (VEGF) was significantly upregulated in smokers.
Conclusions In sum, we found that smoking-induced DMRs have different distribution patterns in hypermethylated and hypomethylated areas between smokers and nonsmokers. We further identified and verified smoking-related DMGs and DEGs through multi-omics integration analysis of DNA methylome and transcriptome data. These findings provide a comprehensive genomic map of the molecular changes induced by smoking, which should enhance our understanding of the harms of smoking and its relationship with diseases.


2018 ◽  
Author(s):  
Marilyn Parra ◽  
Ben W. Booth ◽  
Richard Weiszmann ◽  
Brian Yee ◽  
Gene W. Yeo ◽  
...  

Abstract: During terminal erythropoiesis, the splicing machinery in differentiating erythroblasts executes a robust intron retention (IR) program that impacts expression of hundreds of genes. We studied IR mechanisms in the SF3B1 splicing factor gene, which expresses ~50% of its transcripts in late erythroblasts as a nuclear isoform that retains intron 4. RNA-seq analysis of nonsense-mediated decay (NMD)-inhibited cells revealed previously undescribed splice junctions, rare or not detected in normal cells, that connect constitutive exons 4 and 5 to highly conserved cryptic cassette exons within the intron. Minigene splicing reporter assays showed that these cassettes promote IR. Genome-wide analysis of splice junction reads demonstrated that cryptic noncoding cassettes are much more common in large (>1kb) retained introns than they are in small retained introns or in non-retained introns. Functional assays showed that heterologous cassettes can promote retention of intron 4 in the SF3B1 splicing reporter. Although many of these cryptic exons were spliced inefficiently, they exhibited substantial binding of U2AF1 and U2AF2 adjacent to their splice acceptor sites. We propose that these exons function as decoys that engage the intron-terminal splice sites, blocking cross-intron interactions required for excision. Developmental regulation of decoy function underlies a major component of the erythroblast IR program.


2021 ◽  
Author(s):  
Tanzeem Fatima ◽  
Rangachari Krishnan ◽  
Ashutosh Srivastava ◽  
Vageeshbabu S. Hanur ◽  
M. Srinivasa Rao

East Indian sandalwood (Santalum album L.) is highly valued for its heartwood and its oil. No comparative study has been made of high and low oil-yielding, genetically identical sandalwood trees grown under similar climatic conditions. We therefore performed a genome-wide transcriptome analysis to identify genes involved in high oil biosynthesis in S. album. In this study, 15-year-old S. album genotypes (SaSHc and SaSLc) were analyzed to understand the contribution of genetic background to high oil biosynthesis. A total of 28,959,187 and 25,598,869 raw paired-end (PE) reads were generated by Illumina sequencing, and 2.12 million and 1.811 million coding sequences (CDS) were obtained in the respective accessions. Based on GO terms, 21,262 and 18,113 CDS were assigned to 26 functional groups in three GO categories: biological process (4,168; 3,641), cellular component (5,758; 4,971), and molecular function (5,108; 4,441). In total, 41,900 and 36,571 genes were functionally annotated, and KEGG analysis of the DEGs yielded 213 metabolic pathways, 14 of which were involved in secondary metabolite biosynthesis in S. album. Among 237 cytochrome families, nine groups of cytochromes participated in high oil biosynthesis. In total, 16,665 differentially expressed genes were detected in both accessions (SaHc and SaSLc). The results showed that 784 genes were upregulated and 339 genes were downregulated in SaHc, whereas 635 were upregulated and 299 downregulated in SaSLc. RNA-Seq results were further validated by quantitative RT-PCR. The maximum number of BLAST hits was against Vitis vinifera. From this study we identified additional cytochrome families in SaHc. The availability of RNA-Seq data for high oil-yielding sandalwood accessions will have broader implications for the conservation and selection of superior elite samples/populations for further genetic improvement programs.


2020 ◽  
Vol 21 (15) ◽  
pp. 5492 ◽  
Author(s):  
Yu Jin Jung ◽  
Jong Hee Kim ◽  
Hyo Ju Lee ◽  
Dong Hyun Kim ◽  
Jihyeon Yu ◽  
...  

The rice SLR1 gene encodes the DELLA protein (a protein with the DELLA amino acid motif), and a loss-of-function mutation causes dwarfism by inhibiting plant growth. We generated slr1-d mutants with a semi-dominant dwarf phenotype by targeting mutations to the DELLA/TVHYNP domain using CRISPR/Cas9 genome editing in rice. Sixteen gene-edited lines were obtained from 31 transgenic plants. Deep sequencing results showed that the mutants had six different mutation types at the target site of the TVHYNP domain of the SLR1 gene. Homozygous edited plants lacking the transfer DNA (T-DNA) were selected by segregation in the T1 generation. The slr1-d7 and slr1-d8 plants showed a gibberellin (GA)-insensitive dwarf phenotype with shrunken leaves and shortened internodes. A genome-wide gene expression analysis by RNA-seq indicated that the expression levels of two GA-related genes, GA20OX2 (Gibberellin oxidase) and GA3OX2, were increased in the edited mutant plants, suggesting that GA20OX2 acts as a converter of GA12 signaling. The phenotype of these mutant plants arises from altered GA responses, at least partially through a defect in the phytohormone signaling process that prevented cell elongation. The new mutants, namely the slr1-d7 and slr1-d8 lines, are valuable semi-dominant dwarf alleles with potential application value for molecular breeding using the CRISPR/Cas9 system in rice.


2019 ◽  
Author(s):  
Xu-Kai Ma ◽  
Meng-Ran Wang ◽  
Chu-Xiao Liu ◽  
Rui Dong ◽  
Gordon G. Carmichael ◽  
...  

Abstract: Sequences of circular RNAs (circRNAs) produced from back-splicing of exon(s) completely overlap with sequences from cognate linear RNAs transcribed from the same gene loci with the exception of their back-splicing junction (BSJ) sites. Examination of global circRNA expression from RNA-seq datasets generally relies on the detection of RNA-seq fragments spanning BSJ sites, but a direct comparison of circular and linear RNA expression from the same gene loci in a genome-wide manner has remained challenging. This is because quantification of BSJ fragments differs from that of linear RNA expression that uses normalized RNA-seq fragments mapped to the whole gene bodies. Here, we have developed a computational pipeline for circular and linear RNA expression analysis from ribosomal-RNA depleted RNA-seq (CLEAR, https://github.com/YangLab/CLEAR). A new quantitation parameter, FPB (fragments per billion mapped bases), is applied to evaluate circular and linear RNA expression individually by fragments mapped to circRNA-specific BSJ sites or to linear RNA-specific splicing junction (SJ) sites. Then, circular and linear RNA expression are directly compared by dividing FPBcirc by FPBlinear to generate a CIRCscore, which indicates the relative circRNA expression using linear RNA expression as the background. Highly expressed circRNAs with low cognate linear RNA expression background can be identified for further investigation.
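
The FPB and CIRCscore arithmetic described above can be written out as a short sketch. It follows only the abstract's wording (fragments per billion mapped bases, and FPBcirc divided by FPBlinear); the function names and the example numbers are hypothetical, and how CLEAR itself aggregates junction fragments per gene is not specified here, so this should not be read as the pipeline's own code.

```python
def fpb(junction_fragments, total_mapped_bases):
    """Fragments per billion mapped bases for one junction set.

    Assumption: normalization is simply fragments * 1e9 / mapped bases,
    taken literally from the name of the metric.
    """
    if total_mapped_bases <= 0:
        raise ValueError("total_mapped_bases must be positive")
    return junction_fragments * 1e9 / total_mapped_bases


def circ_score(fpb_circ, fpb_linear):
    """CIRCscore = FPBcirc / FPBlinear; None when the linear background is 0."""
    if fpb_linear == 0:
        return None
    return fpb_circ / fpb_linear


# Hypothetical library with 5 billion mapped bases.
mapped_bases = 5_000_000_000
fpb_c = fpb(50, mapped_bases)    # fragments spanning the back-splicing junction
fpb_l = fpb(200, mapped_bases)   # fragments spanning linear splicing junctions
score = circ_score(fpb_c, fpb_l)
```

Because both values are normalized by the same number of mapped bases, the ratio depends only on the junction fragment counts within a sample; the FPB values themselves are what allow comparison across samples.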


2014 ◽  
Author(s):  
Konrad Ulrich Förstner ◽  
Jörg Vogel ◽  
Cynthia Mira Sharma

Summary: RNA-Seq has become a potent and widely used method to qualitatively and quantitatively study transcriptomes. In order to draw biological conclusions based on RNA-Seq data, several steps, some of which are computationally intensive, have to be taken. Our READemption pipeline takes care of these individual tasks and integrates them into an easy-to-use tool with a command line interface. To leverage the full power of modern computers, most subcommands of READemption offer parallel data processing. While READemption was mainly developed for the analysis of bacterial primary transcriptomes, we have successfully applied it to analyze RNA-Seq reads from other sample types, including whole transcriptomes and RNA immunoprecipitated with proteins, not only from bacteria, but also from eukaryotes and archaea. Availability and Implementation: READemption is implemented in Python and is published under the ISC open source license. The tool and documentation are hosted at http://pythonhosted.org/READemption (DOI:10.6084/m9.figshare.977849).


2020 ◽  
Author(s):  
Peng Ma ◽  
Xiao Zhang ◽  
Bowen Luo ◽  
Zhen Chen ◽  
Xuan He ◽  
...  

Abstract Background: Long noncoding RNAs (lncRNAs) play important roles in essential biological processes. However, our understanding of lncRNAs as competing endogenous RNAs (ceRNAs) and their responses to nitrogen stress is still limited. Results: Here, we surveyed the lncRNAs and miRNAs in maize inbred line P178 leaves and roots at the seedling stage under high-nitrogen and low-nitrogen (LN) conditions using lncRNA-Seq and small RNA-Seq. A total of 894 differentially expressed lncRNAs and 38 different miRNAs were identified. Co-expression analysis found that two lncRNAs and four lncRNA targets could competitively combine with ZmmiR159 and ZmmiR164, respectively. To dissect the genetic regulation by which lncRNAs might enable adaptation to limited nitrogen availability, an association mapping panel with a high-density single-nucleotide polymorphism (SNP) array (56,110 SNPs), combined with variable LN resistance-related phenotypes obtained from hydroponics, was used for a genome-wide association study (GWAS). By combining GWAS and RNA-Seq, 170 differentially expressed lncRNAs within the range of significant markers were screened. Moreover, 40 consistently LN-responsive genes, including those involved in glutamine biosynthesis and nitrogen acquisition in root, were identified. Transient expression assays in Nicotiana benthamiana demonstrated that LNC_002923 could inhibit ZmmiR159-guided cleavage of Zm00001d015521. Conclusions: These lncRNAs containing trait-associated significant SNPs could be considered related to root development and nutrient utilization. Taken together, the results of our study provide new insights into the potential regulatory roles of lncRNAs in response to LN stress, and give valuable information for further screening of candidates as well as the improvement of maize regarding LN-responsive resistance.


2021 ◽  
Author(s):  
Nicolas Eugenie ◽  
Yvan Zivanovic ◽  
Gaelle Lelandais ◽  
Genevieve Coste ◽  
Claire Bouthier de la Tour ◽  
...  

Numerous genes are overexpressed in the radioresistant bacterium Deinococcus radiodurans after exposure to radiation or prolonged desiccation. The DdrO and IrrE proteins play a major role in regulating the expression of a predicted set of approximately twenty of these genes. The transcriptional repressor DdrO blocks the expression of these genes under normal growth conditions. After exposure to genotoxic agents, the IrrE metalloprotease cleaves DdrO and relieves gene repression. Bioinformatic analyses showed that this mechanism seems to be conserved in several species of Deinococcus, but many questions remain, such as the number of genes regulated by DdrO. Here, by RNA-seq and ChIP-seq assays performed at a genome-wide scale coupled with bioinformatic analyses, we show that the DdrO regulon in D. radiodurans includes many other genes than those previously described. These results thus pave the way to better understand the radioresistance mechanisms encoded by this bacterium.


2021 ◽  
Vol 17 ◽  
pp. 117693432110413
Author(s):  
Chaoxin Zhang ◽  
Tao Wang ◽  
Tongyan Cui ◽  
Shengwei Liu ◽  
Bing Zhang ◽  
...  

The CCAAT/enhancer binding protein (C/EBP) transcription factors (TFs) regulate many important biological processes, such as energy metabolism, inflammation, cell proliferation etc. A genome-wide gene identification revealed the presence of a total of 99 C/EBP genes in pig and 19 eukaryote genomes. Phylogenetic analysis showed that all C/EBP TFs were classified into 6 subgroups named C/EBPα, C/EBPβ, C/EBPδ, C/EBPε, C/EBPγ, and C/EBPζ. Gene expression analysis showed that the C/EBPα, C/EBPβ, C/EBPδ, C/EBPγ, and C/EBPζ genes were expressed ubiquitously with inconsistent expression patterns in various pig tissues. Moreover, a pig C/EBP regulatory network was constructed, including C/EBP genes, TFs and miRNAs. A total of 27 feed-forward loop (FFL) motifs were detected in the pig C/EBP regulatory network. Based on the RNA-seq data, gene expression patterns related to FFL sub-network were analyzed in 27 adult pig tissues. Certain FFL motifs may be tissue specific. Functional enrichment analysis indicated that C/EBP and its target genes are involved in many important biological pathways. These results provide valuable information that clarifies the evolutionary relationships of the C/EBP family and contributes to the understanding of the biological function of C/EBP genes.
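
For readers unfamiliar with feed-forward loop (FFL) motifs, the sketch below shows one simple way to enumerate them in a directed regulatory network: a triple (A, B, C) where A regulates B, B regulates C, and A also regulates C directly. The edge list, node names, and function name are hypothetical illustrations; the abstract does not say which tool or exact FFL definition the authors used.

```python
def find_feed_forward_loops(edges):
    """Return sorted (A, B, C) triples with edges A->B, B->C and A->C.

    `edges` is an iterable of (regulator, target) pairs; self-loops are ignored.
    """
    successors = {}
    for src, dst in edges:
        if src != dst:
            successors.setdefault(src, set()).add(dst)

    loops = []
    for a, a_targets in successors.items():
        for b in a_targets:
            for c in successors.get(b, ()):
                # C must differ from A and also be a direct target of A.
                if c != a and c in a_targets:
                    loops.append((a, b, c))
    return sorted(loops)


# Hypothetical toy network: a TF, a miRNA and two C/EBP genes.
edges = [
    ("TF1", "miR1"),
    ("miR1", "CEBPA"),
    ("TF1", "CEBPA"),
    ("TF2", "CEBPB"),
]
ffls = find_feed_forward_loops(edges)
```

Here the only motif found is TF1 regulating CEBPA both directly and through miR1. Restricting which node types may occupy each position (for example TF, miRNA, gene) would be a further filter on the returned triples.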


2021 ◽  
Vol 12 ◽  
Author(s):  
Xuhao Song ◽  
Tingbang Yang ◽  
Xinyi Zhang ◽  
Ying Yuan ◽  
Xianghui Yan ◽  
...  

Microsatellite or simple sequence repeat (SSR) instability within genes can induce genetic variation. The SSR signatures remain largely unknown in different clades within Euarchontoglires, one of the most successful mammalian radiations. Here, we conducted a genome-wide characterization of microsatellite distribution patterns at different taxonomic levels in 153 Euarchontoglires genomes. Our results showed that the abundance and density of the SSRs were significantly positively correlated with primate genome size, but no significant relationship with the genome size of rodents was found. Furthermore, a higher level of complexity for perfect SSR (P-SSR) attributes was observed in rodents than in primates. The most frequent type of P-SSR was the mononucleotide P-SSR in the genomes of primates, tree shrews, and colugos, while mononucleotide or dinucleotide motif types were dominant in the genomes of rodents and lagomorphs. Furthermore, (A)n was the most abundant motif in primate genomes, but (A)n, (AC)n, or (AG)n was the most abundant motif in rodent genomes which even varied within the same genus. The GC content and the repeat copy numbers of P-SSRs varied in different species when compared at different taxonomic levels, reflecting underlying differences in SSR mutation processes. Notably, the CDSs containing P-SSRs were categorized by functions and pathways using Gene Ontology and Kyoto Encyclopedia of Genes and Genomes annotations, highlighting their roles in transcription regulation. Generally, this work will aid future studies of the functional roles of the taxonomic features of microsatellites during the evolution of mammals in Euarchontoglires.
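
As a rough illustration of how perfect SSRs (P-SSRs) and their density might be tallied, the sketch below scans a sequence for mononucleotide and dinucleotide repeats with regular expressions. The minimum repeat counts (12 and 7), the function names, and the restriction to motif lengths 1 and 2 are assumptions for the example; the abstract does not state the software or thresholds the authors used.

```python
import re


def find_perfect_ssrs(seq, min_mono=12, min_di=7):
    """Return sorted (start, motif, copies) for perfect mono-/dinucleotide SSRs.

    Thresholds are assumed values, not taken from the study.
    """
    seq = seq.upper()
    # One base followed by at least (min_mono - 1) more copies of itself.
    mono = re.compile(r"([ACGT])\1{%d,}" % (min_mono - 1))
    # Two different bases (so 'AA' is not counted as a dinucleotide motif),
    # followed by at least (min_di - 1) more copies of the pair.
    di = re.compile(r"(([ACGT])(?!\2)[ACGT])\1{%d,}" % (min_di - 1))

    hits = []
    for pattern, unit in ((mono, 1), (di, 2)):
        for m in pattern.finditer(seq):
            copies = (m.end() - m.start()) // unit
            hits.append((m.start(), m.group(1), copies))
    return sorted(hits)


def ssr_density(n_ssrs, genome_length_bp):
    """SSR loci per megabase of sequence."""
    return n_ssrs * 1e6 / genome_length_bp


# Toy sequence: an (A)12 run and an (AC)7 run separated by short spacers.
seq = "GG" + "A" * 12 + "CT" + "AC" * 7 + "GT"
ssrs = find_perfect_ssrs(seq)
density = ssr_density(len(ssrs), len(seq))
```

Abundance corresponds to the number of hits and density to hits per megabase; longer motif classes (tri- to hexanucleotide) would need additional patterns and a check that the motif is not itself a repeat of a shorter unit.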

