Assessing Computational Steps for CLIP-Seq Data Analysis

2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
Qi Liu ◽  
Xue Zhong ◽  
Blair B. Madison ◽  
Anil K. Rustgi ◽  
Yu Shyr

RNA-binding proteins (RBPs) are key players in regulating gene expression at the posttranscriptional level. CLIP-Seq, which can provide a genome-wide map of protein-RNA interactions, has been increasingly used to decipher RBP-mediated posttranscriptional regulation. Generating highly reliable binding sites from CLIP-Seq data requires not only stringent library preparation but also considerable computational effort. Here we present the first systematic evaluation of the major computational steps for identifying RBP binding sites from CLIP-Seq data, including preprocessing, the choice of control samples, peak normalization, and motif discovery. We found that avoiding PCR amplification artifacts, normalizing to input RNA or mRNA-seq, and defining the background model from control samples can reduce the bias introduced by RNA abundance and improve the quality of detected binding sites. Our findings can serve as a general guideline for CLIP experimental design and the comprehensive analysis of CLIP-Seq data.
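Two of the steps named above, removing PCR duplicates and normalizing CLIP signal to an input RNA or mRNA-seq library, can be illustrated with a minimal sketch. It assumes each read is summarized as a (chromosome, start, strand, random barcode) tuple, that reads sharing all four fields are PCR copies, and that peak-level read counts are already available; the function names, the pseudocount, and the counts-per-million scaling are illustrative choices, not the exact procedure evaluated in the paper.

```python
import math

def collapse_pcr_duplicates(reads):
    """Keep one read per (chrom, start, strand, barcode) key.

    Reads sharing all four fields are assumed to be PCR copies of the
    same cDNA molecule (an assumption of this sketch).
    """
    return sorted(set(reads))

def log2_enrichment(clip_counts, clip_total, input_counts, input_total,
                    pseudocount=1.0):
    """Per-peak log2(CLIP CPM / input CPM).

    clip_counts / input_counts: dicts mapping peak IDs to read counts.
    clip_total / input_total: library sizes used for CPM scaling.
    The pseudocount keeps peaks with zero input reads finite.
    """
    scores = {}
    for peak, clip_n in clip_counts.items():
        input_n = input_counts.get(peak, 0)
        clip_cpm = (clip_n + pseudocount) * 1e6 / clip_total
        input_cpm = (input_n + pseudocount) * 1e6 / input_total
        scores[peak] = math.log2(clip_cpm / input_cpm)
    return scores
```

A peak that is abundant in CLIP only because its transcript is abundant also carries many input reads, so its log2 ratio stays near zero; this is the RNA-abundance bias that normalizing to input is meant to reduce.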

Circulation ◽  
2014 ◽  
Vol 130 (suppl_2) ◽  
Author(s):  
Jennifer Davis ◽  
Michelle Sargent ◽  
Jianjian Shi ◽  
Lei Wei ◽  
Maurice S Swanson ◽  
...  

Rationale: During the cardiac injury response, fibroblasts differentiate into myofibroblasts, a cell type that enhances extracellular matrix production and facilitates ventricular remodeling. To better understand the molecular mechanisms by which myofibroblasts are generated in the heart, we performed a genome-wide screen with 18,000 cDNAs, which identified the RNA-binding protein muscleblind-like splicing regulator 1 (MBNL1), suggesting a novel association between mRNA alternative splicing and the regulation of myofibroblast differentiation.
Objective: To determine the mechanism by which MBNL1 regulates myofibroblast differentiation and the cardiac fibrotic response.
Methods and Results: Confirming the results of our genome-wide screen, adenovirus-mediated overexpression of MBNL1 promoted the transformation of rat cardiac fibroblasts and mouse embryonic fibroblasts (MEFs) into myofibroblasts, to a level similar to that obtained with the profibrotic agonist transforming growth factor β (TGFβ). Conversely, Mbnl1-/- MEFs were refractory to TGFβ-induced myofibroblast differentiation. MBNL1 expression was induced in transforming fibroblasts in response to TGFβ and angiotensin II. These results were extended in vivo by analysis of dermal wound healing, a process that depends on myofibroblast differentiation and proper myofibroblast activity. By day 6, control mice had achieved 82% skin wound closure, compared with only 40% in Mbnl1-/- mice. Moreover, Mbnl1-/- mice had reduced survival following myocardial infarction owing to defective fibrotic scar formation and healing. High-throughput RNA sequencing (RNAseq) and RNA immunoprecipitation revealed that MBNL1 directly regulates the alternative splicing of transcripts for myofibroblast signaling factors and cytoskeletal-assembly elements. Functional analysis of these factors as mediators of MBNL1 activity is also described here.
Conclusions: Collectively, our data suggest that MBNL1 coordinates myofibroblast transformation by directly mediating the alternative splicing of an array of mRNAs encoding differentiation-specific signaling transcripts, which then alter the fibroblast proteome for myofibroblast structure and function.


Blood ◽  
2008 ◽  
Vol 111 (3) ◽  
pp. 1217-1226 ◽  
Author(s):  
Yun Chen ◽  
David H. Gorski

Abstract Angiogenesis is critical to tumor progression. The homeobox gene GAX inhibits angiogenesis in vascular endothelial cells (ECs). We have identified a microRNA (miR-130a) that regulates GAX expression and hypothesized that it plays a major role in modulating GAX activity in ECs. A 280-bp fragment of the GAX 3′-untranslated region (3′-UTR) containing 2 miR-130a target sites was required for the rapid down-regulation of GAX expression by serum and proangiogenic factors, whereas GAX promoter activity did not vary with exposure to serum or proangiogenic factors. When cloned downstream of the reporter gene in the psiCHECK2-Luciferase vector, this same 280-bp sequence mediated serum-induced down-regulation of the reporter. Finally, forced expression of miR-130a inhibited GAX expression through this specific GAX 3′-UTR sequence. A genome-wide search for other possible miR-130a binding sites revealed a miR-130a target site in the 3′-UTR of the antiangiogenic homeobox gene HOXA5, whose expression and antiangiogenic activity are also inhibited by miR-130a. From these data, we conclude that miR-130a regulates the angiogenic phenotype of vascular ECs largely through its ability to modulate the expression of GAX and HOXA5.
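A genome-wide search for miRNA target sites usually starts from seed complementarity between the miRNA and the 3′-UTR. The sketch below scans a UTR for exact matches to the reverse complement of miRNA nucleotides 2-8 (a 7mer-m8 site). The miR-130a-3p sequence is an assumption taken from miRBase and should be verified; the function names are hypothetical, and the abstract does not state which search method the authors used.

```python
# Assumed mature hsa-miR-130a-3p sequence (verify against miRBase).
MIR_130A = "CAGUGCAAUGUUAAAAGGGCAU"

def reverse_complement_rna(seq: str) -> str:
    pairs = {"A": "U", "C": "G", "G": "C", "U": "A"}
    return "".join(pairs[base] for base in reversed(seq))

def seed_sites(mirna: str, utr: str) -> list[int]:
    """0-based start positions of 7mer-m8 seed matches in a 3'-UTR.

    DNA input is accepted: T is converted to U before matching.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    seed = mirna[1:8]                     # miRNA nucleotides 2-8
    site = reverse_complement_rna(seed)   # sequence the UTR must contain
    hits, start = [], utr.find(site)
    while start != -1:
        hits.append(start)
        start = utr.find(site, start + 1)
    return hits
```

Real target predictors also score 8mer and 6mer sites, site context, and conservation; exact seed matching is only the first filter.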


Entropy ◽  
2019 ◽  
Vol 21 (8) ◽  
pp. 802
Author(s):  
Chun-xiao Sun ◽  
Yu Yang ◽  
Hua Wang ◽  
Wen-hu Wang

Chromatin immunoprecipitation combined with next-generation sequencing (ChIP-Seq) technology has enabled the identification of transcription factor binding sites (TFBSs) on a genome-wide scale. To discover TFBSs effectively and efficiently among the thousand or more DNA sequences generated by a ChIP-Seq data set, we propose a new algorithm named AP-ChIP. First, we set two thresholds based on probabilistic analysis to construct and then filter the cluster subsets. Then, we apply Affinity Propagation (AP) clustering to the candidate cluster subsets to find potential motifs. Experimental results on simulated data show that AP-ChIP makes nearly accurate predictions of TFBSs in a reasonable time. The validity of AP-ChIP is also tested on a real ChIP-Seq data set.
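The Affinity Propagation step can be sketched in NumPy using the standard responsibility/availability message updates on a precomputed similarity matrix. This is generic AP, not the AP-ChIP algorithm: the two probabilistic thresholds are omitted, and the negative-Hamming similarity between equal-length candidate k-mers, the preference value, the damping factor, and the iteration count are all illustrative assumptions.

```python
import numpy as np

def hamming_similarity(kmers, preference):
    """Negative Hamming distance between equal-length k-mers.

    The diagonal holds the 'preference': how strongly each k-mer
    favors becoming an exemplar (lower values give fewer clusters).
    """
    n = len(kmers)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            S[i, j] = -sum(a != b for a, b in zip(kmers[i], kmers[j]))
    np.fill_diagonal(S, preference)
    return S

def affinity_propagation(S, damping=0.9, max_iter=500):
    """Return (exemplar indices, cluster label per point)."""
    n = S.shape[0]
    rows = np.arange(n)
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best].copy()
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = np.diag(A_new).copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, self_avail)
        A = damping * A + (1 - damping) * A_new
    exemplars = np.flatnonzero(np.diag(A) + np.diag(R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = np.argmax(S[:, exemplars], axis=1)
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels
```

AP does not need the number of clusters in advance; the preference on the diagonal controls how many exemplars emerge, and each exemplar k-mer can serve as a candidate motif representative.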


Genes ◽  
2019 ◽  
Vol 10 (3) ◽  
pp. 209 ◽  
Author(s):  
Elizaveta Radion ◽  
Olesya Sokolova ◽  
Sergei Ryazansky ◽  
Pavel Komarov ◽  
Yuri Abramov ◽  
...  

Piwi-interacting RNAs (piRNAs) control transposable element (TE) activity in the germline. piRNAs are produced from single-stranded precursors transcribed from distinct genomic loci, termed piRNA clusters, that are enriched in TE fragments. The specific chromatin organization and transcriptional regulation of Drosophila germline-specific piRNA clusters ensure the transcription and processing of piRNA precursors. TEs harbour various regulatory elements that could affect piRNA cluster integrity. One such element is the suppressor-of-hairy-wing (Su(Hw))-mediated insulator, which is harboured in the retrotransposon gypsy. To understand how insulators contribute to piRNA cluster activity, we studied the effects of transgenes containing gypsy insulators on the local organization of endogenous piRNA clusters. We show that transgene insertions interfere with piRNA precursor transcription, small RNA production, and the formation of piRNA cluster-specific chromatin, a hallmark of which is Rhino, the germline homolog of heterochromatin protein 1 (HP1). Mutations in Su(Hw) restored the integrity of piRNA clusters in transgenic strains. Surprisingly, Su(Hw) depletion enhanced the production of piRNAs by the domesticated telomeric retrotransposon TART, indicating that Su(Hw)-dependent elements protect TART transcripts from the piRNA processing machinery in telomeres. A genome-wide analysis revealed that Su(Hw)-binding sites are depleted in endogenous germline piRNA clusters, suggesting that their functional integrity is under strict evolutionary constraints.
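Depletion of binding sites within a set of genomic intervals is often summarized as an observed/expected ratio. The sketch below assumes a single chromosome, non-overlapping cluster intervals, and a uniform background; these are simplifying assumptions, the function name is hypothetical, and the abstract does not state which statistic the authors used.

```python
def observed_expected(sites, clusters, genome_size):
    """Return (observed, expected, O/E) for binding sites inside clusters.

    sites: 0-based positions of binding sites on one chromosome.
    clusters: non-overlapping (start, end) half-open intervals.
    genome_size: length of the sequence the sites were drawn from.
    """
    covered = sum(end - start for start, end in clusters)
    observed = sum(
        1 for pos in sites if any(start <= pos < end for start, end in clusters)
    )
    # Expected count if sites were spread uniformly over the sequence.
    expected = len(sites) * covered / genome_size
    ratio = observed / expected if expected > 0 else float("nan")
    return observed, expected, ratio
```

An O/E well below 1 suggests depletion; judging significance would additionally require, for example, a permutation or binomial test.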


2013 ◽  
Vol 203 (1) ◽  
pp. 149-164 ◽  
Author(s):  
Ka-wing Fong ◽  
Yujing Li ◽  
Wenqi Wang ◽  
Wenbin Ma ◽  
Kunpeng Li ◽  
...  

The nucleus is a unique organelle that contains essential genetic material in chromosome territories. The interchromatin space is composed of nuclear subcompartments, which are defined by several distinctive nuclear bodies believed to be factories of DNA or RNA processing and sites of transcriptional and/or posttranscriptional regulation. In this paper, we performed a genome-wide microscopy-based screen for proteins that form nuclear foci and characterized their localizations using markers of known nuclear bodies. In total, we identified 325 proteins localized to distinct nuclear bodies, including nucleoli (148), promyelocytic leukemia nuclear bodies (38), nuclear speckles (27), paraspeckles (24), Cajal bodies (17), Sam68 nuclear bodies (5), Polycomb bodies (2), and uncharacterized nuclear bodies (64). Functional validation revealed several proteins potentially involved in the assembly of Cajal bodies and paraspeckles. Together, these data establish the first atlas of human proteins in different nuclear bodies and provide key information for research on nuclear bodies.


Blood ◽  
2008 ◽  
Vol 112 (11) ◽  
pp. 283-283
Author(s):  
Andre M. Pilon ◽  
Elliott H. Margulies ◽  
Hatice Ozel Abaan ◽  
Amy Werner-Allen ◽
Tim M. Townes ◽  
...  

Abstract Erythroid Krüppel-like factor (EKLF; KLF1) is the founding member of the Krüppel-like family of transcription factors, with 3 C2H2 zinc fingers that bind a 9-base consensus sequence (NCNCNCCCN). The functions of EKLF, first identified as an activator of the beta-globin locus, include gene activation and chromatin remodeling. Our knowledge of genes regulated by EKLF is limited because EKLF-deficient mice die of severe anemia by embryonic day 15 (E15). Analysis of E13.5 wild-type and EKLF-deficient fetal liver (FL) erythroid cells revealed that EKLF-deficient cells fail to complete terminal erythroid maturation (Pilon et al. submitted). Chromatin immunoprecipitation coupled with ultra-high-throughput massively parallel sequencing (ChIP-seq) is increasingly used to map protein-DNA interactions in vivo on a genome-wide scale. ChIP-seq allows simultaneous analysis of transcription factor binding in every region of the genome, defining an “interactome”. To elucidate direct EKLF-dependent effects on erythropoiesis, we combined ChIP-seq with expression array (“transcriptome”) analyses, reasoning that integrating ChIP-seq and microarray data could provide detailed knowledge of the role of EKLF in erythropoiesis. Chromatin was isolated from E13.5 FL cells of mice whose endogenous EKLF gene was replaced with a fully functional HA-tagged EKLF gene. ChIP was performed using a highly specific, high-affinity anti-HA antibody. A library of EKLF-bound FL chromatin enriched by anti-HA IP was created and subjected to fluorescent in situ sequencing on a Solexa 1G platform, yielding 36-base signatures that were mapped to unique sites in the mouse genome, defining the EKLF “interactome.” The frequency with which a given signature appears provides a measurable peak of enrichment. We performed three biological/technical replicates and analyzed each data set individually as well as the combined data.
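The 9-base consensus NCNCNCCCN quoted above can be searched for directly in sequence. This minimal sketch treats N as any base, scans both strands, and reports overlapping matches as (0-based start, strand) pairs; the function names are hypothetical, and simple consensus matching is a stand-in for illustration, not the ChIP-seq mapping and peak analysis described in the abstract.

```python
import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}

def consensus_to_regex(consensus):
    return "".join(IUPAC[base] for base in consensus.upper())

def revcomp(seq):
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def scan_motif(seq, consensus="NCNCNCCCN"):
    """Return sorted (start, strand) hits of the consensus on both strands."""
    seq = seq.upper()
    # Lookahead groups let overlapping matches be reported.
    fwd = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    rev = re.compile("(?=(" + consensus_to_regex(revcomp(consensus)) + "))")
    hits = [(m.start(), "+") for m in fwd.finditer(seq)]
    hits += [(m.start(), "-") for m in rev.finditer(seq)]
    return sorted(hits)
```

Minus-strand hits are reported at the position of the reverse-complement pattern on the given strand, so a site and its reverse complement share the same span.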
To validate the ChIP-seq results, we examined the locus of a known EKLF target gene, α-hemoglobin stabilizing protein (AHSP). Peaks corresponded to previously identified DNase hypersensitive sites, regions of histone hyperacetylation, and sites of promoter occupancy determined by ChIP-PCR. A genome-wide analysis focusing on the regions with the highest EKLF occupancy revealed a set of 531 locations with high levels of EKLF binding. Of these sites, 119 (22%) are located 10 kb or more from the nearest gene and are classified as intergenic EKLF binding sites. Another 78 sites (14.6%) are within 10 kb of an annotated RefSeq gene. A plurality of the binding sites, 222 (42%), are within RefSeq coordinates and are classified as intragenic EKLF binding sites. Microarray profiling of mRNA from sorted, matched populations of E13.5 WT and EKLF-deficient FL erythroid progenitor cells showed dysregulation of >3000 genes (p<0.05). Ingenuity Pathways Analysis (IPA) of the >3000 dysregulated mRNAs indicated significant alteration of a cell cycle-control network centered on the transcription factor E2f2. We confirmed significantly decreased E2f2 mRNA and protein levels by real-time PCR and Western blot, respectively; demonstrated by cell cycle analysis that EKLF-deficient FL cells accumulate in G0/G1; and verified EKLF binding to motifs within the E2f2 promoter by ChIP-PCR and analysis of the ChIP-seq data. We hypothesized that only a subset of the 3000 dysregulated genes would be direct EKLF targets. We restricted the ChIP-seq library to the top 5% most frequently represented fragments across the genome and applied this criterion to the dysregulated mRNAs in the IPA cell cycle network. ChIP-seq identified peaks of EKLF association with 60% of the loci in this pathway.
Consistent with the role of EKLF as a transcriptional activator, 95% of the occupied genomic loci corresponded to mRNAs whose expression was significantly decreased in EKLF-deficient FL cells (p<0.05). The majority (59%) of these EKLF-bound sites were located at intragenic sites (i.e., introns), while the remainder (15% and 26%, respectively) were found adjacent to the genes or in intergenic regions. We have shown that both the AHSP and E2f2 loci require EKLF to become activated and sensitive to DNase I digestion in erythroid cells. Based on the increased frequency of intragenic EKLF binding sites, particularly in genes of the cell cycle network, we propose that EKLF occupancy of intragenic sites may facilitate chromatin modification.
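The binding-site categories used above (intragenic; within 10 kb of a gene; 10 kb or more from the nearest gene, i.e. intergenic) can be expressed as a simple rule. In this sketch, genes are (start, end) half-open intervals on the same chromosome as the site; the 10 kb cutoff comes from the text, while the function names, coordinate convention, and the "proximal" label for the within-10 kb class are assumptions.

```python
from collections import Counter

def classify_site(pos, genes, flank=10_000):
    """Classify one binding site relative to gene intervals on its chromosome.

    genes: (start, end) half-open intervals, 0-based.
    Returns 'intragenic', 'proximal' (< flank bp from a gene) or 'intergenic'.
    """
    nearest = None
    for start, end in genes:
        if start <= pos < end:
            return "intragenic"
        # Distance to the closest gene boundary (end is exclusive).
        dist = start - pos if pos < start else pos - (end - 1)
        nearest = dist if nearest is None else min(nearest, dist)
    if nearest is not None and nearest < flank:
        return "proximal"
    return "intergenic"

def tally(sites, genes, flank=10_000):
    """Count sites per category."""
    return Counter(classify_site(p, genes, flank) for p in sites)
```

Dividing each tally by the total number of sites gives percentages of the kind reported for the 531 high-occupancy locations.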


2017 ◽  
Vol 33 (13) ◽  
pp. 2029-2031 ◽  
Author(s):  
Yu Liu ◽  
Sha Sun ◽  
Timothy Bredy ◽  
Marcelo Wood ◽  
Robert C Spitale ◽  
...  

2008 ◽  
Vol 28 (24) ◽  
pp. 7368-7379 ◽  
Author(s):  
Gregory S. Yochum ◽  
Ryan Cleland ◽  
Richard H. Goodman

ABSTRACT Mutations in components of the Wnt signaling pathway initiate colorectal carcinogenesis by deregulating the β-catenin transcriptional coactivator. β-Catenin activation of one target in particular, the c-Myc proto-oncogene, is required for colon cancer pathogenesis. β-Catenin is known to regulate c-Myc expression via sequences upstream of the transcription start site. Here, we report that a more robust β-catenin binding region localizes 1.4 kb downstream from the c-Myc transcriptional stop site. This site was discovered using a genome-wide method for identifying transcription factor binding sites termed serial analysis of chromatin occupancy. Chromatin immunoprecipitation-scanning assays demonstrate that the 5′ enhancer and the 3′ binding element are the only β-catenin and TCF4 binding regions across the c-Myc locus. When placed downstream of a simian virus 40-driven promoter-luciferase construct, the 3′ element activated luciferase transcription when introduced into HCT116 cells. c-Myc transcription is negligible in quiescent HCT116 cells but is induced when cells reenter the cell cycle after the addition of mitogens. Using these cells, we found that β-catenin and TCF4 occupancy at the 3′ enhancer precedes occupancy at the 5′ enhancer. Association of c-Jun, β-catenin, and TCF4 specifically with the downstream enhancer underlies mitogen stimulation of c-Myc transcription. Our findings indicate that a downstream enhancer element provides the principal regulation of c-Myc expression.


Blood ◽  
2018 ◽  
Vol 132 (Supplement 1) ◽  
pp. 2758-2758
Author(s):  
Abdelrahman H Elsayed ◽  
Huiyun Wu ◽  
Xueyuan Cao ◽  
Soheil Meshinchi ◽  
Raul Ribeiro ◽  
...  

Abstract Acute myeloid leukemia (AML) treatment response remains poorly understood. Although multiple studies have focused on understanding the transcriptomic and epigenetic landscape of AML, genome-wide analysis of SNPs in pediatric AML has not yet been investigated in depth. We therefore sought to identify genetic variants predictive of response, relapse, and survival in pediatric AML patients. For this study, we generated genome-wide SNP data from patients (n=160) treated on the multicenter AML02 clinical trial (ClinicalTrials.gov Identifier: NCT00136084) using the Infinium Omni 2.5M Exome BeadChip. Standard GWAS quality control (QC) procedures were followed to remove SNPs with call rate <95%, monomorphic SNPs, SNPs with MAF <5%, and samples with call rate <95%. Following QC, a risk-adjusted, multi-outcome integrative GWAS was performed to identify SNPs associated with minimal residual disease (MRD) following induction I, relapse-free survival (RFS), and overall survival (OS). This risk-adjusted analysis identified 21 SNPs mapping to 14 genes at an endpoint-integrative p value <2×10⁻⁵. Table 1 lists genes with SNPs significantly associated with MRD, RFS, and OS, as well as in the integrated analysis at p <2×10⁻⁵. Of interest, multiple SNPs in DICER1, a key enzyme required for the biogenesis of microRNAs and small interfering RNAs, were significantly associated with clinical outcome, with a promising integrated p = 0.000011 supported by associations with MRD, RFS, and OS at p <0.002 (Figure 1A). DICER1 is overexpressed in AML, and its expression is under the influence of the hematopoietic transcription factor GATA1. RAI14 (retinoic acid induced 14) is a prognostic marker of poor response in solid tumors and has been associated with the development of drug resistance. Multiple SNPs in RAI14 were significantly associated with clinical endpoints.
Figure 1B shows that the C allele of RAI14 SNP rs336474 was significantly associated with better RFS (p=0.027) and OS (p=0.007), with an integrated p=0.000004. SNPs upstream of RBFOX1 (RNA binding fox-1 homolog 1) and within an intron of GRIN2A (glutamate ionotropic receptor NMDA type subunit 2A) were significantly associated with MRD, RFS, and OS (all p<0.005), with an integrated p=0.00001 (Figure 1C). Other associated SNPs lay within genes involved in pyrimidine metabolism, such as UPP2, a uridine phosphorylase; tumor suppressor genes, such as JPH3, which codes for junctophilin; LILRB4, which encodes leukocyte immunoglobulin-like receptor B4, a regulator of inflammatory responses and cytotoxicity; HACE1, a potential tumor suppressor involved in the pathophysiology of solid tumors; ANK2, an ankyrin family protein with a role in cell proliferation and motility; and BIRC8, which is implicated in CML disease progression, among others. In conclusion, our results demonstrate the value of genome-wide investigation of SNPs for identifying novel, clinically relevant SNPs of prognostic significance in childhood AML. We will present the in-depth results of our integrated GWAS analysis as well as validation in independent patient cohorts. In summary, our results constitute one of the first integrated GWAS analyses to identify SNPs of prognostic significance in pediatric AML. Acknowledgments: We are thankful for funding from NIH R01-CA139246 and ALSAC. Disclosures: No relevant conflicts of interest to declare.
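The QC filters listed in this abstract (SNP call rate <95%, monomorphic SNPs, MAF <5%, sample call rate <95%) map onto a few array operations. This sketch assumes genotypes coded as 0/1/2 alternate-allele counts with NaN for missing calls, and applies SNP call rate, then sample call rate, then MAF; the filter order, the coding, and the function name are assumptions, and dedicated tools such as PLINK are normally used for this step in practice.

```python
import numpy as np

def gwas_qc(genotypes, min_call_rate=0.95, min_maf=0.05):
    """Return indices of (samples, SNPs) that pass basic GWAS QC.

    genotypes: samples x SNPs array of alternate-allele counts (0/1/2),
    with np.nan marking missing calls.
    """
    G = np.asarray(genotypes, dtype=float)
    sample_idx = np.arange(G.shape[0])
    snp_idx = np.arange(G.shape[1])

    # 1. Drop SNPs called in fewer than min_call_rate of samples.
    keep = (1.0 - np.isnan(G).mean(axis=0)) >= min_call_rate
    G, snp_idx = G[:, keep], snp_idx[keep]

    # 2. Drop samples called at fewer than min_call_rate of remaining SNPs.
    keep = (1.0 - np.isnan(G).mean(axis=1)) >= min_call_rate
    G, sample_idx = G[keep, :], sample_idx[keep]

    # 3. Drop low-MAF SNPs; monomorphic SNPs have MAF 0 and fall out here too.
    alt_freq = np.nanmean(G, axis=0) / 2.0
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    keep = maf >= min_maf
    return sample_idx, snp_idx[keep]
```

Filtering SNPs on call rate before samples keeps a few badly performing SNPs from pushing otherwise good samples below the sample call-rate threshold.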

