Comparing alternative pipelines for cross-platform microarray gene expression data integration with RNA-seq data in breast cancer

2016 ◽  
Author(s):  
Alina Frolova ◽  
Vladyslav Bondarenko ◽  
Maria Obolenska

Abstract
Background: According to statistics from major public repositories, the overwhelming majority of existing and newly uploaded data originates from microarray experiments. Unfortunately, the potential of these data to yield new insights is limited by study-specific biases that arise from the small number of biological samples in individual studies. Increasing sample size by direct integration of microarray data increases the statistical power to obtain a more precise estimate of gene expression in a population of individuals, resulting in lower false discovery rates. However, despite numerous recommendations for gene expression data integration, there is no systematic comparison of processing approaches aimed at assessing microarray platform diversity and ambiguous probeset-to-gene correspondence, and as a result few studies apply integration.
Results: Here, we investigated five approaches to microarray data processing and compared them with RNA-seq data on breast cancer samples. We aimed to evaluate different probeset annotations as well as different procedures for choosing between probesets mapped to the same gene. We show that pipeline rankings are mostly preserved across Affymetrix and Illumina platforms. The BrainArray approach, based on updated annotation and redesigned probeset definitions, and the approach of choosing the probeset with the maximum average signal across samples show the best correlation with RNA-seq, whereas averaging probeset signals and scoring the quality of probe sequence mapping to the transcripts of the targeted gene show worse correlation. Finally, randomly selecting one probeset among those mapped to the same gene significantly decreases the correlation with RNA-seq.
Conclusion: We show that methods that rely on actual probeset signal intensities have an advantage over methods that consider only the biological characteristics of the probe sequences, and that cross-platform integration of datasets improves correlation with the RNA-seq data. We consider these results a contribution to integrative analysis as a worthwhile alternative to the classical meta-analysis of multiple gene expression datasets.
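
The probeset-selection procedures compared in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the probeset IDs, gene names, signal values, and the helper names `collapse` and `pearson` are all hypothetical, and correlating each gene across samples is an assumption, since the abstract does not state how the correlation with RNA-seq was computed.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def collapse(signals, probe_to_gene, strategy, rng=None):
    """Reduce probeset-level signals to one per-sample vector per gene."""
    by_gene = {}
    for probe, gene in probe_to_gene.items():
        by_gene.setdefault(gene, []).append(probe)

    result = {}
    for gene, probes in by_gene.items():
        if strategy == "max_mean":
            # Keep the probeset with the highest average signal across samples.
            best = max(probes, key=lambda p: mean(signals[p]))
            result[gene] = list(signals[best])
        elif strategy == "average":
            # Average all probesets of the gene, sample by sample.
            columns = zip(*(signals[p] for p in probes))
            result[gene] = [mean(col) for col in columns]
        elif strategy == "random":
            # Pick one probeset at random.
            chooser = rng or random
            result[gene] = list(signals[chooser.choice(probes)])
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return result


# Hypothetical toy data: three samples, two probesets for GENE_A, one for GENE_B.
signals = {
    "p1": [8.0, 9.0, 10.0],
    "p2": [3.0, 3.1, 2.9],
    "p3": [5.0, 6.0, 7.5],
}
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}
rnaseq = {"GENE_A": [4.0, 5.0, 6.1], "GENE_B": [2.0, 3.1, 4.4]}

for strategy in ("max_mean", "average", "random"):
    genes = collapse(signals, probe_to_gene, strategy, rng=random.Random(0))
    for gene, values in sorted(genes.items()):
        print(strategy, gene, round(pearson(values, rnaseq[gene]), 3))
```

In a real analysis the same comparison would run over thousands of genes and many samples, most likely with a data-frame library rather than plain dictionaries; the plain-Python form is used here only to keep the sketch self-contained.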

Blood ◽  
2006 ◽  
Vol 108 (11) ◽  
pp. 2232-2232
Author(s):  
Serban San-Marina ◽  
Fernando Suarez Saiz ◽  
Haytham Khoury ◽  
Mark D. Minden

Abstract In leukemia, the integrity of the transcriptome is altered by chromosomal translocations, deletions, and duplications, as well as by epigenetic changes in chromatin structure. By targeting mRNAs for translational repression or RNase-dependent hydrolysis (AU-rich miRNAs or shRNA-like effects), the microRNA (miRNA) component of the transcriptome is estimated to regulate expression of up to 30% of all proteins. Yet the causes and role of deregulated miRNA expression in malignancy are largely unknown, in part because promoter events are not characterized. Since more than one-third of all known mammalian miRNA genes are encoded in the introns of protein-coding genes, they may be regulated by the same promoter events that regulate host-gene mRNA expression. To provide experimental validation for coordinated expression of miRNAs and their host genes, we compared Affymetrix U133A gene expression data for the promyelocytic NB4 and acute myelogenous leukemia AML2 cell lines with the expression of miRNA precursors. We found similar patterns of host gene expression in the two cell lines and a good correlation with the expression of miRNA precursors in NB4 cells (r=0.464, N=30 miRNAs, p<0.016). To further demonstrate that host gene mRNAs and miRNAs are expressed from common transcripts, we activated promoter events by enforcing the expression of Lyl1, a basic helix-loop-helix transcription factor that is often over-expressed in AML. This resulted in a greater than 2-fold increase in the hsa-mir-126-1, -032-2, -107-1, -026a, -023b, -103-2, and -009-3-1 intronic miRNA precursors and a corresponding increase in host gene expression. Meta-analysis of microarray data across many experiments and platforms (available through Oncomine.org) has been used to study the cancer transcriptome. To help determine whether intronic miRNAs play a substantial role in malignancy, we correlated host gene expression data with the expression of predicted miRNA targets. Fewer than 20% of all differentially expressed genes in leukemia and lymphoma were predicted targets, compared to 68% in breast cancer. Since the Gene Ontology term “ion binding” is most commonly associated with miRNA host genes, the data suggest that this cancer module is relatively inactive in leukemia and lymphoma compared to breast cancer. Gene cluster analysis of a leukemia data set using only miRNA host gene expression classified patients into subsets similar (but not identical) to those obtained from an analysis based on over 20,000 transcripts. To further demonstrate that miRNAs and their host genes are expressed from the same transcription unit, we correlated the expression of miRNA targets with that of genes that are either hosts for miRNAs or are situated several kilobases downstream of a miRNA and thus belong to different transcription units. We applied this analysis to a subset of 81 AML patients who presented with a normal karyotype and found significant negative correlations (p<0.01) between the levels of host genes for hsa-mir-15b, -103-1, and -128 and the expression ranks of their predicted gene targets, but no statistically significant correlation between non-host genes and targets for upstream miRNAs. These data demonstrate co-regulated expression of host genes and intronic miRNAs and the usefulness of intronic miRNAs in cancer profiling.
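
As a rough plausibility check on the correlation reported for NB4 cells (r=0.464, N=30), the sketch below converts a correlation coefficient into a t statistic and a two-sided p-value. It assumes a Pearson correlation and a two-sided test, neither of which the abstract states, and the function names are illustrative only. The p-value is obtained by numerically integrating the Student t density so that only the standard library is needed.

```python
import math


def t_from_r(r, n):
    """t statistic for a correlation r over n pairs (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|]; steps must be even."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)


r, n = 0.464, 30  # values quoted in the abstract
t = t_from_r(r, n)
p = two_sided_p(t, n - 2)
print(f"t = {t:.2f}, two-sided p = {p:.4f}")
```

Under these assumptions the t statistic is about 2.77 on 28 degrees of freedom, and the resulting p-value falls below the reported bound of 0.016.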


2012 ◽  
Vol 14 (4) ◽  
pp. 469-490 ◽  
Author(s):  
C. Lazar ◽  
S. Meganck ◽  
J. Taminau ◽  
D. Steenhoff ◽  
A. Coletta ◽  
...  
