Impact of DNA microarray data transformation on gene expression analysis - comparison of two normalization methods.

Marcin T Schmidt; Luiza Handschuh; Joanna Zyprych; Alicja Szabelska; Agnieszka K Olejnik-Schmidt; Idzi Siatkowski; Marek Figlerowicz

doi:10.18388/abp.2011_2227

Acta Biochimica Polonica ◽

2011 ◽

Vol 58 (4) ◽

Cited By ~ 8

Author(s):

Marcin T Schmidt ◽

Luiza Handschuh ◽

Joanna Zyprych ◽

Alicja Szabelska ◽

Agnieszka K Olejnik-Schmidt ◽

...

Keyword(s):

Gene Expression ◽

Data Analysis ◽

Dna Microarray ◽

Microarray Data ◽

Normalization Method ◽

Differentially Expressed ◽

Microarray Data Analysis ◽

Data Set ◽

Normalization Methods ◽

The Impact

Two-color DNA microarrays are commonly used for the analysis of global gene expression. They provide information on the relative abundance of thousands of mRNAs. However, the generated data need to be normalized to minimize systematic variation so that biologically significant differences can be more easily identified. A large number of normalization procedures have been proposed and many software packages for microarray data analysis are available. Here, we applied two normalization methods (median and loess) from two microarray data analysis software packages. They were examined using a sample data set. We found that the number of genes identified as differentially expressed varied significantly depending on the method applied. The obtained results, i.e. lists of differentially expressed genes, were consistent only when we used median normalization methods. Loess normalization implemented in the two software packages provided less coherent and, for some probes, even contradictory results. In general, our results provide an additional piece of evidence that the normalization method can profoundly influence the final results of DNA microarray-based analysis. The impact of the normalization method depends greatly on the algorithm employed. Consequently, the normalization procedure must be carefully considered and optimized for each individual data set.
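
As a rough illustration of the simpler of the two methods named in the abstract above, the sketch below applies global median normalization to two-color log-ratios: each spot's M = log2(R/G) is shifted so that the median log-ratio on the array becomes zero. This is a minimal sketch under stated assumptions, not the implementation used by either software package in the study. The function names and the intensity values are hypothetical. Loess normalization, which instead subtracts an intensity-dependent fitted curve from M, is not implemented here.

```python
import math
from statistics import median


def log_ratios(red, green):
    """Return M = log2(R / G) for each spot (hypothetical helper)."""
    return [math.log2(r / g) for r, g in zip(red, green)]


def median_normalize(m_values):
    """Shift all log-ratios so that their median is zero."""
    center = median(m_values)
    return [m - center for m in m_values]


# Hypothetical background-corrected intensities for five spots.
# Four spots share a two-fold dye bias; the last one is a real outlier.
red = [1200.0, 800.0, 400.0, 3200.0, 1600.0]
green = [600.0, 400.0, 200.0, 1600.0, 100.0]

m = log_ratios(red, green)
normalized = median_normalize(m)

for raw, norm in zip(m, normalized):
    print(f"raw M = {raw:.2f}  normalized M = {norm:.2f}")
```

Because the median is insensitive to the single outlying spot, the shared two-fold bias is removed while the outlier keeps its excess log-ratio. An intensity-dependent method such as loess can produce different gene lists on the same data, which is the kind of discrepancy the abstract reports.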

Enhancing Interdisciplinary Mathematics and Biology Education: A Microarray Data Analysis Course Bridging These Disciplines

CBE—Life Sciences Education ◽

10.1187/cbe.09-09-0067 ◽

2010 ◽

Vol 9 (3) ◽

pp. 217-226 ◽

Cited By ~ 11

Author(s):

Yolande V. Tra ◽

Irene M. Evans

Keyword(s):

Data Analysis ◽

Microarray Data ◽

Biology Education ◽

Microarray Experiment ◽

Educational Background ◽

Microarray Data Analysis ◽

Data Set ◽

Set Up ◽

Interdisciplinary Course ◽

The Impact

BIO2010 put forth the goal of improving the mathematical educational background of biology students. The analysis and interpretation of high-dimensional microarray data can be very challenging and is best done by a statistician and a biologist working and teaching in a collaborative manner. We set up such a collaboration and designed a course on microarray data analysis. We started using Genome Consortium for Active Teaching (GCAT) materials and Microarray Genome and Clustering Tool software and added R statistical software along with Bioconductor packages. In response to student feedback, one microarray data set was fully analyzed in class, starting from preprocessing to gene discovery to pathway analysis using the latter software. In a class project, students conducted a similar analysis of their own data or of data from a published journal paper. This exercise showed the impact that filtering, preprocessing, and different normalization methods had on gene inclusion in the final data set. We conclude that this course achieved its goal of equipping students with the skills to analyze data from a microarray experiment. We offer our insight about collaborative teaching as well as how other faculty might design and implement a similar interdisciplinary course.

Impact of RNA-seq data analysis algorithms on gene expression estimation and downstream prediction

Scientific Reports ◽

10.1038/s41598-020-74567-y ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Li Tong ◽

Po-Yen Wu ◽

John H. Phan ◽

Hamid R. Hassazadeh ◽

...

Keyword(s):

Gene Expression ◽

Data Analysis ◽

Disease Outcome ◽

Rna Seq ◽

Next Generation Sequencing Technology ◽

Normalization Methods ◽

The Us ◽

Sequencing Quality ◽

Improved Accuracy ◽

The Impact

To use next-generation sequencing technology such as RNA-seq for medical and health applications, choosing proper analysis methods for biomarker identification remains a critical challenge for most users. The US Food and Drug Administration (FDA) has led the Sequencing Quality Control (SEQC) project to conduct a comprehensive investigation of 278 representative RNA-seq data analysis pipelines consisting of 13 sequence mapping, three quantification, and seven normalization methods. In this article, we focused on the impact of the joint effects of RNA-seq pipelines on gene expression estimation as well as the downstream prediction of disease outcomes. First, we developed and applied three metrics (i.e., accuracy, precision, and reliability) to quantitatively evaluate each pipeline’s performance on gene expression estimation. We then investigated the correlation between the proposed metrics and the downstream prediction performance using two real-world cancer datasets (i.e., SEQC neuroblastoma dataset and the NIH/NCI TCGA lung adenocarcinoma dataset). We found that RNA-seq pipeline components jointly and significantly impacted the accuracy of gene expression estimation, and its impact was extended to the downstream prediction of these cancer outcomes. Specifically, RNA-seq pipelines that produced more accurate, precise, and reliable gene expression estimation tended to perform better in the prediction of disease outcome. In the end, we provided scenarios as guidelines for users to use these three metrics to select sensible RNA-seq pipelines for the improved accuracy, precision, and reliability of gene expression estimation, which lead to the improved downstream gene expression-based prediction of disease outcome.

SEMIPARAMETRIC CLUSTERING METHOD FOR MICROARRAY DATA ANALYSIS

Journal of Bioinformatics and Computational Biology ◽

10.1142/s021972000800345x ◽

2008 ◽

Vol 06 (02) ◽

pp. 261-282 ◽

Cited By ~ 2

Author(s):

AO YUAN ◽

WENQING HE

Keyword(s):

Data Analysis ◽

Microarray Data ◽

Mixture Distribution ◽

Information Criterion ◽

Optimal Number ◽

Microarray Data Analysis ◽

Parametric Methods ◽

Clustering Methods ◽

Microarray Gene Expression ◽

Data Set

Clustering is a major tool for microarray gene expression data analysis. The existing clustering methods fall mainly into two categories: parametric and nonparametric. The parametric methods generally assume a mixture of parametric subdistributions. When the mixture distribution approximately fits the true data generating mechanism, the parametric methods perform well, but not when there is a nonnegligible deviation between them. On the other hand, the nonparametric methods, which usually do not make distributional assumptions, are robust but pay a price in efficiency. In an attempt to utilize the known mixture form to increase efficiency, and to drop assumptions about the unknown subdistributions to enhance robustness, we propose a semiparametric method for clustering. The proposed approach has the form of a parametric mixture, with no assumptions about the subdistributions. The subdistributions are estimated nonparametrically, with constraints imposed only on the modes. An expectation-maximization (EM) algorithm along with a classification step is invoked to cluster the data, and a modified Bayesian information criterion (BIC) is employed to guide the determination of the optimal number of clusters. Simulation studies are conducted to assess the performance and the robustness of the proposed method. The results show that the proposed method yields a reasonable partition of the data. As an illustration, the proposed method is applied to a real microarray data set to cluster genes.
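
To make the EM, classification, and BIC steps mentioned in the abstract above more concrete, the sketch below fits a fully parametric one-dimensional Gaussian mixture by EM, assigns each point to its most probable component, and compares candidate cluster counts with the standard BIC. This is deliberately not the authors' method: their subdistributions are estimated nonparametrically and their BIC is modified, and neither detail is given here. All function names, the initialization scheme, and the data are assumptions made for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(data, k, iterations=200, min_sigma=1e-3):
    """Fit a k-component 1-D Gaussian mixture by EM (illustrative only)."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialization: means at evenly spaced quantiles,
    # a shared spread, and equal mixing weights.
    mus = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    spread = max((ordered[-1] - ordered[0]) / (2.0 * k), min_sigma)
    sigmas = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, s)
                    for w, m, s in zip(weights, mus, sigmas)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2
                      for r, x in zip(resp, data)) / nj
            sigmas[j] = max(math.sqrt(var), min_sigma)
    loglik = sum(
        math.log(sum(w * normal_pdf(x, m, s)
                     for w, m, s in zip(weights, mus, sigmas)) or 1e-300)
        for x in data
    )
    return weights, mus, sigmas, loglik


def bic(loglik, k, n):
    """Standard BIC; k means + k sigmas + (k - 1) free weights."""
    return -2.0 * loglik + (3 * k - 1) * math.log(n)


def classify(data, weights, mus, sigmas):
    """Classification step: assign each point to its most probable component."""
    return [
        max(range(len(mus)),
            key=lambda j: weights[j] * normal_pdf(x, mus[j], sigmas[j]))
        for x in data
    ]


def choose_k(data, candidates):
    """Return the candidate cluster count with the lowest BIC."""
    scores = {k: bic(fit_gmm_1d(data, k)[3], k, len(data)) for k in candidates}
    return min(scores, key=scores.get)


# Hypothetical log-expression values forming two well-separated groups.
data = [-2.1, -1.9, -2.0, -2.2, -1.8, 2.0, 2.1, 1.9, 2.2, 1.8]
best_k = choose_k(data, (1, 2))
weights, mus, sigmas, _ = fit_gmm_1d(data, best_k)
labels = classify(data, weights, mus, sigmas)
print(best_k, labels)
```

The Gaussian form is exactly the kind of distributional assumption the abstract says can hurt when it does not match the data; the sketch only shows how the EM iterations, the hard assignment, and the information criterion fit together.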

Computational Strategies for Analyzing Data in Gene Expression Microarray Experiments

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720003000319 ◽

2003 ◽

Vol 01 (03) ◽

pp. 541-586 ◽

Cited By ~ 33

Author(s):

Tero Aittokallio ◽

Markus Kurki ◽

Olli Nevalainen ◽

Tuomas Nikula ◽

Anne West ◽

...

Keyword(s):

Gene Expression ◽

Data Analysis ◽

Microarray Data ◽

Microarray Data Analysis ◽

Biological Research ◽

Microarray Experiments ◽

Dna Microarray Data ◽

Open Questions ◽

Analysis Technique ◽

Wide Range

Microarray analysis has become a widely used method for generating gene expression data on a genomic scale. Microarrays have been enthusiastically applied in many fields of biological research, even though several open questions remain about the analysis of such data. A wide range of approaches are available for computational analysis, but no general consensus exists on a standard protocol for microarray data analysis. Consequently, the choice of data analysis technique is a crucial element depending both on the data and on the goals of the experiment. Therefore, a basic understanding of bioinformatics is required for optimal experimental design and meaningful interpretation of the results. This review summarizes some of the common themes in DNA microarray data analysis, including data normalization and detection of differential expression. Algorithms are demonstrated by analyzing cDNA microarray data from an experiment monitoring gene expression in T helper cells. Several computational biology strategies, along with their relative merits, are overviewed and potential areas for additional research are discussed. The goal of the review is to provide a computational framework for applying and evaluating such bioinformatics strategies. Solid knowledge of microarray informatics contributes to the implementation of more efficient computational protocols for the given data obtained through microarray experiments.

Microarray data analysis to identify differentially expressed genes and biological pathways associated with asthma

Experimental and Therapeutic Medicine ◽

10.3892/etm.2018.6366 ◽

2018 ◽

Author(s):

Shanshan Qi ◽

Guanghui Liu ◽

Xiang Dong ◽

Nan Huang ◽

Wenjing Li ◽

...

Keyword(s):

Data Analysis ◽

Differentially Expressed Genes ◽

Microarray Data ◽

Biological Pathways ◽

Differentially Expressed ◽

Microarray Data Analysis

Gene Expression: Microarray Data Analysis

Bioinformatics and Functional Genomics ◽

10.1002/047145916x.ch7 ◽

2005 ◽

pp. 188-221

Keyword(s):

Gene Expression ◽

Data Analysis ◽

Microarray Data ◽

Microarray Data Analysis ◽

Gene Expression Microarray ◽

Expression Microarray ◽

Gene Expression Microarray Data

RealSpot: software validating results from DNA microarray data analysis with spot images

Physiological Genomics ◽

10.1152/physiolgenomics.00236.2004 ◽

2005 ◽

Vol 21 (2) ◽

pp. 284-291 ◽

Cited By ~ 15

Author(s):

Zhongming Chen ◽

Lin Liu

Keyword(s):

Data Analysis ◽

Dna Microarray ◽

Microarray Data ◽

Microarray Experiment ◽

Current Data ◽

Microarray Data Analysis ◽

Quality Analysis ◽

Biological Knowledge ◽

Data Validation ◽

Spot Quality

Spot images from DNA microarrays strongly affect the discovery of biological knowledge from gene expression data. However, results from quality analysis, normalization, differential expression, and cluster analysis are rarely validated with spot images in current data analysis methods or software packages. We designed RealSpot, a software package, to validate the results by directly associating spot quality and data with spot images in a spreadsheet table. RealSpot splits hybridization images into individual spots stored in a spreadsheet table. It subsequently associates microarray data with spot images and performs data validation through standard table operations such as sorting, searching, and editing. RealSpot has several built-in functions to facilitate data validation, including spot quality analysis, data organization, one-way ANOVA, gene ontology association, verification, import, and export. We used RealSpot to evaluate 77 slides (30,000 features each) from real hybridization experiments and to validate results from each step of data analysis. It took ∼10 min to validate results of spot quality after initial evaluation and correct ∼0.3% of falsely assigned qualities of 10,000 spots. We validated 1,641 of 2,110 differentially expressed genes identified by SAM analysis in ∼1/2 h by comparing each gene with its respective spot image. Furthermore, we found that 6 of 48 genes in one cluster from the k-means clustering method showed inconsistent trends of spot images. RealSpot is efficient for validating microarray results and thus helpful for improving the reliability of the whole microarray experiment for experimentalists.

Beginning Microarray Data Analysis: A Biologist's Guide to Analysis of DNA Microarray Data

Journal of Cell Science ◽

10.1242/jcs.00436 ◽

2003 ◽

Vol 116 (9) ◽

pp. 1649-1650 ◽

Cited By ~ 1

Author(s):

B. J. Fan

Keyword(s):

Data Analysis ◽

Dna Microarray ◽

Microarray Data ◽

Microarray Data Analysis ◽

Dna Microarray Data

Distance Measures in DNA Microarray Data Analysis

Bioinformatics and Computational Biology Solutions Using R and Bioconductor - Statistics for Biology and Health ◽

10.1007/0-387-29362-0_12 ◽

2005 ◽

pp. 189-208 ◽

Cited By ~ 6

Author(s):

R. Gentleman ◽

B. Ding ◽

S. Dudoit ◽

J. Ibrahim

Keyword(s):

Data Analysis ◽

Dna Microarray ◽

Microarray Data ◽

Distance Measures ◽

Microarray Data Analysis ◽

Dna Microarray Data

Fuzzy Ensemble Clustering for DNA Microarray Data Analysis

Applications of Fuzzy Sets Theory - Lecture Notes in Computer Science ◽

10.1007/978-3-540-73400-0_68 ◽

2007 ◽

pp. 537-543 ◽

Cited By ~ 4

Author(s):

Roberto Avogadri ◽

Giorgio Valentini

Keyword(s):

Data Analysis ◽

Dna Microarray ◽

Microarray Data ◽

Microarray Data Analysis ◽

Ensemble Clustering ◽

Dna Microarray Data
