quantile normalization

2021 ◽  
Vol 17 (2) ◽  
pp. e1008608
Author(s):  
Shay Ben-Elazar ◽  
Miriam Ragle Aure ◽  
Kristin Jonsdottir ◽  
Suvi-Katri Leivonen ◽  
Vessela N. Kristensen ◽  
...  

Different miRNA profiling protocols and technologies introduce differences in the resulting quantitative expression profiles. These include differences in the presence (and measurability) of certain miRNAs. We present and examine a method based on quantile normalization, Adjusted Quantile Normalization (AQuN), to combine miRNA expression data from multiple studies in breast cancer into a single joint dataset for integrative analysis. By pooling multiple datasets, we obtain increased statistical power, surfacing patterns that do not emerge as statistically significant when these datasets are analyzed separately. To merge several datasets, as we do here, one needs to overcome both technical and batch differences between them. We compare several approaches for merging and jointly analyzing miRNA datasets. We investigate the statistical confidence for known results and highlight potential new findings that resulted from the joint analysis using AQuN. In particular, we detect several miRNAs to be differentially expressed in estrogen receptor (ER) positive versus ER negative samples. In addition, we identify new potential biomarkers and therapeutic targets for both clinical groups. As a specific example, using the AQuN-derived dataset we detect statistically significant overexpression of hsa-miR-193b-5p in the ER positive group, a phenomenon that was not previously reported. Furthermore, functional assays show that overexpression of hsa-miR-193b-5p in breast cancer cell lines decreased cell viability and induced apoptosis. Together, these observations suggest a novel functional role for this miRNA in breast cancer. Packages implementing AQuN are provided for Python and Matlab: https://github.com/YakhiniGroup/PyAQN.


PROTEOMICS ◽  
2020 ◽  
Vol 20 (24) ◽  
pp. 2070171
Author(s):  
Eva Brombacher ◽  
Ariane Schad ◽  
Clemens Kreutz

PROTEOMICS ◽  
2020 ◽  
Vol 20 (24) ◽  
pp. 2000068
Author(s):  
Eva Brombacher ◽  
Ariane Schad ◽  
Clemens Kreutz

2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Yaxing Zhao ◽  
Limsoon Wong ◽  
Wilson Wen Bin Goh

Abstract Quantile normalization is an important normalization technique commonly used in high-dimensional data analysis. However, it is susceptible to class-effect proportion effects (the proportion of class-correlated variables in a dataset) and batch effects (the presence of potentially confounding technical variation) when applied blindly on whole data sets, resulting in higher false-positive and false-negative rates. We evaluate five strategies for performing quantile normalization, and demonstrate that good performance in terms of batch-effect correction and statistical feature selection can be readily achieved by first splitting data by sample class-labels before performing quantile normalization independently on each split (“Class-specific”). Via simulations with both real and simulated batch effects, we demonstrate that the “Class-specific” strategy (and others relying on similar principles) readily outperforms whole-data quantile normalization and is robust, preserving useful signals even during the combined analysis of separately-normalized datasets. Quantile normalization is a commonly used procedure. However, when applied carelessly on whole datasets without first considering class-effect proportion and batch effects, it can result in poor performance. If quantile normalization must be used, then we recommend using the “Class-specific” strategy.
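
The “Class-specific” idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes a features x samples matrix with one class label per sample, the function names are hypothetical, and ties are broken by position rather than averaged.

```python
import numpy as np


def quantile_normalize(x):
    """Plain quantile normalization of the columns (samples) of a
    features x samples matrix. Ties are broken by position, not averaged."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0)            # rank of each value within its column
    reference = np.sort(x, axis=0).mean(axis=1)  # mean of the k-th smallest values across samples
    return reference[ranks]


def class_specific_quantile_normalize(x, labels):
    """Hypothetical helper: split samples by class label and
    quantile-normalize each split independently."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    out = np.empty_like(x)
    for label in np.unique(labels):
        cols = labels == label                   # boolean mask over samples
        out[:, cols] = quantile_normalize(x[:, cols])
    return out


# Toy data (made up for illustration): 4 features, 2 samples per class.
x = np.array([[5.0, 4.0, 3.0, 10.0],
              [2.0, 1.0, 4.0, 20.0],
              [3.0, 4.0, 6.0, 30.0],
              [4.0, 2.0, 8.0, 40.0]])
out = class_specific_quantile_normalize(x, ["A", "A", "B", "B"])
```

After this step, samples within a class share one distribution, while the distributions of different classes are left distinct rather than being forced onto a single whole-data reference.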


2020 ◽  
Author(s):  
Eva Brombacher ◽  
Ariane Schad ◽  
Clemens Kreutz

Abstract High-throughput biological data – such as mass spectrometry-based proteomics data – suffer from systematic non-biological variance, which is introduced by systematic errors such as batch effects. This hinders the estimation of ‘real’ biological signals and, thus, decreases the power of statistical tests and biases the identification of differentially expressed sample classes. To remove such unintended variation while retaining the biological signal of interest, analysis workflows for mass spectrometry-based quantification typically comprise normalization steps prior to the statistical analysis of the data. Several normalization methods, such as quantile normalization, were originally developed for microarray data. However, unlike microarray data, proteomics data may contain features, in the form of protein intensities, that are consistently highly abundant across experimental conditions and, hence, are encountered in the tails of the protein intensity distribution. If such proteins are present, statistical inference on the intensity profiles of the normalized features is impeded by an increased number of false positive findings, caused by biased estimation of the variance of the data. Thus, we developed a novel, freely available approach: ‘tail-robust quantile normalization’. It extends the traditional quantile normalization to preserve the biological signals of features in the tails of the distribution over experimental conditions and to account for sample-dependent missing values.


2020 ◽  
Author(s):  
Yi Wang ◽  
Stephanie C. Hicks ◽  
Kasper D. Hansen

Abstract Estimates of correlation between pairs of genes in co-expression analysis are commonly used to construct networks among genes using gene expression data. Here, we show that the distribution of such correlations depends on the expression level of the involved genes, which we refer to as a mean-correlation relationship, in RNA-seq data, both bulk and single-cell. This dependence introduces a bias in co-expression analysis whereby highly expressed genes are more likely to be highly correlated. Such a relationship is not observed in protein-protein interaction data, suggesting that it does not reflect biology. Ignoring this bias can lead to missing potentially biologically relevant pairs of genes that are lowly expressed, such as transcription factors. To address this problem, we introduce spatial quantile normalization (SpQN), a method for normalizing local distributions in a correlation matrix. We show that spatial quantile normalization removes the mean-correlation relationship and corrects the expression bias in network reconstruction.


2019 ◽  
Author(s):  
F. William Townes ◽  
Rafael A. Irizarry

Abstract Single-cell RNA-seq (scRNA-seq) profiles gene expression of individual cells. Unique molecular identifiers (UMIs) remove duplicates in read counts resulting from polymerase chain reaction, a major source of noise. For scRNA-seq data lacking UMIs, we propose quasi-UMIs: quantile normalization of read counts to a compound Poisson distribution empirically derived from UMI datasets. When applied to ground-truth datasets having both reads and UMIs, quasi-UMI normalization has higher accuracy than alternatives such as census counts. Using quasi-UMIs enables methods designed specifically for UMI data to be applied to non-UMI scRNA-seq datasets.
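
The general idea of quantile-normalizing counts to a target distribution can be sketched as follows. This is only an illustration under stated assumptions, not the quasi-UMI method itself: the function name is hypothetical, the target is passed in as an arbitrary empirical sample rather than the compound Poisson distribution fitted by the authors, zeros are left as zeros, and tied counts are given the same average rank so they map to the same value.

```python
import numpy as np


def quantile_map_to_target(counts, target_sample):
    """Hypothetical helper: map the nonzero counts of one cell onto the
    quantiles of an empirical target sample, preserving their order."""
    counts = np.asarray(counts, dtype=float)
    out = np.zeros_like(counts)
    nz = counts > 0
    n = int(nz.sum())
    if n == 0:
        return out
    # Average 0-based rank for each distinct nonzero value (ties share a rank).
    _, inverse, cnt = np.unique(counts[nz], return_inverse=True, return_counts=True)
    avg_rank = np.cumsum(cnt) - (cnt + 1) / 2.0
    # Mid-point probabilities in (0, 1), then look up the target quantiles.
    probs = (avg_rank[inverse] + 0.5) / n
    out[nz] = np.quantile(np.asarray(target_sample, dtype=float), probs)
    return out


# Toy inputs, made up for illustration only.
reads = np.array([0, 10, 10, 500, 3])
target = np.array([1, 1, 1, 2, 2, 3, 5, 8])
mapped = quantile_map_to_target(reads, target)
```

The mapping is rank-based, so it changes the scale of the counts but not their ordering within a cell. The result is generally non-integer here because `np.quantile` interpolates by default.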

