Covariance adjustment for batch effect in gene expression data

2014 ◽  
Vol 33 (15) ◽  
pp. 2681-2695 ◽  
Author(s):  
Jung Ae Lee ◽  
Kevin K. Dobbin ◽  
Jeongyoun Ahn
2019 ◽  
Vol 21 (5) ◽  
pp. 1818-1824 ◽  
Author(s):  
Qi Zhao ◽  
Yu Sun ◽  
Zekun Liu ◽  
Hongwan Zhang ◽  
Xingyang Li ◽  
...  

Abstract: Unsupervised clustering of high-throughput gene expression data is widely adopted for cancer subtyping. However, cancer subtypes derived from a single dataset usually do not generalize across datasets from different platforms. Merging datasets is necessary to determine accurate and broadly applicable cancer subtypes, but it remains difficult because of batch effects. CrossICC is an R package designed for the unsupervised clustering of gene expression data from multiple datasets/platforms without requiring batch effect adjustment. CrossICC uses an iterative strategy to derive the optimal gene signature and number of clusters from a consensus similarity matrix generated by consensus clustering. The package also provides functions to visualize the identified subtypes and evaluate subtyping performance. We expect that CrossICC can be used to discover robust cancer subtypes with significant translational implications for the personalized care of cancer patients. Availability and Implementation: The package is implemented in R and available on GitHub (https://github.com/bioinformatist/CrossICC) and Bioconductor (http://bioconductor.org/packages/release/bioc/html/CrossICC.html) under the GPL v3 License.
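The consensus similarity matrix mentioned in this abstract can be illustrated with a minimal sketch. This is not CrossICC's implementation (the package is written in R). It only shows the general idea behind consensus clustering: for each pair of samples, record the fraction of resampled clustering runs in which the two samples landed in the same cluster. The function name `consensus_matrix`, the `runs` input, and the convention that `None` marks a sample left out of a run are assumptions made for illustration.

```python
def consensus_matrix(runs):
    """Fraction of runs in which each pair of samples shares a cluster.

    `runs` is a list of label lists, one per clustering run. A label of None
    (an illustrative convention) means the sample was not drawn in that run,
    so the run is ignored for every pair involving that sample.
    """
    n = len(runs[0])
    together = [[0] * n for _ in range(n)]  # runs where i and j co-cluster
    sampled = [[0] * n for _ in range(n)]   # runs where both i and j were drawn
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] is None or labels[j] is None:
                    continue
                sampled[i][j] += 1
                if labels[i] == labels[j]:
                    together[i][j] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Three hypothetical clustering runs over four samples.
# Cluster label values are arbitrary; only co-membership matters.
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, None],  # sample 3 was not drawn in this run
]
M = consensus_matrix(runs)
```

Entries near 1 mark pairs that are grouped together consistently and entries near 0 mark pairs that are consistently separated. A downstream step could cluster on this matrix to choose the number of clusters.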


2012 ◽  
Vol 14 (4) ◽  
pp. 469-490 ◽  
Author(s):  
C. Lazar ◽  
S. Meganck ◽  
J. Taminau ◽  
D. Steenhoff ◽  
A. Coletta ◽  
...  

2020 ◽  
Author(s):  
Shaoheng Liang ◽  
Jinzhuang Dou ◽  
Ramiz Iqbal ◽  
Ken Chen

Abstract: Clustering and visualization are essential parts of single-cell gene expression data analysis. The Euclidean distance used in most distance-based methods is not optimal. Batch effect, i.e., the variability among samples gathered at different times or from different tissues and patients, introduces large between-group distances and obscures the true identities of cells. To solve this problem, we introduce Batch-Corrected Distance (BCD), a metric that uses the temporal/spatial locality of the batch effect to control for such factors. We validate BCD on simulated data and apply it to a mouse retina development dataset and a lung dataset. We also demonstrate the utility of our approach in understanding the progression of Coronavirus Disease 2019 (COVID-19). BCD achieves more accurate clusters and better visualizations than state-of-the-art batch correction methods on longitudinal datasets. BCD can be directly integrated with most clustering and visualization methods to enable more scientific findings.


2020 ◽  
Vol 15 ◽  
Author(s):  
Momo Matsuda ◽  
Xiucai Ye ◽  
Tetsuya Sakurai

Background: Batch effects are commonly introduced in gene expression data and can dramatically reduce the accuracy of statistical inference in genomic analysis, since samples from different batches are not directly comparable. Objective: To accurately measure biological variability and obtain correct statistical inference, we aim to correct or remove batch effects so that samples from different batches can be merged into a comparable dataset for high-throughput genomic analysis. Methods: The existing L/S model uses empirical Bayes methods to find, for each gene, the constant values used for multiplication and addition. Unlike the L/S model, we use a dimensionality reduction method. We propose an effective scaling method that scales each gene by multiplying it by a constant value, formulated as an optimization problem based on spectral clustering. The data samples from different batches can then be merged into a comparable dataset with batch effect correction. Furthermore, we propose an approximate solution to the optimization problem for the scaling adjustment values. Results: We evaluated the proposed method on both artificial and gene expression datasets by comparing it with existing well-established batch effect correction methods. Numerical experiments show that the proposed method projects data samples from different batches so that they resemble each other, and that it outperforms the other methods on both microarray and single-cell RNA-seq datasets. Conclusion: The scaling adjustment for genes and the dimensionality reduction improved accuracy and removed batch effects, making the proposed method more robust to interfering genes.
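The per-gene "multiplication/addition" constants attributed to the L/S model can be illustrated with a plain location/scale adjustment. The sketch below reads L/S as location/scale, and it is neither the authors' spectral-clustering method nor an empirical Bayes procedure. For a single gene, it standardizes each batch and maps it onto a common mean and a pooled within-batch standard deviation. The function name `location_scale_adjust` and the toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def location_scale_adjust(values, batches):
    """Location/scale adjustment of one gene across batches (no shrinkage).

    Each batch is centered and scaled with its own mean and SD, then mapped to
    the overall mean and the pooled within-batch SD. Empirical Bayes methods
    would additionally shrink the per-batch estimates; that step is omitted.
    """
    grand_mean = mean(values)
    labels = sorted(set(batches))
    batch_mean = {
        b: mean(v for v, lab in zip(values, batches) if lab == b) for b in labels
    }
    batch_sd = {
        b: pstdev([v for v, lab in zip(values, batches) if lab == b]) for b in labels
    }
    # Pooled within-batch SD: spread that remains after removing batch means.
    residual_sq = [(v - batch_mean[lab]) ** 2 for v, lab in zip(values, batches)]
    target_sd = sqrt(sum(residual_sq) / len(values))

    adjusted = []
    for v, lab in zip(values, batches):
        sd = batch_sd[lab]
        z = (v - batch_mean[lab]) / sd if sd > 0 else 0.0
        adjusted.append(grand_mean + target_sd * z)
    return adjusted


# Toy data for one gene: batch B is shifted upward and more spread out.
values = [1.0, 2.0, 3.0, 11.0, 14.0, 17.0]
batches = ["A", "A", "A", "B", "B", "B"]
adjusted = location_scale_adjust(values, batches)
```

After adjustment both batches share the same mean and spread for this gene. Note that a plain adjustment like this assumes the batches have comparable biological composition. If they do not, it can remove real signal along with the batch effect.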

