Expression reflects population structure

Mapping Intimacies ◽

10.1101/364448 ◽

2018 ◽

Author(s):

Brielin C Brown ◽

Nicolas L. Bray ◽

Lior Pachter

Keyword(s):

Gene Expression ◽

Population Structure ◽

African Americans ◽

Correlation Analysis ◽

Canonical Correlation Analysis ◽

Gene Expression Data ◽

Principal Components ◽

Canonical Correlation ◽

Principal Component ◽

Expression Data

Abstract

Population structure in genotype data has been extensively studied and is revealed by the principal components of the genotype matrix. However, no similar analysis of population structure in gene expression data has been conducted, in part because a naïve principal components analysis of the gene expression matrix does not cluster by population. We identify a linear projection that reveals population structure in gene expression data. Our approach couples the principal components of genotype to the principal components of gene expression via canonical correlation analysis. Furthermore, we analyze the variance of each gene within the projection matrix to determine which genes significantly influence the projection. We identify thousands of significant genes, and show that a number of the top genes have been implicated in diseases that disproportionately impact African Americans.

Author Summary

High-dimensional, multi-modal genomics datasets are becoming increasingly common, which warrants investigation into analysis techniques that can reveal structure in the data without over-fitting. Here, we show that coupling principal component analysis to canonical correlation analysis offers an efficient approach to exploratory analysis of this kind of data. We apply this method to the GEUVADIS dataset of genotype and gene expression values of European and Yoruban individuals, finding as-yet unstudied population structure in the gene expression values. Moreover, many of the top genes identified by our method have been previously implicated in diseases that disproportionately impact African Americans.
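
The coupling described above (principal components of each data type, then canonical correlation analysis between the two sets of component scores) can be sketched in NumPy as follows. This is a generic PCA-then-CCA illustration under stated assumptions, not the authors' code: samples are rows, features are only centered (no scaling), the numbers of retained components `k_g` and `k_e` are arbitrary, and the names `top_pcs`, `inv_sqrt`, and `pca_cca` as well as the two-population toy data are hypothetical.

```python
import numpy as np


def top_pcs(M, k):
    """First k principal-component scores of M (rows = samples, cols = features)."""
    Mc = M - M.mean(axis=0)                       # center each feature
    U, S, _ = np.linalg.svd(Mc, full_matrices=False)
    return U[:, :k] * S[:k]                       # scores = U_k * singular values


def inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T


def pca_cca(G, E, k_g, k_e):
    """Classical CCA between the top PCs of genotype G and expression E."""
    Pg, Pe = top_pcs(G, k_g), top_pcs(E, k_e)
    n = Pg.shape[0]
    Cgg = Pg.T @ Pg / (n - 1)                     # PC scores are already centered
    Cee = Pe.T @ Pe / (n - 1)
    Cge = Pg.T @ Pe / (n - 1)
    Wg, We = inv_sqrt(Cgg), inv_sqrt(Cee)
    A, rho, Bt = np.linalg.svd(Wg @ Cge @ We)    # singular values = canonical correlations
    a, b = Wg @ A, We @ Bt.T                      # canonical weights on the PCs
    return rho, Pg @ a, Pe @ b                    # correlations and canonical variates


# Hypothetical toy data: a two-population label drives both data types.
rng = np.random.default_rng(0)
n = 200
pop = rng.integers(0, 2, size=n).astype(float)
G = np.outer(pop, rng.normal(size=50)) + rng.normal(size=(n, 50))
E = np.outer(pop, rng.normal(size=80)) + rng.normal(size=(n, 80))
rho, Zg, Ze = pca_cca(G, E, k_g=5, k_e=5)
print(rho.round(2))  # the leading canonical correlation is expected to be high here
```

Running CCA on a handful of PC scores rather than on the raw matrices keeps the covariance matrices small and well conditioned, which is one way to limit the over-fitting the Author Summary mentions.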

Genome-Wide Canonical Correlation Analysis-Based Computational Methods for Mining Information from Microbiome and Gene Expression Data

Advances in Artificial Intelligence - Lecture Notes in Computer Science ◽

10.1007/978-3-030-18305-9_53 ◽

2019 ◽

pp. 511-517 ◽

Cited By ~ 1

Author(s):

Rayhan Shikder ◽

Pourang Irani ◽

Pingzhao Hu

Keyword(s):

Gene Expression ◽

Correlation Analysis ◽

Computational Methods ◽

Canonical Correlation Analysis ◽

Gene Expression Data ◽

Canonical Correlation ◽

Expression Data ◽

Genome Wide

Sparse generalized eigenvalue problem with application to canonical correlation analysis for integrative analysis of methylation and gene expression data

Biometrics ◽

10.1111/biom.12886 ◽

2018 ◽

Vol 74 (4) ◽

pp. 1362-1371 ◽

Cited By ~ 3

Author(s):

Sandra E. Safo ◽

Jeongyoun Ahn ◽

Yongho Jeon ◽

Sungkyu Jung

Keyword(s):

Gene Expression ◽

Correlation Analysis ◽

Eigenvalue Problem ◽

Canonical Correlation Analysis ◽

Gene Expression Data ◽

Canonical Correlation ◽

Integrative Analysis ◽

Generalized Eigenvalue Problem ◽

Expression Data ◽

Generalized Eigenvalue

CLUSTERING GENE EXPRESSION DATA WITH KERNEL PRINCIPAL COMPONENTS

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720005001168 ◽

2005 ◽

Vol 03 (02) ◽

pp. 303-316 ◽

Cited By ~ 6

Author(s):

ZHENQIU LIU ◽

DECHANG CHEN ◽

HALIMA BENSMAIL ◽

YING XU

Keyword(s):

Gene Expression ◽

Principal Component Analysis ◽

Gene Expression Data ◽

Microarray Data ◽

Principal Components ◽

Data Clustering ◽

Principal Component ◽

Kernel Principal Component Analysis ◽

Expression Data ◽

Fuzzy C Means

Kernel principal component analysis (KPCA) has been applied to data clustering and graphic cut in the last couple of years. This paper discusses the application of KPCA to microarray data clustering. A new algorithm based on KPCA and fuzzy C-means is proposed. Experiments with microarray data show that the proposed algorithm is, in general, superior to traditional algorithms.
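
A pipeline of the kind the abstract describes (project samples with kernel PCA, then cluster the projections with fuzzy C-means) can be sketched in NumPy as below. This is a generic illustration, not the paper's algorithm: the RBF kernel, its width `gamma`, the fuzzifier `m = 2`, the function names `rbf_kernel_pca` and `fuzzy_c_means`, and the two synthetic groups standing in for expression profiles are all assumptions made for the example.

```python
import numpy as np


def rbf_kernel_pca(X, n_components, gamma):
    """Project the rows of X onto the leading RBF kernel principal components."""
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2 * X @ X.T     # pairwise squared distances
    K = np.exp(-gamma * np.maximum(D, 0.0))
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                                  # center the data in feature space
    w, V = np.linalg.eigh(Kc)                       # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:n_components]        # keep the largest ones
    w, V = np.maximum(w[idx], 0.0), V[:, idx]
    return V * np.sqrt(w)                           # projections of the training points


def fuzzy_c_means(Z, c, m=2.0, n_iter=100, seed=0):
    """Standard fuzzy C-means: alternate center and membership updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((Z.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)               # memberships of each sample sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ Z) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)                    # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centers


# Hypothetical toy data: 60 samples x 20 "genes" in two shifted groups.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 20)), rng.normal(4, 1, (30, 20))])
Z = rbf_kernel_pca(X, n_components=2, gamma=1.0 / 80)  # gamma hand-picked for this data
U, centers = fuzzy_c_means(Z, c=2)
labels = U.argmax(axis=1)                           # hard labels from fuzzy memberships
```

Fuzzy C-means returns a membership matrix rather than hard labels, so samples that sit between clusters can be flagged by their membership values instead of being forced into one group.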

Relationship between snow cover and atmospheric circulation, central North America, winter 1988

Annals of Glaciology ◽

10.3189/s0260305500014269 ◽

1997 ◽

Vol 25 ◽

pp. 347-352 ◽

Cited By ~ 6

Author(s):

Chris Derksen ◽

Kevin Misurak ◽

Ellsworth Ledrew ◽

Joe Piwowar ◽

Barry Goodison

Keyword(s):

Correlation Analysis ◽

Snow Cover ◽

Atmospheric Circulation ◽

Canonical Correlation Analysis ◽

Principal Components ◽

Canonical Correlation ◽

Snow Water Equivalent ◽

Principal Component ◽

Original Data ◽

Snow Cover Extent

The stochastic relationships between terrestrial snow water equivalent (SWE) and measures of the atmospheric circulation were investigated for the Canadian Prairies and the American Great Plains for the winter of 1988. Snow-cover extent, derived from EASE-grid SSM/I satellite data, and gridded atmospheric data from the National Meteorological Center were averaged at five-day intervals. Principal components analysis (PCA) was performed for the time series of SSM/I snow-cover imagery as well as for 700 mb geopotential height and temperature, 500 mb height, and 700–500 mb thickness. Canonical correlation analysis of the derived principal component weights was used to identify relationships between atmospheric variables and SWE. Results of the PCA indicate that a high degree of variance in upper-air variables (>75%) can be explained by the first three principal components, while the first three SWE components account for over 90% of the variance in the original data. Results of the canonical correlation analysis show positive relationships between snow-cover accumulation and a meridional pressure distribution pattern, while snow ablation is linked to a zonal atmospheric pressure pattern.
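
Statements such as "the first three principal components explain over 90% of the variance" refer to the explained-variance ratio: each component's variance divided by the total variance of the centered data. A minimal NumPy sketch, assuming a hypothetical matrix of time steps by grid cells built from three synthetic spatial patterns plus noise (not the SSM/I or National Meteorological Center data); `explained_variance_ratio` is an illustrative name.

```python
import numpy as np


def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component of X."""
    Xc = X - X.mean(axis=0)                         # center each variable (grid cell)
    s = np.linalg.svd(Xc, compute_uv=False)         # singular values, descending
    var = s ** 2                                    # PC variances up to the factor 1/(n-1)
    return var / var.sum()


# Hypothetical data: 30 time steps x 400 grid cells driven by 3 patterns + noise.
rng = np.random.default_rng(0)
scores = rng.normal(size=(30, 3))
patterns = rng.normal(size=(3, 400))
X = scores @ patterns + 0.1 * rng.normal(size=(30, 400))
ratio = explained_variance_ratio(X)
print(ratio[:3].sum())  # share of variance in the first three components
```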

Relationship between snow cover and atmospheric circulation, central North America, winter 1988

Annals of Glaciology ◽

10.1017/s0260305500014269 ◽

1997 ◽

Vol 25 ◽

pp. 347-352 ◽

Cited By ~ 3

Author(s):

Chris Derksen ◽

Kevin Misurak ◽

Ellsworth Ledrew ◽

Joe Piwowar ◽

Barry Goodison

Keyword(s):

Correlation Analysis ◽

Snow Cover ◽

Atmospheric Circulation ◽

Canonical Correlation Analysis ◽

Principal Components ◽

Canonical Correlation ◽

Snow Water Equivalent ◽

Principal Component ◽

Original Data ◽

Snow Cover Extent

Gene Set Correlation Analysis and Visualization Using Gene Expression Data

Current Bioinformatics ◽

10.2174/1574893615999200629124444 ◽

2020 ◽

Vol 15 ◽

Author(s):

Chen-An Tsai ◽

James J. Chen

Keyword(s):

Gene Expression ◽

Correlation Analysis ◽

Gene Expression Data ◽

Differentially Expressed Gene ◽

Differentially Expressed ◽

Superior Performance ◽

Expression Data ◽

Gene Set ◽

Gene Sets ◽

Set Correlation

Background: Gene set enrichment analyses (GSEA) provide a useful and powerful approach to identifying differentially expressed gene sets using prior biological knowledge. Several GSEA algorithms have been proposed to perform enrichment analyses on groups of genes. However, many of these algorithms have focused on identifying differentially expressed gene sets for a given phenotype.
Objective: In this paper, we propose a gene set analytic framework, Gene Set Correlation Analysis (GSCoA), that simultaneously measures within- and between-gene-set variation to identify sets of genes enriched for differential expression and highly correlated pathways.
Methods: We apply co-inertia analysis (CIA) to cross-gene-set comparisons in gene expression data to measure the co-structure of expression profiles in pairs of gene sets. CIA is a multivariate method for identifying trends or co-relationships in multiple datasets that contain the same samples. The objective of CIA is to seek ordinations (dimension-reduction diagrams) of two gene sets such that the squared covariance between the projections of the gene sets on successive axes is maximized. Simulation studies show that CIA outperforms correlation-based gene set methods in identifying co-relationships between gene sets in all simulation settings.
Results and Conclusion: We also combine between-gene-set CIA and GSEA to discover relationships between gene sets significantly associated with phenotypes. In addition, we provide a graphical technique for visualizing and simultaneously exploring associations within and between gene sets, together with their interactions and networks. We then demonstrate the integration of within- and between-gene-set variation using CIA and GSEA, applied to the p53 gene expression data with the c2 curated gene sets.
Ultimately, the GSCoA approach provides an attractive tool for identifying and visualizing novel associations between pairs of gene sets by integrating co-relationships between gene sets into gene set analysis.
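
The CIA objective stated above (maximize the squared covariance between the projections of two gene sets on successive axes) can be sketched in NumPy under the simplest weighting: uniform sample weights and an identity metric on genes. Under that assumption the co-inertia axes are the singular vectors of the cross-covariance matrix, and successive axes are orthogonal. Full CIA implementations (for example `coinertia` in the R package ade4) allow other weightings; this sketch is not the GSCoA authors' code, and `coinertia_axes` and the toy gene sets are hypothetical.

```python
import numpy as np


def coinertia_axes(X, Y, n_axes=2):
    """Co-inertia axes for two tables measured on the same samples (rows).

    Assumes uniform row weights and identity column weights, so the axes are
    the singular vectors of the cross-covariance matrix X^T Y / (n - 1).
    """
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    C = Xc.T @ Yc / (X.shape[0] - 1)          # covariances between genes of the two sets
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    u, v = U[:, :n_axes], Vt[:n_axes].T       # unit-norm axes for each gene set
    return s[:n_axes], u, v, Xc @ u, Yc @ v   # axis covariances and sample projections


# Hypothetical gene sets (15 and 25 genes) sharing one latent expression program.
rng = np.random.default_rng(0)
z = rng.normal(size=(40, 1))
X = z @ rng.normal(size=(1, 15)) + rng.normal(size=(40, 15))
Y = z @ rng.normal(size=(1, 25)) + rng.normal(size=(40, 25))
cov, u, v, px, py = coinertia_axes(X, Y)
```

The first singular value is the largest covariance any pair of unit-norm axes can achieve, which is why the SVD solves the maximization directly without an iterative search.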

Improving the Performance of Principal Components for Classification of Gene Expression Data Through Feature Selection

Studies in Classification, Data Analysis, and Knowledge Organization - Data Science and Classification ◽

10.1007/3-540-34416-0_35 ◽

2006 ◽

pp. 325-332

Author(s):

Edgar Acuña ◽

Jaime Porras

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Principal Components ◽

Expression Data

Cox Survival Analysis of Microarray Gene Expression Data Using Correlation Principal Component Regression

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1153 ◽

2007 ◽

Vol 6 (1) ◽

Cited By ~ 4

Author(s):

Qiang Zhao ◽

Jianguo Sun

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Principal Component Regression ◽

Predictive Ability ◽

Principal Component ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Microarray Gene Expression ◽

New Approach ◽

Microarray Gene

Statistical analysis of microarray gene expression data has recently attracted a great deal of attention. One problem of interest is relating genes to patients' survival outcomes in order to build regression models that predict future patients' survival from their gene expression data. For this, several authors have discussed the use of the proportional hazards, or Cox, model after reducing the dimension of the gene expression data. This paper presents a new approach to conducting Cox survival analysis of microarray gene expression data, with a focus on the models' predictive ability. The method modifies correlation principal component regression (Sun, 1995) to handle the censoring problem of survival data. Results based on simulated data and a publicly available dataset on diffuse large B-cell lymphoma show that the proposed method performs well in robustness and predictive ability compared with some existing partial least squares approaches. The new approach is also simpler and easier to implement.

Perturbation-Based Eigenvector Updates for On-Line Principal Components Analysis and Canonical Correlation Analysis

The Journal of VLSI Signal Processing Systems for Signal Image and Video Technology ◽

10.1007/s11265-006-9773-6 ◽

2006 ◽

Vol 45 (1-2) ◽

pp. 85-95 ◽

Cited By ~ 10

Author(s):

Anant Hegde ◽

Jose C. Principe ◽

Deniz Erdogmus ◽

Umut Ozertem ◽

Yadunandana N. Rao ◽

...

Keyword(s):

Correlation Analysis ◽

Principal Components Analysis ◽

Canonical Correlation Analysis ◽

Principal Components ◽

Canonical Correlation ◽

On Line ◽

Components Analysis

Representation-Constrained Canonical Correlation Analysis: A Hybridization of Canonical Correlation and Principal Component Analyses

SSRN Electronic Journal ◽

10.2139/ssrn.1331886 ◽

2009 ◽

Cited By ~ 1

Author(s):

S. K. Mishra

Keyword(s):

Correlation Analysis ◽

Canonical Correlation Analysis ◽

Canonical Correlation ◽

Principal Component ◽

Principal Component Analyses
