Estimation of relationships between chemical substructures and antibiotic resistance-related gene expression in bacteria: Adapting a canonical correlation analysis for small sample data of gathered features using consensus clustering

Tsuyoshi Esaki; Takaaki Horinouchi; Yayoi Natsume-Kitatani; Yosui Nojima; Iwao Sakane; Hidetoshi Matsui

doi:10.1273/cbij.20.58

Leveraging expression from multiple tissues using sparse canonical correlation analysis and aggregate tests improves the power of transcriptome-wide association studies

PLoS Genetics ◽

10.1371/journal.pgen.1008973 ◽

2021 ◽

Vol 17 (4) ◽

pp. e1008973

Author(s):

Helian Feng ◽

Nicholas Mancuso ◽

Alexander Gusev ◽

Arunabha Majumdar ◽

Megan Major ◽

...

Keyword(s):

Gene Expression ◽

Correlation Analysis ◽

Canonical Correlation Analysis ◽

Canonical Correlation ◽

Association Studies ◽

Expression Patterns ◽

Small Sample ◽

Type I ◽

Sparse Canonical Correlation Analysis ◽

Eqtl Data

Transcriptome-wide association studies (TWAS) test the association between traits and genetically predicted gene expression levels. The power of a TWAS depends in part on the strength of the correlation between a genetic predictor of gene expression and the causally relevant gene expression values. Consequently, TWAS power can be low when expression quantitative trait locus (eQTL) data used to train the genetic predictors have small sample sizes, or when data from causally relevant tissues are not available. Here, we propose to address these issues by integrating multiple tissues in the TWAS using sparse canonical correlation analysis (sCCA). We show that sCCA-TWAS combined with single-tissue TWAS using an aggregate Cauchy association test (ACAT) outperforms traditional single-tissue TWAS. In empirically motivated simulations, the sCCA+ACAT approach yielded the highest power to detect a gene associated with phenotype, even when expression in the causal tissue was not directly measured, while controlling the Type I error when there is no association between gene expression and phenotype. For example, when gene expression explains 2% of the variability in outcome, and the GWAS sample size is 20,000, the average power difference between the ACAT combined test of sCCA features and single-tissue, versus single-tissue combined with the Generalized Berk-Jones (GBJ) method, single-tissue combined with S-MultiXcan, UTMOST, or summarizing cross-tissue expression patterns using Principal Component Analysis (PCA) approaches was 5%, 8%, 5% and 38%, respectively. The gain in power is likely due to sCCA cross-tissue features being more likely to be detectably heritable. When applied to publicly available summary statistics from 10 complex traits, the sCCA+ACAT test was able to increase the number of testable genes and identify on average an additional 400 gene-trait associations that single-trait TWAS missed. Our results suggest that aggregating eQTL data across multiple tissues using sCCA can improve the sensitivity of TWAS while controlling for the false positive rate.
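
The abstract above names the aggregate Cauchy association test (ACAT) as the step that combines p-values from sCCA features and single-tissue tests, but does not spell out the statistic. As a rough illustration only, the sketch below uses the commonly cited Cauchy combination form: each p-value is mapped to a standard Cauchy variate, the variates are averaged with weights, and the result is mapped back to a p-value. The function name `acat`, the equal default weights, and the example p-values are assumptions for illustration, not taken from the paper.

```python
import math


def acat(p_values, weights=None):
    """Combine p-values with a Cauchy combination statistic (ACAT-style sketch).

    Assumes every p-value lies strictly between 0 and 1 and that the
    weights are non-negative. Equal weights are used when none are given.
    """
    if weights is None:
        weights = [1.0 / len(p_values)] * len(p_values)
    total_weight = sum(weights)
    # Map each p-value to a Cauchy-distributed variate and take the weighted sum.
    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values)
    )
    # Convert the combined statistic back to a p-value via the Cauchy tail.
    return 0.5 - math.atan(statistic / total_weight) / math.pi


# Hypothetical p-values for one gene: two single-tissue tests and one sCCA feature.
combined = acat([0.04, 0.20, 0.003])
print(combined)
```

Because the combination is driven by the smallest p-values, a single strong signal among many null tests still yields a small combined p-value, which is one plausible reason such a test suits mixing cross-tissue and single-tissue features. Extremely small p-values may need a numerically safer approximation than the plain `tan` call used here.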


Penalized canonical correlation analysis to quantify the association between gene expression and DNA markers

BMC Proceedings ◽

10.1186/1753-6561-1-s1-s122 ◽

2007 ◽

Vol 1 (S1) ◽

Cited By ~ 21

Author(s):

Sandra Waaijenborg ◽

Aeilko H Zwinderman

Keyword(s):

Gene Expression ◽

Correlation Analysis ◽

Canonical Correlation Analysis ◽

Dna Markers ◽

Canonical Correlation


A New Robust Deep Canonical Correlation Analysis Algorithm for Small Sample Problems

IEEE Access ◽

10.1109/access.2019.2895363 ◽

2019 ◽

Vol 7 ◽

pp. 33631-33639

Author(s):

Yan Liu ◽

Yun Li ◽

Yun-Hao Yuan ◽

Hui Zhang

Keyword(s):

Correlation Analysis ◽

Canonical Correlation Analysis ◽

Canonical Correlation ◽

Small Sample ◽

Analysis Algorithm


Cellular Relationships of Testicular Germ Cell Tumors Determined by Partial Canonical Correlation Analysis of Gene Expression Signatures

Current Bioinformatics ◽

10.2174/1574893611308010012 ◽

2013 ◽

Vol 8 (1) ◽

pp. 72-79

Author(s):

Tingting Yu ◽

Akiko Toshimori ◽

Xia Jinrong ◽

Shigeru Saito ◽

Xingrong Zhou ◽

...

Keyword(s):

Gene Expression ◽

Correlation Analysis ◽

Germ Cell ◽

Canonical Correlation Analysis ◽

Canonical Correlation ◽

Germ Cell Tumors ◽

Testicular Germ Cell Tumors ◽

Testicular Germ Cell ◽

Gene Expression Signatures


Integrative analysis of gene expression and copy number alterations using canonical correlation analysis

BMC Bioinformatics ◽

10.1186/1471-2105-11-191 ◽

2010 ◽

Vol 11 (1) ◽

Cited By ~ 42

Author(s):

Charlotte Soneson ◽

Henrik Lilljebjörn ◽

Thoas Fioretos ◽

Magnus Fontes

Keyword(s):

Gene Expression ◽

Correlation Analysis ◽

Canonical Correlation Analysis ◽

Canonical Correlation ◽

Copy Number ◽

Integrative Analysis ◽

Copy Number Alterations


Leveraging expression from multiple tissues using sparse canonical correlation analysis and aggregate tests improve the power of transcriptome-wide association studies

10.1101/2020.07.03.186247 ◽

2020 ◽

Author(s):

Helian Feng ◽

Nicholas Mancuso ◽

Alexander Gusev ◽

Arunabha Majumdar ◽

Megan Major ◽

...

Keyword(s):

Gene Expression ◽

Correlation Analysis ◽

Canonical Correlation Analysis ◽

Canonical Correlation ◽

Complex Traits ◽

Association Studies ◽

Tissue Expression ◽

Expression Levels ◽

Sparse Canonical Correlation Analysis ◽

Eqtl Data

Abstract

Transcriptome-wide association studies (TWAS) test the association between traits and genetically predicted gene expression levels. The power of a TWAS depends in part on the strength of the correlation between a genetic predictor of gene expression and the causally relevant gene expression values. Consequently, TWAS power can be low when expression quantitative trait locus (eQTL) data used to train the genetic predictors have small sample sizes, or when data from causally relevant tissues are not available. Here, we propose to address these issues by integrating multiple tissues in the TWAS using sparse canonical correlation analysis (sCCA). We show that sCCA-TWAS combined with single-tissue TWAS using an aggregate Cauchy association test (ACAT) outperforms traditional single-tissue TWAS. In empirically motivated simulations, the sCCA+ACAT approach yielded the highest power to detect a gene associated with phenotype, even when expression in the causal tissue was not directly measured, while controlling the Type I error when there is no association between gene expression and phenotype. For example, when gene expression explains 2% of the variability in outcome, and the GWAS sample size is 20,000, the average power difference between the ACAT combined test of sCCA features and single-tissue, versus single-tissue combined with the Generalized Berk-Jones (GBJ) method, single-tissue combined with S-MultiXcan, or summarizing cross-tissue expression patterns using Principal Component Analysis (PCA) approaches was 5%, 8%, and 38%, respectively. The gain in power is likely due to sCCA cross-tissue features being more likely to be detectably heritable. When applied to publicly available summary statistics from 10 complex traits, the sCCA+ACAT test was able to increase the number of testable genes and identify on average an additional 400 gene-trait associations that single-trait TWAS missed. Our results suggest that aggregating eQTL data across multiple tissues using sCCA can improve the sensitivity of TWAS while controlling for the false positive rate.

Author summary

Transcriptome-wide association studies (TWAS) can improve the statistical power of genetic association studies by leveraging the relationship between genetically predicted transcript expression levels and an outcome. We propose a new TWAS pipeline that integrates data on the genetic regulation of expression levels across multiple tissues. We generate cross-tissue expression features using sparse canonical correlation analysis and then combine evidence for expression-outcome association across cross- and single-tissue features using the aggregate Cauchy association test. We show that this approach has substantially higher power than traditional single-tissue TWAS methods. Application of these methods to publicly available summary statistics for ten complex traits also identifies associations missed by single-tissue methods.


Canonical correlation analysis of high-dimensional data with very small sample support

Signal Processing ◽

10.1016/j.sigpro.2016.05.020 ◽

2016 ◽

Vol 128 ◽

pp. 449-458 ◽

Cited By ~ 32

Author(s):

Yang Song ◽

Peter J. Schreier ◽

David Ramírez ◽

Tanuj Hasija

Keyword(s):

Correlation Analysis ◽

Canonical Correlation Analysis ◽

Canonical Correlation ◽

High Dimensional Data ◽

Small Sample ◽

High Dimensional ◽

Sample Support


Expression reflects population structure

10.1101/364448 ◽

2018 ◽

Author(s):

Brielin C Brown ◽

Nicolas L. Bray ◽

Lior Pachter

Keyword(s):

Gene Expression ◽

Population Structure ◽

African Americans ◽

Correlation Analysis ◽

Canonical Correlation Analysis ◽

Gene Expression Data ◽

Principal Components ◽

Canonical Correlation ◽

Principal Component ◽

Expression Data

Abstract

Population structure in genotype data has been extensively studied, and is revealed by looking at the principal components of the genotype matrix. However, no similar analysis of population structure in gene expression data has been conducted, in part because a naïve principal components analysis of the gene expression matrix does not cluster by population. We identify a linear projection that reveals population structure in gene expression data. Our approach relies on the coupling of the principal components of genotype to the principal components of gene expression via canonical correlation analysis. Furthermore, we analyze the variance of each gene within the projection matrix to determine which genes significantly influence the projection. We identify thousands of significant genes, and show that a number of the top genes have been implicated in diseases that disproportionately impact African Americans.

Author summary

High-dimensional, multi-modal genomics datasets are becoming increasingly common, which warrants investigation into analysis techniques that can reveal structure in the data without over-fitting. Here, we show that the coupling of principal component analysis to canonical correlation analysis offers an efficient approach to exploratory analysis of this kind of data. We apply this method to the GEUVADIS dataset of genotype and gene expression values of European and Yoruban individuals, finding as-of-yet unstudied population structure in the gene expression values. Moreover, many of the top genes identified by our method have been previously implicated in diseases that disproportionately impact African Americans.
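
The coupling of principal components to canonical correlation analysis described above can be illustrated with a small sketch. It is not the authors' pipeline: it simply reduces each data matrix to a few principal component scores and then runs classical CCA on those scores using a QR decomposition followed by an SVD. The helper names `pca_scores` and `cca`, the simulated matrices, the shared latent factor, and the choice of three components are all assumptions made for illustration.

```python
import numpy as np


def pca_scores(X, k):
    """Return the first k principal component scores of X (samples in rows)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]


def cca(X, Y):
    """Classical CCA via QR + SVD.

    Assumes X and Y have full column rank and more rows than columns.
    Returns the canonical correlations and the weight matrices for X and Y.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(Xc)
    Qy, Ry = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations.
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    A = np.linalg.solve(Rx, U)     # weights applied to centered X
    B = np.linalg.solve(Ry, Vt.T)  # weights applied to centered Y
    return s, A, B


# Simulated stand-ins for genotype (G) and expression (E) matrices that
# share one latent factor z; real data would replace these.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
G = z @ rng.normal(size=(1, 30)) + 0.5 * rng.normal(size=(200, 30))
E = z @ rng.normal(size=(1, 40)) + 0.5 * rng.normal(size=(200, 40))

corrs, A, B = cca(pca_scores(G, 3), pca_scores(E, 3))
print(corrs)
```

Reducing both matrices to a handful of components before CCA keeps the number of variables well below the number of samples, which is one way to limit the over-fitting that plain CCA suffers from on high-dimensional data.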


Underwater Acoustic Target Feature Fusion Method Based on Multi-Kernel Sparsity Preserve Multi-Set Canonical Correlation Analysis

Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University ◽

10.1051/jnwpu/20193710087 ◽

2019 ◽

Vol 37 (1) ◽

pp. 87-92

Author(s):

Honghui Yang ◽

Shuzhen Yi

Keyword(s):

Correlation Analysis ◽

Canonical Correlation Analysis ◽

Canonical Correlation ◽

Feature Fusion ◽

Target Recognition ◽

Small Sample Size ◽

Small Sample ◽

Projection Algorithm ◽

Fusion Method ◽

Underwater Target

To address the high-dimensional, small-sample-size classification problem in underwater target recognition, a new feature fusion method is proposed based on multi-kernel sparsity preserve multi-set canonical correlation analysis. The multi-set canonical correlation analysis algorithm is used to quantitatively analyze the correlation of multi-domain features and to remove redundant and noisy features, thereby achieving multi-domain feature fusion. The multi-kernel sparsity preserving projection algorithm is used to constrain the sparse reconstruction of the extracted multi-domain feature samples, which enhances the classification ability of the features. Results of underwater target recognition experiments on real radiated noise datasets show that the new method can effectively remove redundant and noisy features, achieve the fusion of multi-domain underwater target features, and improve the recognition accuracy of underwater targets.


Genome-Wide Canonical Correlation Analysis-Based Computational Methods for Mining Information from Microbiome and Gene Expression Data

Advances in Artificial Intelligence - Lecture Notes in Computer Science ◽

10.1007/978-3-030-18305-9_53 ◽

2019 ◽

pp. 511-517 ◽

Cited By ~ 1

Author(s):

Rayhan Shikder ◽

Pourang Irani ◽

Pingzhao Hu

Keyword(s):

Gene Expression ◽

Correlation Analysis ◽

Computational Methods ◽

Canonical Correlation Analysis ◽

Gene Expression Data ◽

Canonical Correlation ◽

Expression Data ◽

Genome Wide
