Divisive Correlation Clustering Algorithm (DCCA) for grouping of genes: detecting varying patterns in expression profiles

Processing visual features alone is no longer sufficient for multimedia semantic retrieval, because multimedia data are complex and usually involve several modalities, e.g. graphics, text, speech, and video. It is therefore crucial to fully exploit the correlation between each feature and the target concept, the feature correlation within modalities, and the feature correlation across modalities. In this paper, the authors propose a Feature Correlation Clustering-based Multi-Modality Fusion Framework (FCC-MMF) for multimedia semantic retrieval. Features from different modalities are combined into one feature set with a common representation through a normalization and discretization process. Within and across modalities, multiple correspondence analysis (MCA) is used to obtain the correlation between feature-value pairs, which are then projected onto the first two principal components. The K-medoids algorithm, a widely used partitional clustering algorithm, is selected to minimize the Euclidean distance within the resulting clusters and to produce highly intra-correlated clusters of feature-value pairs. A majority vote then decides which cluster each feature belongs to. Once the feature clusters are formed, one classifier is built and trained for each cluster. The correlation and confidence of each classifier are considered when fusing the classification scores, and mean average precision is used to evaluate the final ranked classification scores. Finally, the proposed framework is applied to the NUS-wide Lite data set to demonstrate its effectiveness in multimedia semantic retrieval.
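The clustering and voting steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the FCC-MMF implementation: the 2-D coordinates stand in for the MCA projection of feature-value pairs and are invented, the farthest-first initialization and the simple alternating medoid update are choices made here (PAM's BUILD/SWAP procedure is a common alternative), and the per-cluster classifiers and score fusion are not shown. The function names `k_medoids` and `majority_vote` are hypothetical.

```python
import math
from collections import Counter


def assign(points, medoids):
    """Label each point with the index of its nearest medoid (Euclidean distance)."""
    return [min(range(len(medoids)), key=lambda c: math.dist(p, points[medoids[c]]))
            for p in points]


def k_medoids(points, k, max_iter=100):
    """Alternating k-medoids with a deterministic farthest-first start (an assumption)."""
    medoids = [0]
    while len(medoids) < k:
        # Add the point whose distance to its nearest chosen medoid is largest.
        medoids.append(max(range(len(points)), key=lambda i: min(
            math.dist(points[i], points[m]) for m in medoids)))
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if its cluster became empty
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            new_medoids.append(min(members, key=lambda m: sum(
                math.dist(points[m], points[j]) for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)


def majority_vote(pair_features, labels):
    """Assign each feature to the cluster that holds most of its feature-value pairs."""
    votes = {}
    for feature, label in zip(pair_features, labels):
        votes.setdefault(feature, Counter())[label] += 1
    return {f: counts.most_common(1)[0][0] for f, counts in votes.items()}


# Hypothetical 2-D coordinates of feature-value pairs (e.g. two MCA components).
points = [(0.0, 0.1), (0.2, 0.0),              # two values of "color"
          (0.1, 0.2), (0.3, 0.1), (5.2, 5.0),  # three values of "texture"
          (5.0, 5.1), (4.9, 5.0), (5.1, 4.8)]  # three values of "audio"
pair_features = ["color"] * 2 + ["texture"] * 3 + ["audio"] * 3
medoids, labels = k_medoids(points, k=2)
clusters = majority_vote(pair_features, labels)
print(clusters)  # {'color': 0, 'texture': 0, 'audio': 1}
```

Here "texture" is split 2 to 1 across the two pair clusters, so the majority vote places the whole feature with "color".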

Use of microRNA (miR) expression profiling to identify distinct subclasses of triple-negative breast cancers (TNBC).

Journal of Clinical Oncology ◽

10.1200/jco.2012.30.15_suppl.1007 ◽

2012 ◽

Vol 30 (15_suppl) ◽

pp. 1007-1007 ◽

Cited By ~ 1

Author(s):

Charles L. Shapiro ◽

Luciano Cascione ◽

Pierluigi Gasparini ◽

Francesca Lovat ◽

Stefania Carasi ◽

...

Keyword(s):

Breast Cancer ◽

Overall Survival ◽

Expression Profiling ◽

Clustering Algorithm ◽

Medical Center ◽

Expression Profiles ◽

Normal Breast ◽

Rank Test ◽

Cancer Data

1007 Background: TNBC is divided into basal and non-basal subclasses. To further subclassify TNBC, we performed microRNA (miR) expression profiling and linked the profiles to patient overall survival. Methods: During 1996-2005, 365 consecutive TNBC cases (phenotypically estrogen-, progesterone- and HER2-negative by immunohistochemistry [IHC]) were identified from the NCCN Breast Cancer Data Base/Tumor Registry at the OSU Medical Center. One hundred fifty-eight (43%) formalin-fixed paraffin-embedded (FFPE) breast cancer and 40 normal breast tissue blocks were available, and tissue cores were obtained for RNA. RNA was isolated using the Ambion RecoverAll total nucleic acid isolation kit, and the expression of ~700 miRs was assessed for each sample using the nanoString nCounter method. A consensus-clustering algorithm (ConsensusClusterPlus, Bioconductor www.bioconductor.org) was used to identify subclasses of TNBC, and Kaplan-Meier overall survival curves were compared using the log-rank test. Observations were censored at the date of death from causes other than breast cancer or at the time of last known follow-up, whichever occurred first. The median follow-up was 67 mo. (range 4-171 mo.). Results: The median age was 52 yrs. (range 20-84 yrs.); 81% were white and 9% African-American; stages I, II, and III were 31%, 54% and 15%, respectively; and most patients received adjuvant anthracycline-based regimens with (25%) or without (75%) taxanes. The algorithm identified 5 distinct subclasses: 1 clustered with normal breast miR expression, whereas the other 4 each had a unique pattern of deregulated miRs. Median overall survival differed significantly across the 5 cancer subclasses (log-rank p=0.028) (Table). Conclusions: miR expression profiling identifies and discriminates 5 TNBC subclasses, which do not coincide with the basal and non-basal subclasses identified by IHC.
Molecular analyses are ongoing to associate the miR-based subclasses with specific clinical features or the expression of specific pathways. [Table: see text]
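The survival comparison in this abstract rests on the Kaplan-Meier estimator and the log-rank test. The sketch below shows both for the two-group case only (the study compared five subclasses, which uses the k-group generalization of the same statistic with k-1 degrees of freedom). It is not the authors' analysis code; the follow-up times are invented, and an event code of 0 marks a censored case, in the spirit of the censoring rule described above.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(t, S(t))] at each distinct death time.

    events[i] is 1 for an observed death and 0 for a censored follow-up."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # deaths and censored cases both leave the risk set
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value) with 1 df."""
    pooled = ([(t, e, 0) for t, e in zip(times_a, events_a)]
              + [(t, e, 1) for t, e in zip(times_b, events_b)])
    observed_minus_expected = variance = 0.0
    for t in sorted({s for s, e, _ in pooled if e}):
        n = sum(1 for s, _, _ in pooled if s >= t)            # at risk, both groups
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)
        d = sum(1 for s, e, _ in pooled if s == t and e)       # deaths at t
        d_a = sum(1 for s, e, g in pooled if s == t and e and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance of the group-A death count
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return chi2, p_value


# Hypothetical follow-up (months) and event indicators for two subclasses.
times_a, events_a = [5, 8, 12, 20, 25], [1, 1, 0, 1, 0]
times_b, events_b = [3, 4, 6, 7, 9], [1, 1, 1, 1, 1]
print([(t, round(s, 3)) for t, s in kaplan_meier(times_a, events_a)])
# [(5, 0.8), (8, 0.6), (20, 0.3)]
chi2, p = log_rank(times_a, events_a, times_b, events_b)  # chi2 ≈ 3.94, p ≈ 0.047
```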

Unique gene expression profiles of human macrophages and dendritic cells to phylogenetically distinct parasites

Blood ◽

10.1182/blood-2002-10-3232 ◽

2003 ◽

Vol 102 (2) ◽

pp. 672-681 ◽

Cited By ~ 235

Author(s):

Damien Chaussabel ◽

Roshanak Tolouei Semnani ◽

Mary Ann McDowell ◽

David Sacks ◽

Alan Sher ◽

...

Keyword(s):

Gene Expression ◽

Dendritic Cells ◽

Leishmania Donovani ◽

Clustering Algorithm ◽

Leishmania Major ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Infectious Agent ◽

Experimental Conditions ◽

Characteristic Functional

Abstract Monocyte-derived dendritic cells (DCs) and macrophages (Mϕs) generated in vitro from the same individual blood donors were exposed to 5 different pathogens, and gene expression profiles were assessed by microarray analysis. Responses to Mycobacterium tuberculosis and to phylogenetically distinct protozoan (Leishmania major, Leishmania donovani, Toxoplasma gondii) and helminth (Brugia malayi) parasites were examined; all of these pathogens produce chronic infections in humans, yet they vary considerably in the nature of the immune responses they trigger. In the absence of microbial stimulation, DCs and Mϕs constitutively expressed approximately 4000 genes, 96% of which were shared between the 2 cell types. In contrast, the genes whose transcription was altered in DCs and Mϕs following pathogen exposure were largely cell specific. Profiling of the gene expression data led to the identification of sets of tightly coregulated genes across all experimental conditions tested. A newly devised literature-based clustering algorithm enabled the identification of functionally and transcriptionally homogeneous groups of genes. Comparing the responses induced by the individual pathogens with this strategy revealed major differences in the functionally related gene profiles associated with each infectious agent. Although the intracellular pathogens induced responses clearly distinct from that of the extracellular B. malayi, each displayed a unique pattern of gene expression that would not necessarily be predicted from their phylogenetic relationship. The association of characteristic functional clusters with each infectious agent is consistent with the concept that antigen-presenting cells have prewired signaling patterns for use in the response to different pathogens.
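The literature-based clustering algorithm is not described here in enough detail to reproduce. As a simpler stand-in, the sketch below shows only the generic step of grouping genes whose expression profiles are tightly correlated across conditions: genes are linked when their Pearson correlation reaches a threshold, and groups are the connected components (single linkage). The 0.9 threshold, the gene names and the expression values are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coregulated_groups(profiles, threshold=0.9):
    """Link genes whose profiles correlate at r >= threshold and return the
    connected components (single-linkage grouping) as sorted lists."""
    genes = list(profiles)
    parent = {g: g for g in genes}  # union-find forest

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)
    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical expression values across five stimulation conditions.
profiles = {
    "geneA": [1, 5, 2, 8, 3],
    "geneB": [2, 10, 4, 16, 6],          # geneA scaled by 2
    "geneC": [1.1, 5.2, 2.1, 7.9, 3.0],  # close to geneA
    "geneD": [5, 5.1, 4.9, 5, 5.2],      # nearly flat
    "geneE": [8, 2, 7, 1, 6],            # anti-correlated with geneA
}
print(coregulated_groups(profiles))
# [['geneA', 'geneB', 'geneC'], ['geneD'], ['geneE']]
```

Because the link is a correlation rather than a distance, geneB joins geneA despite having twice the magnitude, while the anti-correlated geneE stays separate.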

Average correlation clustering algorithm (ACCA) for grouping of co-regulated genes with similar pattern of variation in their expression values

Journal of Biomedical Informatics ◽

10.1016/j.jbi.2010.02.001 ◽

2010 ◽

Vol 43 (4) ◽

pp. 560-568 ◽

Cited By ~ 13

Author(s):

Anindya Bhattacharya ◽

Rajat K. De

Keyword(s):

Similar Pattern ◽

Clustering Algorithm ◽

Average Correlation ◽

Correlation Clustering ◽

Pattern Of Variation

Model-based clustering of multi-tissue gene expression data

Bioinformatics ◽

10.1093/bioinformatics/btz805 ◽

2019 ◽

Cited By ~ 1

Author(s):

Pau Erola ◽

Johan L M Björkegren ◽

Tom Michoel

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Clustering Algorithm ◽

Expression Profiles ◽

Individual Characteristics ◽

Supplementary Information ◽

Expression Data ◽

Model Based ◽

Tissue Gene Expression ◽

Lemon Tree

Abstract Motivation Recently, it has become feasible to generate large-scale, multi-tissue gene expression data, in which expression profiles are obtained from multiple tissues or organs sampled from dozens to hundreds of individuals. When traditional clustering methods are applied to this type of data, important information is lost, because they require either that all tissues be analyzed independently, ignoring dependencies and similarities between tissues, or that tissues be merged into a single, monolithic dataset, ignoring the individual characteristics of tissues. Results We developed a Bayesian model-based multi-tissue clustering algorithm, revamp, which can incorporate prior information on physiological tissue similarity and which produces a set of clusters, each consisting of a core set of genes conserved across tissues together with differential sets of genes specific to one or more subsets of tissues. Using data from seven vascular and metabolic tissues from over 100 individuals in the STockholm Atherosclerosis Gene Expression (STAGE) study, we demonstrate that multi-tissue clusters inferred by revamp are more enriched for tissue-dependent protein-protein interactions than those of alternative approaches. We further demonstrate that revamp yields easily interpretable multi-tissue gene expression associations with key coronary artery disease processes and clinical phenotypes in the STAGE individuals. Availability and implementation Revamp is implemented in the Lemon-Tree software, available at https://github.com/eb00/lemon-tree Supplementary information Supplementary data are available at Bioinformatics online.
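The output structure that revamp produces, a core gene set shared by all tissues plus differential sets specific to subsets of tissues, can be illustrated with plain set operations. The sketch below does not implement the Bayesian model or the Lemon-Tree software; it assumes the per-tissue memberships of one cluster are already known, and the function name, tissue names and gene names are hypothetical.

```python
def core_and_differential(cluster_by_tissue):
    """Split one multi-tissue cluster into genes shared by every tissue (core)
    and genes grouped by the exact subset of tissues they appear in."""
    tissues = sorted(cluster_by_tissue)
    members = [set(cluster_by_tissue[t]) for t in tissues]
    core = set.intersection(*members)
    differential = {}
    for gene in sorted(set.union(*members) - core):
        present_in = tuple(t for t in tissues if gene in cluster_by_tissue[t])
        differential.setdefault(present_in, set()).add(gene)
    return core, differential


# Hypothetical per-tissue memberships of a single cluster.
cluster = {
    "liver": {"G1", "G2", "G3"},
    "muscle": {"G1", "G2", "G4"},
    "fat": {"G1", "G2", "G4", "G5"},
}
core, differential = core_and_differential(cluster)
print(sorted(core))  # ['G1', 'G2']
print(differential)  # {('liver',): {'G3'}, ('fat', 'muscle'): {'G4'}, ('fat',): {'G5'}}
```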

Molecular subtype identification and prognosis stratification by a metabolism-related gene expression signature in colorectal cancer

Journal of Translational Medicine ◽

10.1186/s12967-021-02952-w ◽

2021 ◽

Vol 19 (1) ◽

Author(s):

Dagui Lin ◽

Wenhua Fan ◽

Rongxin Zhang ◽

Enen Zhao ◽

Pansong Li ◽

...

Keyword(s):

Colorectal Cancer ◽

Clustering Algorithm ◽

Cox Regression ◽

Molecular Subtype ◽

Expression Profiles ◽

Gene Expression Signature ◽

Survival Prediction ◽

Cox Regression Analysis ◽

Metabolic Genes ◽

Cluster A

Abstract Background Metabolic reprogramming has been associated with cancer occurrence and progression within the tumor immune microenvironment. However, the prognostic potential of metabolism-related genes in colorectal cancer (CRC) has not been comprehensively studied. Here, we investigated metabolic transcript-related CRC subtypes and the associated immune landscapes, and developed a metabolic risk score (MRS) for survival prediction. Methods Metabolism-related genes were collected from the Molecular Signatures Database, and metabolic subtypes were identified using an unsupervised clustering algorithm based on the expression profiles of survival-related metabolic genes in GSE39582. The ssGSEA and ESTIMATE methods were applied to estimate immune infiltration among subtypes. The MRS model was developed using LASSO Cox regression in the GSE39582 dataset and independently validated in the TCGA CRC and GSE17537 datasets. Results Based on the expression profiles of 539 survival-related metabolic genes, we identified two metabolism-related subtypes of CRC (cluster-A and cluster-B) with distinct immune profiles and notably different prognoses. The cluster-B subtype had shorter overall survival (OS) and RFS than the cluster-A subtype. Eighteen metabolism-related genes, mostly involved in lipid metabolism pathways, were used to build the MRS in GSE39582. Patients with higher MRS had a worse prognosis than those with lower MRS (HR 3.45, P < 0.001). The prognostic role of the MRS was validated in the TCGA CRC (HR 2.12, P = 0.00017) and GSE17537 (HR 2.67, P = 0.039) datasets. Time-dependent receiver operating characteristic curve and stratified analyses revealed the robust predictive ability of the MRS in each dataset. Multivariate Cox regression analysis indicated that the MRS could predict OS independently of TNM stage and age. Conclusions Our study provides novel insight into metabolic heterogeneity and its relationship with the immune landscape in CRC.
The MRS was identified as a robust prognostic marker and may facilitate individualized therapy for CRC patients.
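Risk scores fitted with LASSO Cox regression are conventionally computed as a linear predictor, the sum of each signature gene's coefficient multiplied by its expression, after which patients are split into high- and low-score groups. The sketch below assumes the coefficients are already fitted (fitting is not shown), uses three invented genes rather than the published 18-gene MRS, and uses the median as the cutoff, which the abstract does not specify.

```python
import statistics


def risk_scores(expression, coefficients):
    """Linear risk score per sample: sum of coefficient * expression over signature genes."""
    return {sample: sum(beta * values[gene] for gene, beta in coefficients.items())
            for sample, values in expression.items()}


def stratify(scores, cutoff=None):
    """Label samples 'high' or 'low' risk; the median score is the default cutoff."""
    if cutoff is None:
        cutoff = statistics.median(scores.values())
    return {sample: "high" if s > cutoff else "low" for sample, s in scores.items()}


# Hypothetical coefficients and expression values (not the published MRS genes).
coefficients = {"GENE1": 0.8, "GENE2": -0.5, "GENE3": 0.3}
expression = {
    "P1": {"GENE1": 2.0, "GENE2": 1.0, "GENE3": 0.0},
    "P2": {"GENE1": 1.0, "GENE2": 2.0, "GENE3": 1.0},
    "P3": {"GENE1": 3.0, "GENE2": 0.5, "GENE3": 2.0},
    "P4": {"GENE1": 0.5, "GENE2": 1.5, "GENE3": 0.5},
}
scores = risk_scores(expression, coefficients)
print(stratify(scores))
# {'P1': 'high', 'P2': 'low', 'P3': 'high', 'P4': 'low'}
```

A negative coefficient (GENE2 here) lowers the score, so higher expression of such a gene marks lower predicted risk.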

Molecular signatures of tumor progression in pancreatic adenocarcinoma identified by energy metabolism characteristics

10.21203/rs.3.rs-478202/v1 ◽

2021 ◽

Author(s):

Xin Wang ◽

Cong Tan ◽

Weiwei Weng ◽

Shujuan Ni ◽

Meng Zhang ◽

...

Keyword(s):

Energy Metabolism ◽

Pancreatic Adenocarcinoma ◽

Clustering Algorithm ◽

Molecular Subtypes ◽

Expression Profiles ◽

Gene Signature ◽

Gene Set Enrichment Analysis ◽

Algorithm Analysis ◽

Receptor Interaction ◽

Prognostic Gene

Abstract Background: In this study, we aimed to describe a molecular evaluation of primary pancreatic adenocarcinoma (PAAD) based on comprehensive analysis of energy-metabolism-related gene (EMRG) expression profiles. Methods: Molecular subtypes were identified by non-negative matrix factorization (NMF) clustering of 565 EMRGs. An overall survival (OS) predictive gene signature was developed and validated internally and externally on three online PAAD datasets. Hub genes were identified in the molecular subtypes by weighted gene co-expression network analysis (WGCNA) and then used to determine prognostic genes. Univariate, LASSO and multivariate Cox regression analyses were performed to assess prognostic genes and construct the prognostic gene signature. Time-dependent receiver operating characteristic (ROC) curves, Kaplan-Meier curves and a nomogram were used to assess the performance of the gene signature. Results: On the basis of EMRG expression profiles, we propose a molecular classification dividing PAAD into two subtypes: Cluster 1, which displays more immune and stromal cell components in the tumor microenvironment and higher tumor purity, and Cluster 2, which displays worse OS. Moreover, using a three-phase training, test and validation process, we constructed a 4-gene signature that consistently classifies the prognostic risk of patients in all three datasets and shows higher robustness and clinical usability than four previously reported prognostic gene signatures. In addition, a novel nomogram combining clinical features and the 4-gene signature showed reliable clinical utility in PAAD. According to gene set enrichment analysis (GSEA), gene sets related to the high-risk group were involved in the neuroactive ligand-receptor interaction pathway.
Conclusions: In summary, the EMRG-based molecular subtypes and prognostic gene model provide a roadmap for patient stratification and trials of targeted therapies.
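Non-negative matrix factorization approximates a non-negative genes x samples matrix V as W·H with k metagenes; each sample is then commonly assigned to the metagene with the largest entry in its column of H. The pure-Python sketch below uses Lee-Seung multiplicative updates for the squared Frobenius error. The rank, iteration count, random initialization and toy matrix are assumptions, and subtyping pipelines typically repeat the factorization over many initializations to choose k and stabilize the labels.

```python
import random

EPS = 1e-9  # guards against division by zero in the updates


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, k, iters=500, seed=0):
    """Factor a non-negative matrix V (genes x samples) as W (genes x k) times
    H (k x samples) with Lee-Seung multiplicative updates (Frobenius loss)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]
    h = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)  # H <- H * (W'V) / (W'WH)
        h = [[h[i][j] * num[i][j] / (den[i][j] + EPS) for j in range(m)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))  # W <- W * (VH') / (WHH')
        w = [[w[i][j] * num[i][j] / (den[i][j] + EPS) for j in range(k)] for i in range(n)]
    return w, h


def nmf_subtypes(v, k, **kwargs):
    """Assign each sample (column) to the metagene with the largest coefficient in H."""
    _, h = nmf(v, k, **kwargs)
    return [max(range(k), key=lambda c: h[c][j]) for j in range(len(v[0]))]


# Hypothetical expression of four genes in six tumors with two obvious groups.
V = [[5.0, 4.0, 5.0, 0.1, 0.2, 0.1],
     [4.0, 5.0, 4.0, 0.2, 0.1, 0.1],
     [0.1, 0.2, 0.1, 5.0, 4.0, 5.0],
     [0.2, 0.1, 0.1, 4.0, 5.0, 4.0]]
print(nmf_subtypes(V, k=2))  # samples 0-2 share one label, samples 3-5 the other
```

Which group receives label 0 depends on the random start, so only the grouping, not the label values, is meaningful.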

SOM-Based Class Discovery Exploring the ICA-Reduced Features of Microarray Expression Profiles

Comparative and Functional Genomics ◽

10.1002/cfg.444 ◽

2004 ◽

Vol 5 (8) ◽

pp. 596-616 ◽

Cited By ~ 2

Author(s):

Andrei Dragomir ◽

Seferina Mavroudi ◽

Anastasios Bezerianos

Keyword(s):

Clustering Algorithm ◽

Learning Algorithm ◽

Expression Profiles ◽

Relevant Information ◽

Statistical Dependence ◽

Analysis Tool ◽

Self Organizing Map ◽

Biologically Relevant ◽

Class Discovery ◽

The Cost

Gene expression datasets are large and complex, with many variables and unknown internal structure. We apply independent component analysis (ICA) to derive a less redundant representation of the expression data. The decomposition produces components with minimal statistical dependence and reveals biologically relevant information. We then apply cluster analysis (an important and popular analysis tool for obtaining an initial understanding of data, usually employed for class discovery) to the transformed data. The proposed self-organizing map (SOM)-based clustering algorithm automatically determines the number of ‘natural’ subgroups of the data, aided in this task by the available prior knowledge of the functional categories of genes. An entropy criterion allows each gene to be assigned to multiple classes, which is closer to the biological representation. These features, however, are not achieved at the cost of the algorithm's simplicity, since the map grows on a simple grid structure and the learning algorithm remains identical to Kohonen's.
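The learning rule the abstract refers to is Kohonen's: for each input, find the best-matching unit and move it and its grid neighbors toward the input. The sketch below trains a fixed-size grid only; it does not implement the growing map, the use of functional-category prior knowledge, the entropy-based multi-class assignment or the ICA step. The linear initialization, the linearly decaying learning rate and Gaussian neighborhood radius, and the toy 2-D data are assumptions.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda u: math.dist(weights[u], x))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Fixed-size rectangular SOM trained with the classic Kohonen update
    w <- w + lr * h * (x - w), where h is a Gaussian neighborhood on the grid."""
    rng = random.Random(seed)
    units = [(r, c) for r in range(rows) for c in range(cols)]
    lo = [min(col) for col in zip(*data)]
    hi = [max(col) for col in zip(*data)]
    # Linear initialization: spread the units between the data's min and max corners.
    weights = [[a + (b - a) * u / max(len(units) - 1, 1) for a, b in zip(lo, hi)]
               for u in range(len(units))]
    sigma0 = max(rows, cols) / 2
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1 - frac), 0.1)  # so does the neighborhood radius
            br, bc = units[best_matching_unit(weights, x)]
            for u, (r, c) in enumerate(units):
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                weights[u] = [w + lr * h * (xi - w) for w, xi in zip(weights[u], x)]
            step += 1
    return weights


# Hypothetical 2-D points (e.g. two ICA components) forming two groups.
group_a = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.6), (0.4, 0.4), (0.1, 0.3)]
group_b = [(10.0, 10.0), (9.6, 10.2), (10.3, 9.8), (9.9, 9.7), (10.1, 10.4)]
weights = train_som(group_a + group_b, rows=1, cols=2)
print({best_matching_unit(weights, x) for x in group_a},
      {best_matching_unit(weights, x) for x in group_b})  # {0} {1}
```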
