ParKerC: Toolbox for Parallel Kernel Clustering Methods

Weighted Mutual Information for Aggregated Kernel Clustering

Entropy ◽

10.3390/e22030351 ◽

2020 ◽

Vol 22 (3) ◽

pp. 351

Author(s):

Nezamoddin N. Kachouie ◽

Meshal Shutaywi

Keyword(s):

Mutual Information ◽

Kernel Function ◽

Dimensional Space ◽

Data Sets ◽

Clustering Methods ◽

Main Challenge ◽

Kernel Clustering ◽

Clustering Data ◽

Project Data ◽

The Right

Background: A common task in machine learning is clustering data into different groups based on similarities. Clustering methods can be divided into two groups: linear and nonlinear. A commonly used linear clustering method is K-means. Its extension, kernel K-means, is a nonlinear technique that uses a kernel function to project the data to a higher-dimensional space. The projected data are then clustered into different groups. Different kernels do not perform similarly when they are applied to different datasets. Methods: A kernel function might be relevant for one application but perform poorly when projecting data for another application. In turn, choosing the right kernel for an arbitrary dataset is a challenging task. To address this challenge, a potential approach is aggregating the clustering results to obtain an impartial clustering result regardless of the selected kernel function. To this end, the main challenge is how to aggregate the clustering results. A potential solution is to combine the clustering results using a weight function. In this work, we introduce Weighted Mutual Information (WMI) for calculating the weights for different clustering methods based on their performance to combine the results. The performance of each method is evaluated using a training set with known labels. Results: We applied the proposed Weighted Mutual Information to four data sets that cannot be linearly separated. We also tested the method in different noise conditions. Conclusions: Our results show that the proposed Weighted Mutual Information method is impartial, does not rely on a single kernel, and performs better than each individual kernel, especially in high noise.
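
The weighting idea in this abstract can be illustrated with a minimal sketch. It is not the authors' WMI formula, which the abstract does not give. It assumes each kernel's clustering is scored by normalized mutual information (NMI) against the known training labels, that the scores are rescaled to sum to one, and that the clusterings are combined through a weighted co-association matrix. All function names are hypothetical.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same points."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * log(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())


def nmi(a, b):
    """MI normalized by the geometric mean of the two entropies."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0 or hb == 0:
        return 0.0
    return mutual_information(a, b) / sqrt(ha * hb)


def kernel_weights(train_truth, train_predictions):
    """One weight per clustering (assumed scheme: NMI scores rescaled to sum to 1)."""
    scores = [nmi(train_truth, p) for p in train_predictions]
    total = sum(scores)
    if total == 0:
        # No clustering carries information: fall back to equal weights.
        return [1 / len(scores)] * len(scores)
    return [s / total for s in scores]


def weighted_coassociation(predictions, weights):
    """Entry (i, j) is the total weight of clusterings that co-cluster i and j."""
    n = len(predictions[0])
    return [[sum(w for p, w in zip(predictions, weights) if p[i] == p[j])
             for j in range(n)]
            for i in range(n)]
```

Because mutual information ignores how clusters are named, a clustering that matches the training labels up to relabeling still receives full weight. The co-association matrix can then be passed to any similarity-based clustering step to produce the aggregated result.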


Partitioning hard kernel clustering methods based on local adaptive distances

2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC) ◽

10.1109/icsmc.2012.6377724 ◽

2012 ◽

Author(s):

Marcelo R.P. Ferreira ◽

Francisco de A.T. de Carvalho

Keyword(s):

Clustering Methods ◽

Kernel Clustering ◽

Hard Kernel


PAMOGK: a pathway graph kernel-based multiomics approach for patient clustering

Bioinformatics ◽

10.1093/bioinformatics/btaa655 ◽

2020 ◽

Author(s):

Yasin Ilkagan Tepeli ◽

Ali Burak Ünal ◽

Furkan Mustafa Akdemir ◽

Oznur Tastan

Keyword(s):

Tumor Stage ◽

Supplementary Information ◽

Biological Knowledge ◽

P Value ◽

Multiple Views ◽

Clustering Methods ◽

Survival Times ◽

Molecular Alteration ◽

Graph Kernel ◽

Kernel Clustering

Motivation: Accurate classification of patients into molecular subgroups is critical for the development of effective therapeutics and for deciphering what drives these subgroups to cancer. The availability of multiomics data catalogs for large cohorts of cancer patients provides multiple views into the molecular biology of the tumors with unprecedented resolution. Results: We develop Pathway-based MultiOmic Graph Kernel clustering (PAMOGK) that integrates multiomics patient data with existing biological knowledge on pathways. We develop a novel graph kernel that evaluates patient similarities based on a single molecular alteration type in the context of a pathway. To corroborate multiple views of patients evaluated by hundreds of pathways and molecular alteration combinations, we use multiview kernel clustering. Applying PAMOGK to kidney renal clear cell carcinoma (KIRC) patients results in four clusters with significantly different survival times (P-value = 1.24e−11). When we compare PAMOGK to eight other state-of-the-art multiomics clustering methods, PAMOGK consistently outperforms these in terms of its ability to partition KIRC patients into groups with different survival distributions. The discovered patient subgroups also differ with respect to other clinical parameters such as tumor stage and grade, and primary tumor and metastasis tumor spreads. The pathways identified as important are highly relevant to KIRC. Availability and implementation: github.com/tastanlab/pamogk. Supplementary information: Supplementary data are available at Bioinformatics online.


PAMOGK: A Pathway Graph Kernel based Multi-Omics Clustering Approach for Discovering Cancer Patient Subgroups

10.1101/834168 ◽

2019 ◽

Author(s):

Yasin Ilkagan Tepeli ◽

Ali Burak Ünal ◽

Furkan Mustafa Akdemir ◽

Oznur Tastan

Keyword(s):

Tumor Stage ◽

Renal Clear Cell Carcinoma ◽

Biological Knowledge ◽

Multiple Views ◽

Clustering Methods ◽

Survival Times ◽

Molecular Alteration ◽

Graph Kernel ◽

Kernel Clustering ◽

Patient Subgroups

Accurate classification of patients into molecular subgroups is critical for the development of effective therapeutics and for deciphering what drives these subgroups to cancer. The availability of multi-omics data catalogs for large cohorts of cancer patients provides multiple views into the molecular biology of the tumors with unprecedented resolution. We develop PAMOGK (Pathway based Multi Omic Graph Kernel clustering) that integrates multi-omics patient data with existing biological knowledge on pathways. We develop a novel graph kernel that evaluates patient similarities based on a single molecular alteration type in the context of a pathway. To corroborate multiple views of patients evaluated by hundreds of pathways and molecular alteration combinations, we use multi-view kernel clustering. Applying PAMOGK to kidney renal clear cell carcinoma (KIRC) patients results in four clusters with significantly different survival times (p-value = 1.24e-11). When we compare PAMOGK to eight other state-of-the-art multi-omics clustering methods, PAMOGK consistently outperforms these in terms of its ability to partition KIRC patients into groups with different survival distributions. The discovered patient subgroups also differ with respect to other clinical parameters such as tumor stage and grade, and primary tumor and metastasis tumor spreads. The pathways identified as important are highly relevant to KIRC. PAMOGK is available at github.com/tastanlab/pamogk.


Clustering Methods for Italian Residential Real Estate Market

10.15396/eres2005_287 ◽

2005 ◽

Keyword(s):

Real Estate ◽

Real Estate Market ◽

Clustering Methods ◽

Residential Real Estate


Survey of Clustering Methods for Large Scale Dataset

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v7i5.13381344 ◽

2019 ◽

Vol 7 (5) ◽

pp. 1338-1344

Author(s):

Anupama Jawale ◽

Ganesh Magar

Keyword(s):

Large Scale ◽

Clustering Methods ◽

Large Scale Dataset


Hierarchical and Non-Hierarchical Linear and Non-Linear Clustering Methods to Shakespeare Authorship Question

SSRN Electronic Journal ◽

10.2139/ssrn.2989022 ◽

2015 ◽

Author(s):

Refat Aljumily

Keyword(s):

Clustering Methods ◽

Non Linear


Molecular Topology and Other Promiscuity Determinants as Predictors of Therapeutic Class - A Theoretical Framework to Guide Drug Repositioning?

Current Topics in Medicinal Chemistry ◽

10.2174/1568026618666180801091642 ◽

2018 ◽

Vol 18 (13) ◽

pp. 1110-1122 ◽

Cited By ~ 2

Author(s):

Juan F. Morales ◽

Lucas N. Alberca ◽

Sara Chuguransky ◽

Mauricio E. Di Ianni ◽

Alan Talevi ◽

...

Keyword(s):

Molecular Descriptors ◽

Drug Repositioning ◽

Drug Repurposing ◽

Topological Descriptors ◽

Log P ◽

Acidity Constant ◽

Molecular Topology ◽

Clustering Methods ◽

Mean Values ◽

Qsar Models

Much attention has been paid in the last decade to molecular predictors of promiscuity, including molecular weight, log P, molecular complexity, acidity constant and molecular topology, with correlations between promiscuity and those descriptors seemingly being context-dependent. It has been observed that certain therapeutic categories (e.g. mood disorders therapies) display a tendency to include multi-target agents (i.e. selective non-selectivity). Numerous QSAR models based on topological descriptors suggest that the topology of a given drug could be used to infer its therapeutic applications. Here, we have used descriptive statistics to explore the distribution of molecular topology descriptors and other promiscuity predictors across different therapeutic categories. Working with the publicly available ChEMBL database and 14 molecular descriptors, both hierarchical and non-hierarchical clustering methods were applied to the descriptors' mean values of the therapeutic categories after the refinement of the database (770 drugs grouped into 34 therapeutic categories). On the other hand, another publicly available database (repoDB) was used to retrieve cases of clinically-approved drug repositioning examples that could be classified into the therapeutic categories considered by the aforementioned clusters (111 cases), and the correspondence between the two studies was evaluated. Interestingly, a 3-cluster hierarchical clustering scheme based on only 14 molecular descriptors linked to promiscuity seems to explain up to 82.9% of approved cases of drug repurposing retrieved from repoDB. Therapeutic categories seem to display distinctive molecular patterns, which could be used as a basis for drug screening and drug design campaigns, and to unveil drug repurposing opportunities between particular therapeutic categories.


k-Means, Ward and Probabilistic Distance-Based Clustering Methods with Contiguity Constraint

Journal of Classification ◽

10.1007/s00357-020-09370-5 ◽

2020 ◽

Author(s):

Andrzej Młodak

Keyword(s):

Clustering Methods ◽

Probabilistic Distance


Unsupervised Identification of Targeted Spectra Applying Rank1-NMF and FCC Algorithms in Long-Wave Hyperspectral Infrared Imagery

Remote Sensing ◽

10.3390/rs13112125 ◽

2021 ◽

Vol 13 (11) ◽

pp. 2125

Author(s):

Bardia Yousefi ◽

Clemente Ibarra-Castanedo ◽

Martin Chamberland ◽

Xavier P. V. Maldague ◽

Georges Beaudoin

Keyword(s):

Matched Filter ◽

Principal Component ◽

Hyperspectral Data ◽

Clustering Methods ◽

Spectral Angle Mapper ◽

Long Wave ◽

Clustering Approach ◽

Spectral Comparison ◽

Computational Simplicity ◽

Mineral Identification

Clustering methods unequivocally show considerable influence on many recent algorithms and play an important role in hyperspectral data analysis. Here, we challenge the clustering for mineral identification using two different strategies in hyperspectral long wave infrared (LWIR, 7.7–11.8 μm). For that, we compare two algorithms to perform the mineral identification in a unique dataset. The first algorithm uses spectral comparison techniques for all the pixel-spectra and creates RGB false color composites (FCC). Then, a color-based clustering is used to group the regions (called FCC-clustering). The second algorithm clusters all the pixel-spectra to directly group the spectra. Then, the first rank of non-negative matrix factorization (NMF) extracts the representative of each cluster and compares results with the spectral library of JPL/NASA. These techniques give the comparison values as features, which are converted into RGB-FCC as the results (called clustering-rank1-NMF). We applied K-means as the clustering approach, which can be replaced by any other similar clustering approach. The results of the clustering-rank1-NMF algorithm indicate significant computational efficiency (more than 20 times faster than the previous approach) and promising performance for mineral identification, with up to 75.8% and 84.8% average accuracies for the FCC-clustering and clustering-rank1-NMF algorithms (using spectral angle mapper (SAM)), respectively. Furthermore, several spectral comparison techniques are also used for both algorithms, such as adaptive matched subspace detector (AMSD), orthogonal subspace projection (OSP) algorithm, principal component analysis (PCA), local matched filter (PLMF), SAM, and normalized cross correlation (NCC), and most of them show a similar range in accuracy. However, SAM and NCC are preferred due to their computational simplicity. Our algorithms strive to identify eleven different mineral grains (biotite, diopside, epidote, goethite, kyanite, scheelite, smithsonite, tourmaline, pyrope, olivine, and quartz).
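
The spectral angle mapper (SAM) that this abstract prefers for its computational simplicity can be sketched as follows. This is a minimal illustration, not the paper's pipeline. It uses the standard definition of the spectral angle, the arccosine of the normalized dot product between two spectra. The function names and the toy library values are hypothetical and do not come from the JPL/NASA library.

```python
from math import acos, sqrt


def spectral_angle(a, b):
    """Angle in radians between two spectra; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("spectral angle is undefined for an all-zero spectrum")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return acos(cosine)


def best_match(spectrum, library):
    """Name of the library entry with the smallest angle to the spectrum."""
    return min(library, key=lambda name: spectral_angle(spectrum, library[name]))


# Toy reference spectra (made-up values, for illustration only).
toy_library = {"quartz": [1.0, 0.0, 0.0], "olivine": [0.0, 1.0, 0.0]}
print(best_match([0.9, 0.1, 0.0], toy_library))
```

The angle depends only on the direction of each spectrum, not its magnitude, so a uniformly brighter or darker copy of the same spectrum still matches the same library entry.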
