PAMOGK: A Pathway Graph Kernel based Multi-Omics Clustering Approach for Discovering Cancer Patient Subgroups

2019 ◽  
Author(s):  
Yasin Ilkagan Tepeli ◽  
Ali Burak Ünal ◽  
Furkan Mustafa Akdemir ◽  
Oznur Tastan

Abstract Accurate classification of patients into molecular subgroups is critical for the development of effective therapeutics and for deciphering what drives these subgroups to cancer. The availability of multi-omics data catalogs for large cohorts of cancer patients provides multiple views into the molecular biology of the tumors with unprecedented resolution. We develop PAMOGK (Pathway based Multi Omic Graph Kernel clustering), which integrates multi-omics patient data with existing biological knowledge on pathways. We develop a novel graph kernel that evaluates patient similarities based on a single molecular alteration type in the context of a pathway. To corroborate multiple views of patients evaluated by hundreds of pathways and molecular alteration combinations, we use multi-view kernel clustering. Applying PAMOGK to kidney renal clear cell carcinoma (KIRC) patients results in four clusters with significantly different survival times (p-value = 1.24e-11). When we compare PAMOGK to eight other state-of-the-art multi-omics clustering methods, PAMOGK consistently outperforms these in terms of its ability to partition KIRC patients into groups with different survival distributions. The discovered patient subgroups also differ with respect to other clinical parameters such as tumor stage and grade, and primary tumor and metastasis tumor spreads. The pathways identified as important are highly relevant to KIRC. PAMOGK is available at github.com/tastanlab/pamogk

Author(s):  
Yasin Ilkagan Tepeli ◽  
Ali Burak Ünal ◽  
Furkan Mustafa Akdemir ◽  
Oznur Tastan

Abstract Motivation: Accurate classification of patients into molecular subgroups is critical for the development of effective therapeutics and for deciphering what drives these subgroups to cancer. The availability of multiomics data catalogs for large cohorts of cancer patients provides multiple views into the molecular biology of the tumors with unprecedented resolution. Results: We develop Pathway-based MultiOmic Graph Kernel clustering (PAMOGK), which integrates multiomics patient data with existing biological knowledge on pathways. We develop a novel graph kernel that evaluates patient similarities based on a single molecular alteration type in the context of a pathway. To corroborate multiple views of patients evaluated by hundreds of pathways and molecular alteration combinations, we use multiview kernel clustering. Applying PAMOGK to kidney renal clear cell carcinoma (KIRC) patients results in four clusters with significantly different survival times (P-value = 1.24e−11). When we compare PAMOGK to eight other state-of-the-art multiomics clustering methods, PAMOGK consistently outperforms these in terms of its ability to partition KIRC patients into groups with different survival distributions. The discovered patient subgroups also differ with respect to other clinical parameters such as tumor stage and grade, and primary tumor and metastasis tumor spreads. The pathways identified as important are highly relevant to KIRC. Availability and implementation: github.com/tastanlab/pamogk. Supplementary information: Supplementary data are available at Bioinformatics online.
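The abstract describes one kernel per pathway and alteration type, later combined by multiview kernel clustering. As a rough illustration of the combination step only, the sketch below cosine-normalizes each view's kernel and takes a uniform average. This is a simple baseline and an assumption on our part, not the PAMOGK graph kernel or the weighting used by the authors; the function names and the toy matrices are hypothetical.

```python
import math


def normalize_kernel(K):
    """Cosine-normalize a kernel matrix so that every diagonal entry equals 1."""
    n = len(K)
    return [[K[i][j] / math.sqrt(K[i][i] * K[j][j]) for j in range(n)]
            for i in range(n)]


def average_kernels(kernels):
    """Uniformly average a list of equally sized kernel matrices."""
    n = len(kernels[0])
    m = len(kernels)
    return [[sum(K[i][j] for K in kernels) / m for j in range(n)]
            for i in range(n)]


# Hypothetical toy data: 3 patients seen through two views
# (for instance, one pathway scored with two alteration types).
view_a = [[4.0, 2.0, 0.0],
          [2.0, 4.0, 0.0],
          [0.0, 0.0, 1.0]]
view_b = [[1.0, 0.5, 0.1],
          [0.5, 1.0, 0.1],
          [0.1, 0.1, 1.0]]

# Normalizing first keeps a view with large raw values from dominating the average.
combined = average_kernels([normalize_kernel(view_a), normalize_kernel(view_b)])
```

The combined matrix can then be handed to any clustering routine that accepts a precomputed kernel. Learned, non-uniform view weights are a common refinement that this sketch leaves out.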


2021 ◽  
Vol 8 ◽  
Author(s):  
Shuqin Wang ◽  
Yongyong Chen ◽  
Fangying Zheng

Multi-view clustering has been deeply explored since the compatible and complementary information among views can be well captured. Recently, the low-rank tensor representation-based methods have effectively improved the clustering performance by exploring high-order correlations between multiple views. However, most of them often express the low-rank structure of the self-representative tensor by the sum of unfolded matrix nuclear norms, which may cause the loss of information in the tensor structure. In addition, the amount of effective information in all views is not consistent, and it is unreasonable to treat their contribution to clustering equally. To address the above issues, we propose a novel weighted low-rank tensor representation (WLRTR) method for multi-view subspace clustering, which encodes the low-rank structure of the representation tensor through Tucker decomposition and weights the core tensor to retain the main information of the views. Under the augmented Lagrangian method framework, an iterative algorithm is designed to solve the WLRTR method. Numerical studies on four real databases have proved that WLRTR is superior to eight state-of-the-art clustering methods.
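For orientation, the two low-rank models contrasted in this abstract can be written in standard tensor notation. The symbols below are our own assumptions (a third-order representation tensor \mathcal{Z}, mode-n unfoldings \mathbf{Z}_{(n)}, nonnegative weights \alpha_n, core tensor \mathcal{C}, factor matrices \mathbf{U}_n); the abstract fixes no notation and does not state how the core tensor is weighted, so that step is not shown.

```latex
% Sum of nuclear norms of the mode unfoldings (the surrogate the abstract criticizes)
\|\mathcal{Z}\|_{\mathrm{SNN}} = \sum_{n=1}^{3} \alpha_n \, \|\mathbf{Z}_{(n)}\|_{*}

% Tucker decomposition of the same tensor (the structure WLRTR is said to build on)
\mathcal{Z} = \mathcal{C} \times_1 \mathbf{U}_1 \times_2 \mathbf{U}_2 \times_3 \mathbf{U}_3
```

Here \times_n is the mode-n product. The first expression handles each unfolding as a separate matrix, whereas the Tucker form keeps all three modes coupled through one core tensor.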


Entropy ◽  
2020 ◽  
Vol 22 (3) ◽  
pp. 351
Author(s):  
Nezamoddin N. Kachouie ◽  
Meshal Shutaywi

Background: A common task in machine learning is clustering data into different groups based on similarities. Clustering methods can be divided into two groups: linear and nonlinear. A commonly used linear clustering method is K-means. Its extension, kernel K-means, is a nonlinear technique that utilizes a kernel function to project the data to a higher dimensional space. The projected data are then clustered into different groups. Different kernels do not perform similarly when they are applied to different datasets. Methods: A kernel function might be relevant for one application but perform poorly when projecting data for another application. In turn, choosing the right kernel for an arbitrary dataset is a challenging task. To address this challenge, a potential approach is aggregating the clustering results to obtain an impartial clustering result regardless of the selected kernel function. To this end, the main challenge is how to aggregate the clustering results. A potential solution is to combine the clustering results using a weight function. In this work, we introduce Weighted Mutual Information (WMI) for calculating the weights for different clustering methods based on their performance to combine the results. The performance of each method is evaluated using a training set with known labels. Results: We applied the proposed Weighted Mutual Information to four data sets that cannot be linearly separated. We also tested the method in different noise conditions. Conclusions: Our results show that the proposed Weighted Mutual Information method is impartial, does not rely on a single kernel, and performs better than each individual kernel, especially in high noise.
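To make the kernel K-means step concrete, here is a minimal sketch under stated assumptions: an RBF kernel, a fixed starting assignment, and a tiny hypothetical data set. It illustrates plain kernel K-means only, not the Weighted Mutual Information aggregation, whose formula the abstract does not give.

```python
import math


def rbf_kernel(points, gamma):
    """RBF (Gaussian) kernel matrix: K[i][j] = exp(-gamma * ||p_i - p_j||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q)))
             for q in points] for p in points]


def kernel_kmeans(K, init_labels, n_clusters, max_iter=100):
    """Kernel K-means on a precomputed kernel matrix K, starting from init_labels."""
    n = len(K)
    labels = list(init_labels)
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c]
                   for c in range(n_clusters)]
        # Squared norm of each cluster mean in feature space.
        within = [sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
                  for m in members]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:
                    continue  # skip empty clusters
                # Squared feature-space distance from point i to the mean of cluster c.
                d = K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + within[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels


# Hypothetical toy data: two well-separated groups of three 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
K = rbf_kernel(points, gamma=0.5)
labels = kernel_kmeans(K, init_labels=[0, 0, 1, 1, 1, 1], n_clusters=2)
```

Like ordinary K-means, this procedure only reaches a local optimum, so the outcome depends on both the starting labels and the kernel parameters. That sensitivity is one motivation for aggregating results across kernels, as the abstract proposes.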


Blood ◽  
2004 ◽  
Vol 104 (11) ◽  
pp. 4938-4938
Author(s):  
Axel Glasmacher ◽  
Corinna Hahn ◽  
Andrea Juttner ◽  
Regine Schubert ◽  
Barbara Busert ◽  
...  

Abstract Recent publications have shown that chromosomal abnormalities in patients with leukemias play an important role with respect to therapy and prognosis. In multiple myeloma (MM) the role of specific cytogenetic changes relevant for the prognosis is still to be defined. Recent data suggest that many more patients show chromosomal aberrations than previously suspected, but differentiation between main and side lines of karyotype evolution was problematic. In the present investigation, cytogenetic analysis was performed using interphase FISH in 130 patients with multiple myeloma. For hybridization, 9 repetitive (chromosomes 3, 7, 9, 11, 15, 17, 18, X, Y) and 7 single copy probes (2x5, 13, 17, 21, 2x22) were used. Aberrations were detected in 87% of the patients. Most cases showed 1–3 aberrations. There was a correlation between the number of aberrations per patient and the tumor stage. For example, the percentage of patients with 7–12 aberrations increased from 16% in stage II to 28% in stage III. Gains and losses of chromosomes showed significant interchromosomal differences, with gains being more frequent than losses. Chromosomes 3, 5, 7, 9, 21 and 22 showed predominantly gains. Losses were found in chromosomes 13, 17, X and Y. However, monosomy of the sex chromosomes may be explained in part by the age of the patients (average age of 63.5 years). For chromosomes 15 and 18 a similar number of monosomies and trisomies was found, which might be caused by mitotic nondisjunction. Deletions 13q14 (28%), gain of 11q13 and translocation of IgH locus 14q32 (79%) are specific aberrations detected in 39 patients analysed with specific DNA probes of the relevant loci. All three aberrations led to modified survival times of the patients. Summarizing our results in 130 patients with MM, we found that the number of numerical chromosomal aberrations as well as selected structural aberrations proved to be of diagnostic and prognostic relevance.


Blood ◽  
2006 ◽  
Vol 108 (11) ◽  
pp. 252-252 ◽  
Author(s):  
Detlef Haase ◽  
Ulrich Germing ◽  
Julie Schanz ◽  
Michael Pfeilstoecker ◽  
Barbara Hildebrandt ◽  
...  

Abstract In the IPSS the variables bone marrow blasts, cytogenetics and cytopenias were found to be most relevant for clinical outcome by multivariate analysis. For cytogenetics and blasts, scoring points were assigned as follows: 0 points (good cytogenetics and less than 5% blasts), 0.5 points (intermediate cytogenetics and 5–10% blasts), 1.0 points (poor cytogenetics), 1.5 points (11–20% blasts), 2.0 points (21–30% blasts). To examine whether cytogenetics are weighted correctly relative to blast counts, we compared the survival curves, median survival times (mst) and differences of mst (mst diff.) between patient subgroups, relative to the mst of 37.5 months (mo) of our entire study population (2124 pts. with MDS from our German-Austrian database). The results in the subgroups were as follows: blasts below 5% (n=609) (mst: 58 mo, mst diff.: +20.5 mo), good cytogenetics (n=768) (mst: 55.3 mo, mst diff.: +17.8 mo), blasts 5–10% (n=231) (mst: 28.0 mo, mst diff.: −9.5 mo), intermediate cytogenetics (n=222) (mst: 28.0 mo, mst diff.: −9.5 mo), blasts 11–20% (n=160) (mst: 16.5 mo, mst diff.: −21 mo), poor cytogenetics (n=212) (mst: 11.1 mo, mst diff.: −26.3 mo), blasts 21–30% (n=92) (mst: 11.7 mo, mst diff.: −25.7 mo). Our results clearly show that within the IPSS poor cytogenetics are significantly underweighted. According to the survival data, unfavorable cytogenetics resulted in a survival disadvantage on the same scale as 21–30% blasts. Thus, in a revised IPSS this cytogenetic feature should receive the same scoring points as 21–30% blasts when using the FAB-classification. For a scoring system based on the WHO-classification, unfavorable cytogenetics should receive an even higher scoring value than the maximum blast count of 19%. Further statistical analyses are under way to substantiate our conclusions.
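The reported differences can be checked by subtracting the cohort-wide median survival of 37.5 months from each subgroup's median. The short sketch below does this with the figures quoted in the abstract; the variable names are ours. Recomputing from the rounded medians gives −26.4 and −25.8 months for poor cytogenetics and 21–30% blasts, 0.1 month away from the reported −26.3 and −25.7, presumably because the published differences were derived from unrounded medians.

```python
OVERALL_MST = 37.5  # months; median survival of the whole cohort, as reported

# (subgroup, reported mst in months, reported mst difference in months)
reported = [
    ("blasts < 5%", 58.0, 20.5),
    ("good cytogenetics", 55.3, 17.8),
    ("blasts 5-10%", 28.0, -9.5),
    ("intermediate cytogenetics", 28.0, -9.5),
    ("blasts 11-20%", 16.5, -21.0),
    ("poor cytogenetics", 11.1, -26.3),
    ("blasts 21-30%", 11.7, -25.7),
]

# Difference recomputed from the rounded medians quoted in the abstract.
recomputed = {name: round(mst - OVERALL_MST, 1) for name, mst, _ in reported}

# Largest disagreement between recomputed and reported differences.
max_gap = max(abs(recomputed[name] - diff) for name, _, diff in reported)

for name, mst, diff in reported:
    print(f"{name:28s} mst={mst:5.1f}  reported={diff:+6.1f}  recomputed={recomputed[name]:+6.1f}")
```

On these figures, the poor-cytogenetics subgroup (1.0 points) shows a survival deficit close to that of the 21–30% blasts subgroup (2.0 points), which is the comparison behind the authors' underweighting argument.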


2020 ◽  
Vol 34 (04) ◽  
pp. 4860-4867
Author(s):  
Jing Liu ◽  
Fuyuan Cao ◽  
Xiao-Zhi Gao ◽  
Liqin Yu ◽  
Jiye Liang

Clustering by jointly exploiting information from multiple views can yield better performance than clustering on one single view. Some existing multi-view clustering methods aim at learning a weight for each view to determine its contribution to the final solution. However, the view-weighted scheme can only indicate the overall importance of a view and fails to recognize the importance of each inner cluster of a view. A higher weight for a view does not guarantee that all clusters in this view have higher importance than those in other views. In this paper, we propose a cluster-weighted kernel k-means method for multi-view clustering. Each inner cluster of each view is assigned a weight, which is learned based on the intra-cluster similarity of the cluster compared with all its corresponding clusters in different views, so that the cluster with higher intra-cluster similarity has a higher weight among the corresponding clusters. The cluster labels are learned simultaneously with the cluster weights in an alternating manner, by minimizing the weighted sum-of-squared errors of the kernel k-means. Compared with the view-weighted scheme, the cluster-weighted scheme enhances the interpretability of the clustering results. Experimental results on both synthetic and real data sets demonstrate the effectiveness of the proposed method.


PLoS ONE ◽  
2021 ◽  
Vol 16 (1) ◽  
pp. e0245264
Author(s):  
Ali Sabah ◽  
Sabrina Tiun ◽  
Nor Samsiah Sani ◽  
Masri Ayob ◽  
Adil Yaseen Taha

Existing text clustering methods utilize only one representation at a time (single view), whereas multiple views can represent documents. The multiview multirepresentation method enhances clustering quality. Moreover, existing clustering methods that utilize more than one representation at a time (multiview) use representations of the same nature. Hence, using multiple views that represent data in different representations with clustering methods is reasonable to create a diverse set of candidate clustering solutions. On this basis, an effective dynamic clustering method must consider combining multiple views of data, including semantic view, lexical view (word weighting), and topic view, as well as the number of clusters. The main goal of this study is to develop a new method that can improve the performance of web search result clustering (WSRC). An enhanced multiview multirepresentation consensus clustering ensemble (MMCC) method is proposed to create a set of diverse candidate solutions and select a high-quality overlapping cluster. The overlapping clusters are obtained from the candidate solutions created by different clustering methods. The framework to develop the proposed MMCC includes several stages: (1) acquiring the standard datasets (MORESQUE and Open Directory Project-239), which are used to validate search result clustering algorithms, (2) preprocessing the dataset, (3) applying multiview multirepresentation clustering models, (4) using the radius-based cluster number estimation algorithm, and (5) employing the consensus clustering ensemble method. Results show an improvement in clustering methods when multiview multirepresentation is used. More importantly, the proposed MMCC model improves the overall performance of WSRC compared with all single-view clustering models.

