LZW mutual-information-maximizing input clustering algorithm

Author(s):  
Orawan Watchanupaporn ◽  
Worasait Suwannik
2013 ◽  
Vol 19 (1) ◽  
pp. 212-215
Author(s):  
Chang-Woo Seo ◽  
Bo Kyung Cha ◽  
Ryun Kyung Kim ◽  
Sungchae Jeon ◽  
Young Huh ◽  
...  

2013 ◽  
Vol 278-280 ◽  
pp. 1174-1177 ◽  
Author(s):  
Jia Jia Miao ◽  
Guo You Chen ◽  
Le Wang ◽  
Xue Lin Fang

Microblogging has become a major tool for people not only to share information but also to discuss current affairs, and microblog content has become a popular subject of analysis for companies and researchers. We focus on clustering microblog data, which is high-dimensional and highly sparse, and propose a new algorithm based on k-means and k frequent itemsets. To support the new clustering algorithm, we also develop a method that uses mutual information to capture long-term contextual knowledge in microblogs, and we design an algorithm to measure conversation similarity. Experimental results show that the clustering algorithm achieves higher accuracy than standard k-means and bisecting k-means, and that it maintains good scalability on large, highly sparse microblog data.


2021 ◽  
Author(s):  
Yang Xu ◽  
Priyojit Das ◽  
Rachel Patton McCord

Deep learning approaches have empowered single-cell omics data analysis in many ways, generating new insights from complex cellular systems. As there is an increasing need for single-cell omics data to be integrated across sources, types, and features of data, the challenges of integrating single-cell omics data are rising. Here, we present SMILE (Single-cell Mutual Information Learning), a deep clustering algorithm that learns a discriminative representation for single-cell data by maximizing mutual information. Using a unique cell-pairing design, SMILE successfully integrates multi-source single-cell transcriptome data, removing batch effects and projecting similar cell types, even from different tissues, into the same representation space. SMILE can also integrate data from two or more modalities, such as joint profiling technologies using single-cell ATAC-seq, RNA-seq, DNA methylation, Hi-C, and ChIP data. SMILE works well even when feature types are unmatched, such as genes for RNA-seq and genome-wide peaks for ATAC-seq.


2007 ◽  
Vol 6 (2) ◽  
pp. 251-254 ◽  
Author(s):  
Hongfang Zhou ◽  
Boqin Feng ◽  
Lintao Lv ◽  
Hui Yue

2021 ◽  
pp. 1-13
Author(s):  
Li Yihong ◽  
Wang Yunpeng ◽  
Li Tao ◽  
Lan Xiaolong ◽  
Song Han

DBSCAN (density-based spatial clustering of applications with noise) is one of the most widely used density-based clustering algorithms; it can find clusters of arbitrary shape, determine the number of clusters, and identify noise samples automatically. However, the performance of DBSCAN is significantly limited because it is quite sensitive to its two parameters, eps and MinPts: eps is the radius of the eps-neighborhood, and MinPts is the minimum number of points required in that neighborhood. Additionally, because these parameters are fixed, DBSCAN is likely to perform poorly on a dataset with large variations in density. To overcome these limitations, we propose a new density-clustering algorithm called GNN-DBSCAN, which uses an adaptive Grid to divide the dataset and defines local core samples by using the Nearest Neighbor. The grid divides the dataset space into a finite number of cells. The nearest neighbors lying in every filled cell and its adjacent filled cells are then defined as the local core samples. Next, GNN-DBSCAN obtains global core samples by enhancing and screening the local core samples. In this way, our algorithm can identify higher-quality core samples than DBSCAN. Lastly, given these global core samples, it clusters the dataset using a dynamic radius based on k-nearest neighbors. The dynamic radius overcomes the problems that DBSCAN's fixed eps parameter causes, so our method performs better on datasets with large variations in density. Experiments on synthetic and real-world datasets were conducted. The results indicate that our proposed algorithm outperforms the existing algorithms DBSCAN, DPC, ADBSCAN, and HDBSCAN in average Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), Adjusted Mutual Information (AMI), and V-measure.
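The grid step described in the abstract can be illustrated with a small sketch. This is not the GNN-DBSCAN implementation: the abstract describes an adaptive grid, whereas the sketch below assumes a fixed, user-supplied `cell_size`, and the function names `grid_cells` and `adjacent_filled` are hypothetical. It only shows how points might be bucketed into cells and how the filled neighbors of a cell could be looked up.

```python
from collections import defaultdict
from itertools import product
from math import floor


def grid_cells(points, cell_size):
    """Map each filled cell (a tuple of integer indices) to the indices of its points.

    Assumption: a fixed cell_size is used here; the paper's grid is adaptive.
    """
    cells = defaultdict(list)
    for i, p in enumerate(points):
        key = tuple(floor(x / cell_size) for x in p)
        cells[key].append(i)
    return dict(cells)


def adjacent_filled(cells, key):
    """Return the filled cells whose index differs from `key` by at most 1
    along every axis, excluding `key` itself, in sorted order."""
    neighbors = []
    for offset in product((-1, 0, 1), repeat=len(key)):
        if all(o == 0 for o in offset):
            continue  # skip the cell itself
        candidate = tuple(k + o for k, o in zip(key, offset))
        if candidate in cells:
            neighbors.append(candidate)
    return sorted(neighbors)


points = [(0.1, 0.2), (0.4, 0.9), (1.5, 0.5), (5.0, 5.0)]
cells = grid_cells(points, cell_size=1.0)
near_origin = adjacent_filled(cells, (0, 0))
```

Only filled cells are stored, so the lookup cost depends on the number of occupied cells rather than on the full extent of the data space.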


2018 ◽  
Vol 19 (8) ◽  
pp. 2267 ◽  
Author(s):  
Xia Cao ◽  
Guoxian Yu ◽  
Jie Liu ◽  
Lianyin Jia ◽  
Jun Wang

Identifying single nucleotide polymorphism (SNP) interactions is considered a popular and crucial way to explain the missing heritability of complex diseases in genome-wide association studies (GWAS). Many approaches have been proposed to detect SNP interactions. However, existing approaches generally suffer from high computational complexity resulting from the explosion of candidate high-order interactions. In this paper, we propose a two-stage approach (called ClusterMI) to detect high-order genome-wide SNP interactions based on significant pairwise SNP combinations. In the screening stage, to alleviate the huge computational burden, ClusterMI first applies a clustering algorithm combined with mutual information to divide SNPs into different clusters. Then, ClusterMI uses conditional mutual information to screen significant pairwise SNP combinations in each cluster. In this way, there is a higher probability of identifying significant two-locus combinations in each cluster, and the computational load for the follow-up search can be greatly reduced. In the search stage, two different search strategies (exhaustive search and improved ant colony optimization search) are provided to detect high-order SNP interactions based on the cardinality of significant two-locus combinations. Extensive simulation experiments show that ClusterMI performs better than other related and competitive approaches. Experiments on two real case-control datasets from the Wellcome Trust Case Control Consortium (WTCCC) also demonstrate that ClusterMI is more capable of identifying high-order SNP interactions from genome-wide data.
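As a rough illustration of the quantity the screening stage relies on, the sketch below estimates the mutual information between two discrete variables from their empirical frequencies. It is not ClusterMI itself: the function name `mutual_information`, the 0/1/2 genotype coding, and the toy vectors are assumptions made for the example, and the conditional mutual information used for pairwise screening is not shown.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Empirical mutual information I(X;Y) in bits for two equal-length
    sequences of discrete values (e.g. genotypes coded 0/1/2, an assumption)."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[a] * count_y[b]))
    return mi


# Hypothetical toy genotype vectors.
snp_a = [0, 1, 2, 0, 1, 2]
identical = mutual_information(snp_a, snp_a)               # equals the entropy of snp_a
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

A variable paired with itself yields its own entropy, while two variables whose joint frequencies factorize yield zero, which is why high pairwise values can be used to group related SNPs.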

