A New Objective Reduction Algorithm for Many-Objective Problems: Employing Mutual Information and Clustering Algorithm

There are various applications of clustering in the fields of machine learning, data mining, data compression along with pattern recognition. The existent techniques like the Llyods algorithm (sometimes called k-means) were affected by the issue of the algorithm which converges to a local optimum along with no approximation guarantee. For overcoming these shortcomings, an efficient k-means clustering approach is offered by this paper for stream data mining. Coreset is a popular and fundamental concept for k-means clustering in stream data. In each step, reduction determines a coreset of inputs, and represents the error, where P represents number of input points according to nested property of coreset. Hence, a bit reduction in error of final coreset gets n times more accurate. Therefore, this motivated the author to propose a new coreset-reduction algorithm. The proposed algorithm executed on the Covertype dataset, Spambase dataset, Census 1990 dataset, Bigcross dataset, and Tower dataset. Our algorithm outperforms with competitive algorithms like Streamkm[Formula: see text], BICO (BIRCH meets Coresets for k-means clustering), and BIRCH (Balance Iterative Reducing and Clustering using Hierarchies.

Download Full-text

Attribute weighted fuzzy clustering algorithm based on mutual information

2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD) ◽

10.1109/fskd.2017.8393018 ◽

2017 ◽

Author(s):

Yao Zhu Cao ◽

He Lin ◽

Biao Liu

Keyword(s):

Mutual Information ◽

Fuzzy Clustering ◽

Clustering Algorithm ◽

Fuzzy Clustering Algorithm

Download Full-text

LZW mutual-information-maximizing input clustering algorithm

2011 4th International Conference on Biomedical Engineering and Informatics (BMEI) ◽

10.1109/bmei.2011.6098783 ◽

2011 ◽

Cited By ~ 1

Author(s):

Orawan Watchanupaporn ◽

Worasait Suwannik

Keyword(s):

Mutual Information ◽

Clustering Algorithm

Download Full-text

A proposed scheme for sentiment analysis

Kybernetes ◽

10.1108/k-06-2017-0229 ◽

2018 ◽

Vol 47 (5) ◽

pp. 957-984 ◽

Cited By ~ 6

Author(s):

Sajjad Tofighy ◽

Seyed Mostafa Fakhrahmad

Keyword(s):

Sentiment Analysis ◽

Clustering Algorithm ◽

Reduction Method ◽

Feature Reduction ◽

Sentiment Classification ◽

Support Vector ◽

Second Phase ◽

Reduction Algorithm ◽

Content Type ◽

Statistical Knowledge

Purpose This paper aims to propose a statistical and context-aware feature reduction algorithm that improves sentiment classification accuracy. Classification of reviews with different granularities in two classes of reviews with negative and positive polarities is among the objectives of sentiment analysis. One of the major issues in sentiment analysis is feature engineering while it severely affects time complexity and accuracy of sentiment classification. Design/methodology/approach In this paper, a feature reduction method is proposed that uses context-based knowledge as well as synset statistical knowledge. To do so, one-dimensional presentation proposed for SentiWordNet calculates statistical knowledge that involves polarity concentration and variation tendency for each synset. Feature reduction involves two phases. In the first phase, features that combine semantic and statistical similarity conditions are put in the same cluster. In the second phase, features are ranked and then the features which are given lower ranks are eliminated. The experiments are conducted by support vector machine (SVM), naive Bayes (NB), decision tree (DT) and k-nearest neighbors (KNN) algorithms to classify the vectors of the unigram and bigram features in two classes of positive or negative sentiments. Findings The results showed that the applied clustering algorithm reduces SentiWordNet synset to less than half which reduced the size of the feature vector by less than half. In addition, the accuracy of sentiment classification is improved by at least 1.5 per cent. Originality/value The presented feature reduction method is the first use of the synset clustering for feature reduction. In this paper features reduction algorithm, first aggregates the similar features into clusters then eliminates unsatisfactory cluster.

Download Full-text

Modified Global K-Means Clustering Algorithm Using Mutual Information

Advanced Science Letters ◽

10.1166/asl.2013.4653 ◽

2013 ◽

Vol 19 (1) ◽

pp. 212-215

Author(s):

Chang-Woo Seo ◽

Bo Kyung Cha ◽

Ryun Kyung Kim ◽

Sungchae Jeon ◽

Young Huh ◽

...

Keyword(s):

Mutual Information ◽

Clustering Algorithm

Download Full-text

A Semantic-Based Algorithm for Microblogs Clustering

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.278-280.1174 ◽

2013 ◽

Vol 278-280 ◽

pp. 1174-1177 ◽

Cited By ~ 1

Author(s):

Jia Jia Miao ◽

Guo You Chen ◽

Le Wang ◽

Xue Lin Fang

Keyword(s):

Mutual Information ◽

Clustering Algorithm ◽

Frequent Itemsets ◽

Experimental Results ◽

High Dimensional ◽

Large Capacity ◽

Current Affairs ◽

Context Knowledge ◽

Share Information

Microblogging has become a major tool for people to not only share information, but also to talk about current affairs. Has become the most popular content in the analysis, interested companies and researchers. We focus on the micro-blog clustering high-dimensional, high sparse, and proposed a new algorithm based on k-means-k frequent itemsets. In addition, the development of a method to capture long-term mutual information context knowledge in microblogging and algorithms are also designed to measure the conversation Similar. In order to support the new micro-blog clustering algorithm. Experimental results show that the clustering algorithm has higher accuracy than the standard k-means and two points in k-means algorithm toward large-capacity and highly sparse microblogging also maintain good scalability.

Download Full-text

A Mutual Information based Face Clustering Algorithm for Movies

10.1109/icme.2006.262705 ◽

2006 ◽

Cited By ~ 7

Author(s):

N. Vretos ◽

V. Solachidis ◽

I. Pitas

Keyword(s):

Mutual Information ◽

Clustering Algorithm ◽

Face Clustering

Download Full-text

SMILE: Mutual Information Learning for Integration of Single Cell Omics Data

10.1101/2021.01.28.428619 ◽

2021 ◽

Author(s):

Yang Xu ◽

Priyojit Das ◽

Rachel Patton McCord

Keyword(s):

Mutual Information ◽

Single Cell ◽

Clustering Algorithm ◽

Cell Types ◽

Cellular Systems ◽

Omics Data ◽

Learning Approaches ◽

Rna Seq ◽

Cell Transcriptome ◽

Unique Cell

Deep learning approaches have empowered single-cell omics data analysis in many ways, generating new insights from complex cellular systems. As there is an increasing need for single cell omics data to be integrated across sources, types, and features of data, the challenges of integrating single-cell omics data are rising. Here, we present a deep clustering algorithm that learns discriminative representation for single-cell data via maximizing mutual information, SMILE (Single-cell Mutual Information Learning). Using a unique cell-pairing design, SMILE successfully integrates multi-source single-cell transcriptome data, removing batch effects and projecting similar cell types, even from different tissues, into the same representation space. SMILE can also integrate data from two or more modalities, such as joint profiling technologies using single-cell ATAC-seq, RNA-seq, DNA methylation, Hi-C, and ChIP data. SMILE works well even when feature types are unmatched, such as genes for RNA-seq and genome wide peaks for ATAC-seq.

Download Full-text

Study on Mutual Information Based Clustering Algorithm

Information Technology Journal ◽

10.3923/itj.2007.251.254 ◽

2007 ◽

Vol 6 (2) ◽

pp. 251-254 ◽

Cited By ~ 1

Author(s):

Hongfang Zhou ◽

Boqin Feng ◽

Lintao Lv ◽

Hui Yue

Keyword(s):

Mutual Information ◽

Clustering Algorithm

Download Full-text

GNN-DBSCAN: A new density-based algorithm using grid and the nearest neighbor

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-211922 ◽

2021 ◽

pp. 1-13

Author(s):

Li Yihong ◽

Wang Yunpeng ◽

Li Tao ◽

Lan Xiaolong ◽

Song Han

Keyword(s):

Mutual Information ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Spatial Clustering ◽

Clustering Algorithms ◽

Adjusted Rand Index ◽

K Nearest Neighbors ◽

Normalized Mutual Information ◽

Core Samples ◽

Real World Datasets

DBSCAN (density-based spatial clustering of applications with noise) is one of the most widely used density-based clustering algorithms, which can find arbitrary shapes of clusters, determine the number of clusters, and identify noise samples automatically. However, the performance of DBSCAN is significantly limited as it is quite sensitive to the parameters of eps and MinPts. Eps represents the eps-neighborhood and MinPts stands for a minimum number of points. Additionally, a dataset with large variations in densities will probably trap the DBSCAN because its parameters are fixed. In order to overcome these limitations, we propose a new density-clustering algorithm called GNN-DBSCAN which uses an adaptive Grid to divide the dataset and defines local core samples by using the Nearest Neighbor. With the help of grid, the dataset space will be divided into a finite number of cells. After that, the nearest neighbor lying in every filled cell and adjacent filled cells are defined as the local core samples. Then, GNN-DBSCAN obtains global core samples by enhancing and screening local core samples. In this way, our algorithm can identify higher-quality core samples than DBSCAN. Lastly, give these global core samples and use dynamic radius based on k-nearest neighbors to cluster the datasets. Dynamic radius can overcome the problems of DBSCAN caused by its fixed parameter eps. Therefore, our method can perform better on dataset with large variations in densities. Experiments on synthetic and real-world datasets were conducted. The results indicate that the average Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), Adjusted Mutual Information (AMI) and V-measure of our proposed algorithm outperform the existing algorithm DBSCAN, DPC, ADBSCAN, and HDBSCAN.

Download Full-text