Bayesian-OverDBC: A Bayesian Density-Based Approach for Modeling Overlapping Clusters

2015 ◽  
Vol 2015 ◽  
pp. 1-9 ◽  
Author(s):  
Mansooreh Mirzaie ◽  
Ahmad Barani ◽  
Naser Nematbakkhsh ◽  
Majid Mohammad-Beigi

Although most research on density-based clustering algorithms has focused on finding distinct clusters, many real-world applications (such as gene functions in a gene regulatory network) have inherently overlapping clusters. Moreover, density-based clustering methods do not define a probabilistic model of the data, which makes it hard to assess how good a clustering is, to make predictions, or to assign new data to existing clusters. A probabilistic model for overlapping density-based clustering is therefore a critical need for large-scale data analysis. In this paper, a new Bayesian density-based method (Bayesian-OverDBC) for modeling overlapping clusters is presented. Bayesian-OverDBC can predict the formation of a new cluster, as well as the overlap of a new cluster with existing clusters. Bayesian-OverDBC has been compared with other algorithms (both nonoverlapping and overlapping models). The results show that Bayesian-OverDBC can be significantly better than other methods in analyzing microarray data.

10.28945/4541 ◽  
2020 ◽  
Vol 15 ◽  
pp. 091-108
Author(s):  
Saud Altaf ◽  
Muhammad Waseem Waseem ◽  
Laila Kazmi

Aim/Purpose: Clustering techniques are normally used to determine significant and meaningful subclasses in datasets. Clustering is an unsupervised type of Machine Learning (ML) in which the objective is to form groups of objects based on their similarity and thereby uncover implicit relationships between the different features of the data. Cluster analysis is a significant problem area in data exploration when dealing with arbitrary-shape problems in different datasets. Clustering on large data sets faces three challenges: (1) clusters with arbitrary shapes; (2) little prior knowledge for deciding on the appropriate input features; (3) scalability to large data sizes. Density-based clustering is known as a dominant method for determining arbitrary-shape clusters. Background: Existing density-based clustering methods commonly cited in the literature were examined in terms of their behavior on data sets that contain nested clusters of varying density. The existing methods are not ideal for such data sets, because they typically partition the data into clusters that cannot be nested. Methodology: A density-based approach built on traditional center-based clustering is introduced that assigns a weight to each cluster. The weights are then utilized when calculating the distances from data vectors to centroids, by multiplying each distance by the corresponding centroid weight. Contribution: In this paper, we examine different density-based clustering methods on data sets with nested clusters of varying density. Two such data sets were used to evaluate some of the commonly cited algorithms found in the literature. Nested clusters were found to be challenging for the existing algorithms: in most cases, the targeted algorithms either did not detect the largest clusters or simply divided large clusters into non-overlapping regions.
However, it may be possible to detect all clusters by doing multiple runs of an algorithm with different inputs and then combining the results. This work considered three challenges of clustering methods. Findings: Under the proposed weighting, a centroid with a low weight attracts objects from further away than a centroid with a higher weight, which allows dense clusters inside larger clusters to be recognized. The methods were tested experimentally using the K-means, DBSCAN, TURN*, and IDCUP algorithms. The experimental results on different data sets showed that IDCUP is more robust and produces better clusters than DBSCAN, TURN*, and K-means when dealing with arbitrary-shape problems, and that IDCUP scales better than TURN*. Future Research: Future work will explore further challenges of the knowledge discovery process in clustering, along with more complex data sets. A hybrid of density-based and model-based clustering algorithms should also be compared, to achieve maximum accuracy and avoid arbitrary-shape-related problems, including optimization. It is anticipated that such a hybrid approach will attain improved performance with comparable precision in identifying cluster shapes.
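The weighted-distance assignment described in the methodology can be sketched as follows. This is a minimal illustration of the idea only, not the IDCUP implementation; the centroids, weights, and points are made-up values chosen so the effect is visible:

```python
import math

def assign(points, centroids, weights):
    """Assign each point to the centroid minimizing weight-scaled distance.

    Because the distance is multiplied by the centroid's weight, a
    low-weight centroid attracts objects from further away, while a
    high-weight centroid (e.g. a dense inner cluster) claims only
    nearby points.
    """
    labels = []
    for p in points:
        scaled = [w * math.dist(p, c) for c, w in zip(centroids, weights)]
        labels.append(scaled.index(min(scaled)))
    return labels

# A dense inner cluster (high weight) next to a sparse outer one (low weight).
centroids = [(0.0, 0.0), (5.0, 0.0)]
weights = [2.0, 0.5]  # hypothetical weights, e.g. proportional to local density
points = [(0.5, 0.0), (2.0, 0.0), (4.0, 0.0)]
print(assign(points, centroids, weights))  # the middle point goes to the low-weight centroid
```

With plain Euclidean distance the point at (2.0, 0.0) would belong to the first centroid; the weighting hands it to the sparser cluster, which is exactly what lets a dense nested cluster keep a tight boundary.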


2018 ◽  
Vol 140 (8) ◽  
Author(s):  
Jeffrey W. Herrmann ◽  
Michael Morency ◽  
Azrah Anparasan ◽  
Erica L. Gralla

Understanding how humans decompose design problems will yield insights that can be applied to develop better support for human designers. However, there are few established methods for identifying the decompositions that human designers use. This paper discusses a method for identifying subproblems by analyzing when design variables were discussed concurrently by human designers. Four clustering techniques for grouping design variables were tested on a range of synthetic datasets designed to resemble data collected from design teams, and the accuracy of the clusters created by each algorithm was evaluated. A spectral clustering method was accurate for most problems and generally performed better than hierarchical (with Euclidean distance metric), Markov, or association rule clustering methods. The method's success should enable researchers to gain new insights into how human designers decompose complex design problems.


2016 ◽  
Vol 43 (1) ◽  
pp. 54-74 ◽  
Author(s):  
Baojun Ma ◽  
Hua Yuan ◽  
Ye Wu

Clustering is a powerful unsupervised tool for sentiment analysis of text. However, the clustering results may be affected by any step of the clustering process, such as the data pre-processing strategy, the term weighting method in the Vector Space Model, and the clustering algorithm. This paper presents the results of an experimental study of some common clustering techniques with respect to the task of sentiment analysis. Unlike previous studies, we investigate in particular the combined effects of these factors through a series of comprehensive experiments. The experimental results indicate that, first, K-means-type clustering algorithms show clear advantages on balanced review datasets, while performing rather poorly on unbalanced datasets in terms of clustering accuracy. Second, the more recently designed weighting models are better than the traditional weighting models for sentiment clustering on both balanced and unbalanced datasets. Furthermore, an adjective and adverb extraction strategy offers obvious improvements in clustering performance, while stemming and stopword removal have a negative influence on sentiment clustering. These results should be valuable for both the study and the use of clustering methods in online review sentiment analysis.
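As a point of reference for the term-weighting factor the study varies, here is a minimal TF-IDF sketch, the traditional baseline weighting in the Vector Space Model (the newer schemes the paper evaluates are not shown; the toy documents are invented):

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} vector per tokenized document,
    using the classic tf * log(N / df) weighting."""
    n = len(docs)
    df = Counter()                     # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (tf[t] / len(doc)) * math.log(n / df[t])
                        for t in tf})
    return vectors

docs = [["good", "great", "movie"],
        ["bad", "boring", "movie"],
        ["great", "acting", "good"]]
vecs = tfidf(docs)
print(round(vecs[0]["good"], 3))  # "good" occurs in 2 of 3 docs -> 0.135
```

Terms that appear in every document (here "movie" would if added to the third review) get weight zero, which is why weighting choice interacts so strongly with pre-processing steps like stopword removal.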


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
D. Rangaprakash ◽  
◽  
Toluwanimi Odemuyiwa ◽  
D. Narayana Dutt ◽  
Gopikrishna Deshpande

Various machine-learning classification techniques have been employed previously to classify brain states in healthy and disease populations using functional magnetic resonance imaging (fMRI). These methods generally use supervised classifiers, which are sensitive to outliers and require labeled training data to generate a predictive model. Density-based clustering, which overcomes these issues, is a popular unsupervised learning approach whose utility for high-dimensional neuroimaging data has not been previously evaluated. Its advantages include insensitivity to outliers and the ability to work with unlabeled data. Unlike the popular k-means clustering, the number of clusters need not be specified. In this study, we compare the performance of two popular density-based clustering methods, DBSCAN and OPTICS, in accurately identifying individuals with three stages of cognitive impairment, including Alzheimer’s disease. We used static and dynamic functional connectivity features for clustering, which capture the strength and temporal variation of brain connectivity, respectively. To assess the robustness of clustering to noise/outliers, we propose a novel method called recursive clustering using additive noise (R-CLAN). Results demonstrated that both clustering algorithms were effective, although OPTICS with dynamic connectivity features performed best in terms of cluster purity (95.46%) and robustness to noise/outliers. This study demonstrates that density-based clustering can accurately and robustly identify diagnostic classes in an unsupervised way using brain connectivity.
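The two properties the abstract highlights, no preset cluster count and explicit outlier labeling, can be seen in a minimal pure-Python DBSCAN sketch (illustrative only; the study worked with high-dimensional fMRI connectivity features, not 2-D toy points):

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point; -1 marks noise/outliers.

    A point with at least min_pts neighbors within eps is a core point
    and seeds/extends a cluster; border points join but do not expand it.
    """
    labels = [None] * len(points)
    cluster = -1
    neighbors = lambda i: [j for j, q in enumerate(points)
                           if math.dist(points[i], q) <= eps]
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1           # provisionally noise
            continue
        cluster += 1                 # new cluster discovered, no K needed
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reclaimed from noise
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:   # core point: keep expanding
                queue.extend(nb)
    return labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=2.0, min_pts=2))  # -> [0, 0, 0, 1, 1, 1, -1]
```

The isolated point at (50, 50) comes back labeled -1 rather than being forced into a cluster, which is the insensitivity to outliers the study relies on; OPTICS extends this idea by ordering points over a range of eps values instead of fixing one.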


2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Shujun Hou

The advent of the information age has changed every existing career and revolutionized most if not all fields, bringing many benefits along with it. There has been an exponential rise in information and, alongside it, an increase in data: data centers have swelled as the number of rows in databases grows by the day. The use of technology has become essential in many company models and organizations, warranting its usage in virtually every channel. College physical education and sports are no exception, as the number of students studying such subjects is skyrocketing. As the information grows more complex, improved methods are needed to research and analyze the data. Fortunately, data mining has come to the rescue. Data mining is a collection of analytical methods and procedures used for the purpose of data extraction; it may be used to analyze features and trends in vast quantities of data. The objective of this study is to explore the use of data mining technologies in the analysis of college students’ sports psychology. This study uses clustering methods for the examination of sports psychology. We utilize five clustering methods for this aim: the expectation-maximization (EM) algorithm, k-means, COBWEB, density-based spatial clustering of applications with noise (DBSCAN), and agglomerative hierarchical clustering. We perform our forecasts based on various metrics combined with past outcomes of college sports using these methods. In contrast to conventional data research and analysis techniques, our approaches have relatively high prediction accuracy as far as college athletics is concerned.


2021 ◽  
Vol 11 (4) ◽  
pp. 3792-3806
Author(s):  
A.A. Abdulnassar ◽  
Latha R. Nair

Proper selection of the cluster count gives better clustering results in partition models. Partition clustering methods are very simple as well as efficient. K-means and its modified versions are very efficient cluster models, but their results are very sensitive to the chosen K value. Partition clustering algorithms are most suitable in applications where the data are arranged in a uniform manner. This work aims to evaluate the importance of assigning the cluster count value, in order to improve the efficiency of partition clustering algorithms, using two well-known statistical methods: the Elbow method and the Silhouette method. The performance of the Silhouette method and the Elbow method is compared on different data sets from the UCI data repository. The values obtained using these methods are compared with the cluster performance results obtained using the statistical analysis tool Weka on the selected data sets. Performance was evaluated on cluster efficiency for small and large data sets by varying the cluster count values. Similar results were obtained from the three methods: the Elbow method, the Silhouette method, and clustering with Weka. It was also observed that clustering efficiency drops sharply for small changes in cluster count when the cluster count is small.
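The statistic behind the Silhouette method can be sketched in a few lines. This is a minimal illustration of the coefficient the method maximizes over candidate K values (the toy 1-D points are invented; singleton clusters are not handled):

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient: for each point, a = mean distance to
    its own cluster, b = mean distance to the nearest other cluster;
    the score (b - a) / max(a, b) is near 1 for well-separated clusters."""
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        a = sum(same) / len(same)
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two well-separated 1-D clusters: the K=2 labeling scores near 1.
pts = [(0,), (1,), (10,), (11,)]
print(round(silhouette(pts, [0, 0, 1, 1]), 2))  # -> 0.9
```

Choosing K then amounts to computing this score for each candidate clustering and keeping the K with the highest mean silhouette, whereas the Elbow method instead looks for the bend in the within-cluster sum of squares as K grows.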


Author(s):  
Raymond Greenlaw ◽  
Sanpawat Kantabutra

This chapter provides the reader with an introduction to clustering algorithms and applications. A number of important well-known clustering methods are surveyed. The authors present a brief history of the development of the field of clustering, discuss various types of clustering, and mention some of the current research directions in the field of clustering. Algorithms are described for top-down and bottom-up hierarchical clustering, as are algorithms for K-Means clustering and for K-Medians clustering. The technique of representative points is also presented. Given the large data sets involved with clustering, the need to apply parallel computing to clustering arises, so they discuss issues related to parallel clustering as well. Throughout the chapter references are provided to works that contain a large number of experimental results. A comparison of the various clustering methods is given in tabular format. They conclude the chapter with a summary and an extensive list of references.
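The K-Means algorithm described in the chapter reduces, in one dimension, to a few lines of Lloyd iteration. A minimal sketch (first-k initialization and the toy data are choices made here, not taken from the chapter):

```python
def kmeans_1d(data, k, iters=20):
    """Lloyd's algorithm in 1-D: alternate between assigning each value
    to its nearest centroid and recomputing each centroid as the mean
    of its group. Initialization is simply the first k data values."""
    centroids = data[:k]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda c: abs(x - centroids[c]))
            groups[nearest].append(x)
        # empty groups keep their old centroid
        centroids = [sum(g) / len(g) if g else centroids[c]
                     for c, g in enumerate(groups)]
    return sorted(centroids)

print(kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], k=2))  # -> [2.0, 11.0]
```

K-Medians, also surveyed in the chapter, follows the same alternation but recomputes each centroid as the median of its group instead of the mean, which makes it more robust to outliers.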


Author(s):  
Amin Rezaeipanah ◽  
Kambiz Ghanat

Background: One of the great challenges in social network analysis is community detection. A community is a group of users with dense intra-connections and sparse inter-connections. Community detection, or clustering, reveals the community structure of social networks and hidden relationships among their constituents. Nowadays, many different methods have been proposed to detect community structures in social networks from different perspectives, but none of them is a constant winner. Therefore, an ensemble of different methods can potentially improve the final result. Methods: In this paper, we present a framework in which different methods are combined for community detection. The method is a combination of genetic algorithms, particle swarm optimization, k-means clustering, and the Louvain clustering algorithm. Our method uses topological and demographic information to identify communities and can automatically determine the optimal number of communities. Results: Quantitative evaluations based on extensive experiments on the Ego-Facebook social network dataset reveal that the method presented in this study achieves favorable results that are quite superior to other relevant algorithms in the literature. Highlights:
- Discovering relationships between individuals by analyzing social networks.
- Providing community-identification algorithms based on different clustering methods.
- An ensemble for community detection consisting of GA, PSO, k-means, and Louvain clustering.
- The proposed method is better than the TSA method on the silhouette and modularity criteria.
- Demographic information relates to the profiles of users and their shared tweets.


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Dan Zhang ◽  
Yingcang Ma ◽  
Hu Zhao ◽  
Xiaofei Yang

Clustering is one of the important research topics in the field of machine learning. Neutrosophic clustering is a generalization of fuzzy clustering and has been applied to many fields. This paper presents a new neutrosophic clustering algorithm based on regularization. First, a regularization term is introduced into the FC-PFS algorithm to induce sparsity, which reduces the complexity of the algorithm on large data sets. Second, we propose a method to simplify the process of determining the regularization parameters. Finally, experiments show that the clustering results of this algorithm on artificial and real data sets are mostly better than those of other clustering algorithms; our algorithm is effective in most cases.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Taro Aso ◽  
Toshiyuki Amagasa ◽  
Hiroyuki Kitagawa

Purpose The purpose of this paper is to propose a scheme that allows users to interactively explore relations between entities in knowledge bases (KBs). KBs store a wide range of knowledge about real-world entities in a structured form as (subject, predicate, object) triples. Although it is possible to query entities and the relations among them by writing appropriate SPARQL expressions or keyword queries, the structure and vocabulary are complicated, and it is hard for non-expert users to get the desired information. For this reason, many researchers have proposed faceted search interfaces for KBs. Nevertheless, existing interfaces are designed for finding entities and are insufficient for finding relations. Design/methodology/approach To address this problem, the authors propose a novel “relation facet” for finding relations between entities. To generate it, they applied clustering to predicates, grouping those predicates that are connected to common objects. Having generated clusters of predicates, the authors generated a facet from the result. Specifically, they used two clustering algorithms: agglomerative hierarchical clustering (AHC) and CANDECOMP/PARAFAC (CP) tensor decomposition, one of the tensor decomposition methods. Findings The authors experimentally tested the performance of the clustering methods and found that AHC performs better than tensor decomposition. In addition, the authors conducted a user study showing that the proposed scheme outperforms existing ones in the task of searching for relations. Originality/value The authors propose a relation-oriented faceted search method for KBs that allows users to explore relations between entities. As far as the authors know, this is the first method to focus on the exploration of relations between entities.

