EXTRACTING WEB USER PROFILES USING RELATIONAL COMPETITIVE FUZZY CLUSTERING

OLFA NASRAOUI; HICHEM FRIGUI; RAGHU KRISHNAPURAM; ANUPAM JOSHI

doi:10.1142/s021821300000032x

EXTRACTING WEB USER PROFILES USING RELATIONAL COMPETITIVE FUZZY CLUSTERING

International Journal of Artificial Intelligence Tools ◽

10.1142/s021821300000032x ◽

2000 ◽

Vol 09 (04) ◽

pp. 509-526 ◽

Cited By ~ 80

Author(s):

OLFA NASRAOUI ◽

HICHEM FRIGUI ◽

RAGHU KRISHNAPURAM ◽

ANUPAM JOSHI

Keyword(s):

Clustering Algorithm ◽

Distance Measure ◽

Similarity Measures ◽

Unsupervised Classification ◽

Optimal Number ◽

Relational Data ◽

Data Card ◽

User Profiles ◽

Clustering Methods ◽

Access Logs

The proliferation of information on the World Wide Web has made the personalization of this information space a necessity. An important component of Web personalization is to mine typical user profiles from the vast amount of historical data stored in access logs. In the absence of any a priori knowledge, unsupervised classification or clustering methods seem to be ideally suited to analyze the semi-structured log data of user accesses. In this paper, we define the notion of a "user session" as being a temporally compact sequence of Web accesses by a user. We also define a new distance measure between two Web sessions that captures the organization of a Web site. The Competitive Agglomeration clustering algorithm, which can automatically cluster data into the optimal number of components, is extended so that it can work on relational data. The resulting Competitive Agglomeration for Relational Data (CARD) algorithm can deal with complex, non-Euclidean distance/similarity measures. This algorithm was successfully used to analyze Web server access logs and obtain typical session profiles of users.
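The notion of a "user session" as a temporally compact sequence of accesses can be illustrated with a minimal sketch. The abstract does not say how compactness is decided, so the gap threshold `max_gap_seconds`, the record layout `(user, timestamp, url)`, and the function name `split_sessions` are all assumptions made for illustration, not the authors' procedure.

```python
from collections import defaultdict


def split_sessions(accesses, max_gap_seconds=1800):
    """Group (user, timestamp, url) records into per-user sessions.

    A new session starts whenever the gap between two consecutive
    accesses by the same user exceeds max_gap_seconds. The threshold
    is an assumed parameter, not a value taken from the paper.
    """
    by_user = defaultdict(list)
    for user, ts, url in accesses:
        by_user[user].append((ts, url))

    sessions = []
    for user, hits in by_user.items():
        hits.sort()  # order each user's accesses by timestamp
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > max_gap_seconds:
                # Gap too large: close the current session, start a new one.
                sessions.append((user, current))
                current = []
            current.append(url)
        sessions.append((user, current))
    return sessions


# Hypothetical log records: (user id, seconds since start, requested URL).
log = [
    ("u1", 0, "/index.html"),
    ("u1", 60, "/courses/cs101.html"),
    ("u1", 7200, "/index.html"),
    ("u2", 30, "/research/"),
]
sessions = split_sessions(log)
```

Each resulting session is a list of URLs; a pairwise session dissimilarity computed over such lists would form the relational data that a relational clustering algorithm takes as input.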

Clustering Methods Using Distance-Based Similarity Measures of Single-Valued Neutrosophic Sets

Journal of Intelligent Systems ◽

10.1515/jisys-2013-0091 ◽

2014 ◽

Vol 23 (4) ◽

pp. 379-389 ◽

Cited By ~ 45

Author(s):

Jun Ye

Keyword(s):

Machine Learning ◽

Data Mining ◽

Fuzzy Sets ◽

Clustering Algorithm ◽

Distance Measure ◽

Similarity Measures ◽

Intuitionistic Fuzzy Sets ◽

Clustering Methods ◽

Neutrosophic Sets ◽

Generalized Distance

Clustering plays an important role in data mining, pattern recognition, and machine learning. Single-valued neutrosophic sets (SVNSs) are a useful means to describe and handle indeterminate and inconsistent information that fuzzy sets and intuitionistic fuzzy sets cannot describe and deal with. To cluster the data represented by single-valued neutrosophic information, this article proposes single-valued neutrosophic clustering methods based on similarity measures between SVNSs. First, we define a generalized distance measure between SVNSs and propose two distance-based similarity measures of SVNSs. Then, we present a clustering algorithm based on the similarity measures of SVNSs to cluster single-valued neutrosophic data. Finally, an illustrative example is given to demonstrate the application and effectiveness of the developed clustering methods.
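A distance-based similarity between SVNSs can be sketched as follows. The abstract does not give the formulas, so this assumes a weighted Minkowski-type distance over the truth, indeterminacy, and falsity memberships of each element, and assumes the two similarity forms `1 - d` and `(1 - d) / (1 + d)`; the function names are hypothetical.

```python
def svns_distance(a, b, p=2, weights=None):
    """Assumed generalized distance between two SVNSs.

    a and b are equal-length lists of (truth, indeterminacy, falsity)
    triples with each component in [0, 1]. weights defaults to uniform
    weights that sum to 1.
    """
    n = len(a)
    if weights is None:
        weights = [1.0 / n] * n
    total = 0.0
    for w, (ta, ia, fa), (tb, ib, fb) in zip(weights, a, b):
        total += w * (abs(ta - tb) ** p + abs(ia - ib) ** p + abs(fa - fb) ** p)
    # Dividing by 3 keeps the result in [0, 1] for memberships in [0, 1].
    return (total / 3.0) ** (1.0 / p)


def svns_similarity_linear(a, b, p=2, weights=None):
    """Assumed similarity form: 1 - d."""
    return 1.0 - svns_distance(a, b, p, weights)


def svns_similarity_ratio(a, b, p=2, weights=None):
    """Assumed similarity form: (1 - d) / (1 + d)."""
    d = svns_distance(a, b, p, weights)
    return (1.0 - d) / (1.0 + d)


# Hypothetical SVNSs over a two-element universe.
A = [(0.7, 0.2, 0.1), (0.5, 0.3, 0.4)]
B = [(0.6, 0.2, 0.2), (0.5, 0.4, 0.4)]
d_ab = svns_distance(A, B, p=1)
```

With `p=1` the distance reduces to a weighted Hamming-type distance and with `p=2` to a weighted Euclidean-type distance; a similarity matrix built from either similarity function could then feed a similarity-based clustering step.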

Hard c-Means Using Quadratic Penalty-Vector Regularization for Uncertain Data

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2012.p0831 ◽

2012 ◽

Vol 16 (7) ◽

pp. 831-840 ◽

Cited By ~ 1

Author(s):

Yasunori Endo ◽

Arisa Taniguchi ◽

Yukihiro Hamasuna ◽

...

Keyword(s):

Missing Values ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Uncertain Data ◽

Unsupervised Classification ◽

Real Space ◽

Clustering Methods ◽

Cluster Number ◽

Numerical Examples ◽

Classification Technique

Clustering is an unsupervised classification technique for data analysis. In general, each datum in real space is transformed into a point in a pattern space to apply clustering methods. However, data often cannot be represented by a point because of uncertainty, e.g., measurement error margins and missing values. In this paper, we introduce quadratic penalty-vector regularization to handle such uncertain data using Hard c-Means (HCM), which is one of the most typical clustering algorithms. We first propose a new clustering algorithm called hard c-means using quadratic penalty-vector regularization for uncertain data (HCMP). Second, we propose sequential extraction hard c-means using quadratic penalty-vector regularization (SHCMP) to handle datasets whose cluster number is unknown. Finally, we verify the effectiveness of our proposed algorithms through numerical examples.

A MULTI-CLUSTERING FUSION SCHEME FOR DATA PARTITIONING

International Journal of Neural Systems ◽

10.1142/s0129065705000360 ◽

2005 ◽

Vol 15 (05) ◽

pp. 391-401 ◽

Cited By ~ 3

Author(s):

DIMITRIOS S. FROSSYNIOTIS ◽

CHRISTOS PATERITSAS ◽

ANDREAS STAFYLOPATIS

Keyword(s):

Clustering Algorithm ◽

Optimal Number ◽

Data Partitioning ◽

Clustering Methods ◽

Data Set ◽

Fusion Procedure ◽

Distinct Partition ◽

Fusion Scheme ◽

Previous Phase ◽

Optimal Number Of Clusters

A multi-clustering fusion method is presented based on combining several runs of a clustering algorithm resulting in a common partition. More specifically, the results of several independent runs of the same clustering algorithm are appropriately combined to obtain a distinct partition of the data which is not affected by initialization and overcomes the instabilities of clustering methods. Subsequently, a fusion procedure is applied to the clusters generated during the previous phase to determine the optimal number of clusters in the data set according to some predefined criteria.

Simulation Study on Clustering Approaches for Short-Term Electricity Forecasting

Complexity ◽

10.1155/2018/3683969 ◽

2018 ◽

Vol 2018 ◽

pp. 1-21 ◽

Cited By ~ 13

Author(s):

Krzysztof Gajowniczek ◽

Tomasz Ząbkowski

Keyword(s):

Time Series ◽

Similarity Measures ◽

Optimal Number ◽

Clustering Methods ◽

Smart Meters ◽

Practical Applications ◽

Electricity Use ◽

Residential Electricity ◽

Using Data ◽

Advanced Metering

Advanced metering infrastructures such as smart metering have begun to attract increasing attention; a considerable body of research is currently focusing on load profiling and forecasting at different scales on the grid. Electricity time series clustering is an effective tool for identifying useful information in various practical applications, including the forecasting of electricity usage, which is important for providing more data to smart meters. This paper presents a comprehensive study of clustering methods for residential electricity demand profiles and further applications focused on the creation of more accurate electricity forecasts for residential customers. The contributions of this paper are threefold: (1) using data from 46 homes in Austin, Texas, the similarity measures from different time series are analyzed; (2) the optimal number of clusters for representing residential electricity use profiles is determined; and (3) an extensive load forecasting study using different segmentation-enhanced forecasting algorithms is undertaken. Finally, from the operator’s perspective, the implications of the results are discussed in terms of the use of clustering methods for grouping electrical load patterns.

Clustering on Human Microbiome Sequencing Data: A Distance-Based Unsupervised Learning Model

Microorganisms ◽

10.3390/microorganisms8101612 ◽

2020 ◽

Vol 8 (10) ◽

pp. 1612

Author(s):

Dongyang Yang ◽

Wei Xu

Keyword(s):

Beta Diversity ◽

Clustering Algorithm ◽

Distance Measure ◽

Human Microbiome ◽

Operational Taxonomic Unit ◽

Distance Measures ◽

Clustering Methods ◽

Sequencing Data ◽

Diversity Measure ◽

Parkinson’s Diseases

Modeling and analyzing the human microbiome allows the assessment of the microbial community and its impacts on human health. Microbiome composition can be quantified using 16S rRNA technology into sequencing data, which are usually skewed and heavy-tailed with excess zeros. Clustering methods are useful in personalized medicine for identifying subgroups for patient stratification. However, there is currently no standardized clustering method for complex microbiome sequencing data. We propose a clustering algorithm with a specific beta diversity measure that can address the presence-absence bias encountered for sparse count data and effectively measure the sample distances for sample stratification. Our distance measure used for clustering is derived from a parametric mixture model producing sample-specific distributions conditional on the observed operational taxonomic unit (OTU) counts and estimated mixture weights. The method can provide accurate estimates of the true zero proportions and thus construct a precise beta diversity measure. Extensive simulation studies suggest that the proposed method achieves substantial clustering improvement compared with some widely used distance measures when a large proportion of zeros is present. The proposed algorithm was applied to a human gut microbiome study on Parkinson’s disease to identify distinct microbiome states with biological interpretations.

Kernel-Based Robust Bias-Correction Fuzzy Weighted C-Ordered-Means Clustering Algorithm

Symmetry ◽

10.3390/sym11060753 ◽

2019 ◽

Vol 11 (6) ◽

pp. 753

Author(s):

Wenyuan Zhang ◽

Xijuan Guo ◽

Tianyu Huang ◽

Jiale Liu ◽

Jun Chen

Keyword(s):

Bias Correction ◽

Euclidean Distance ◽

Clustering Algorithm ◽

Distance Measure ◽

Similarity Measures ◽

Distance Measures ◽

Background Information ◽

Local Similarity ◽

Original Algorithm ◽

FCM Clustering

The spatially constrained Fuzzy C-means (FCM) clustering algorithm is effective for image segmentation. Its background information improves the insensitivity to noise to some extent. In addition, the membership degree of Euclidean distance is not suitable for revealing the non-Euclidean structure of input data, since it still lacks enough robustness to noise and outliers. To overcome these problems, this paper proposes a new kernel-based algorithm built on a kernel-induced distance measure, which we call the Kernel-based Robust Bias-correction Fuzzy Weighted C-ordered-means Clustering Algorithm (KBFWCM). In the construction of the objective function, the KBFWCM algorithm takes into account that the spatially constrained FCM clustering algorithm is insensitive to image noise and involves highly intensive computation. To address the insensitivity of the spatially constrained FCM clustering algorithm to noise and its image detail processing, the KBFWCM algorithm combines fuzzy local similarity measures (space and grayscale) with the typicality of data attributes. To address the poor robustness of the original algorithm to noise and outliers and its highly intensive computation, a kernel-based clustering method that includes a class of robust non-Euclidean distance measures is proposed. The experimental results show that the KBFWCM algorithm has a stronger denoising and robust effect on noisy images.

Comparison of Fuzzy Clustering Methods and Their Applications to Geophysics Data

Applied Computational Intelligence and Soft Computing ◽

10.1155/2009/876361 ◽

2009 ◽

Vol 2009 ◽

pp. 1-16 ◽

Cited By ~ 4

Author(s):

David J. Miller ◽

Carl A. Nelson ◽

Molly Boeka Cannon ◽

Kenneth P. Cannon

Keyword(s):

Fuzzy Clustering ◽

Real World ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Optimal Number ◽

Optimum Number ◽

Clustering Methods ◽

Real World Data ◽

Data Set ◽

World Data

Fuzzy clustering algorithms are helpful when a dataset contains subgroupings of points with indistinct boundaries and overlap between the clusters. Traditional methods have been extensively studied and used on real-world data, but require users to have some knowledge of the outcome a priori in order to determine how many clusters to look for. Additionally, iterative algorithms choose the optimal number of clusters based on one of several performance measures. In this study, the authors compare the performance of three algorithms (fuzzy c-means, Gustafson-Kessel, and an iterative version of Gustafson-Kessel) when clustering a traditional data set as well as real-world geophysics data that were collected from an archaeological site in Wyoming. Areas of interest in the data were identified using a crisp cutoff value as well as a fuzzy α-cut to determine which provided better elimination of noise and non-relevant points. Results indicate that the α-cut method eliminates more noise than the crisp cutoff values and that the iterative version of the fuzzy clustering algorithm is able to select an optimum number of subclusters within a point set (in both the traditional and real-world data), leading to proper indication of regions of interest for further expert analysis.
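The difference between a crisp cutoff and a fuzzy α-cut can be sketched as below. The abstract does not specify how either filter was applied, so this assumes the crisp cutoff thresholds the raw measured value, while the α-cut keeps a point only if its largest cluster membership reaches α; the function names and sample numbers are hypothetical.

```python
def crisp_cutoff(values, threshold):
    """Indices of points whose raw measurement is at least threshold."""
    return [i for i, v in enumerate(values) if v >= threshold]


def alpha_cut(memberships, alpha):
    """Indices of points whose largest cluster membership is at least alpha.

    memberships[i][k] is the fuzzy membership of point i in cluster k,
    as produced by a fuzzy clustering algorithm such as fuzzy c-means.
    """
    return [i for i, row in enumerate(memberships) if max(row) >= alpha]


# Hypothetical readings and fuzzy memberships for four points, two clusters.
readings = [0.9, 0.8, 0.85, 0.2]
memberships = [
    [0.95, 0.05],  # clearly in cluster 0
    [0.55, 0.45],  # ambiguous point, possibly noise
    [0.10, 0.90],  # clearly in cluster 1
    [0.50, 0.50],  # ambiguous point
]

kept_crisp = crisp_cutoff(readings, 0.5)
kept_alpha = alpha_cut(memberships, 0.8)
```

In this made-up example the crisp cutoff keeps the ambiguous second point because its raw reading is high, while the α-cut discards it because no cluster claims it strongly.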

Automatic group-wise whole-brain short association fiber bundle labeling based on clustering and cortical surface information

10.21203/rs.2.20420/v1 ◽

2020 ◽

Author(s):

Andrea Vázquez ◽

Narciso López-López ◽

Josselin Houenou ◽

Cyril Poupon ◽

Jean-François Mangin ◽

...

Keyword(s):

White Matter ◽

Execution Time ◽

Fiber Bundle ◽

Clustering Algorithm ◽

Distance Measure ◽

Clustering Methods ◽

Good Correspondence ◽

Hungarian Algorithm ◽

Brain White Matter ◽

Fiber Clustering

Background: Diffusion MRI is the preferred non-invasive in vivo modality for the study of brain white matter connections. Tractography datasets contain 3D streamlines that can be analyzed to study the main brain white matter tracts. Fiber clustering methods have been used to automatically regroup similar fibers into clusters. However, due to inter-subject variability and artifacts, the resulting clusters are difficult to process for finding common connections across subjects, especially for superficial white matter. Methods: We present an automatic method for labeling short association bundles on a group of subjects. The method is based on an intra-subject fiber clustering that generates compact fiber clusters. Subsequently, the clusters are labeled based on the cortical connectivity of the fibers, taking as reference the Desikan-Killiany atlas, and named according to their relative position along one axis. Finally, two different strategies were applied and compared for the labeling of inter-subject bundles: a matching with the Hungarian algorithm, and a well-known fiber clustering algorithm called QuickBundles. Results: Individual labeling was executed over four subjects, with an execution time of 3.6 minutes. An inspection of individual labeling based on a distance measure showed good correspondence among the four tested subjects. Two inter-subject labeling strategies were successfully implemented and applied to 20 subjects, and compared using a set of distance thresholds, ranging from a conservative value of 10 mm to a moderate value of 21 mm. The Hungarian algorithm led to high correspondence, but low reproducibility for all the thresholds, with 96 seconds of execution time. QuickBundles led to better correspondence and reproducibility, with a short execution time of 9 seconds. Hence, the whole processing for the inter-subject labeling over 20 subjects takes 1.17 hours. Conclusion: We implemented a method for the automatic labeling of short bundles in individuals, based on an intra-subject clustering and the connectivity of the clusters with the cortex. The labels provide useful information for the visualization and analysis of individual connections, which is very difficult without any additional information. Furthermore, we provide two fast inter-subject bundle labeling methods. The obtained clusters could be used for performing manual or automatic connectivity analysis in individuals or across subjects. Keywords: fiber labeling; clustering; fiber bundle; tractography; superficial white matter

RBF Neural Network (RBFNN) using Density Based Clustering for Liver Disorder Dataset

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse.v7i7.91 ◽

2017 ◽

Vol 7 (7) ◽

pp. 20

Author(s):

Sunila Godara ◽

Rishipal Singh ◽

Sanjeev Kumar

Keyword(s):

Neural Network ◽

Common Property ◽

Clustering Algorithm ◽

Distance Measure ◽

RBF Neural Network ◽

Unsupervised Classification ◽

Liver Disorder ◽

Data Set ◽

Density Based Clustering

Clustering is an unsupervised classification that partitions a data set into a set of meaningful subsets. Each object in a dataset shares some common property, often proximity according to some defined distance measure. In this paper we extend our previous work [15]. Simple K-means and the proposed makeDensityBased Clustering (MDBC) are embedded in an RBF Neural Network (RBFNN). We evaluated the performance of RBFNN using K-Means and the proposed makeDensityBased Clustering on the Liver Disorder Dataset. The proposed algorithm is superior to the existing makeDensityBased Clustering algorithm [15], but it does not perform well when it is embedded in RBFNN.

Complexity Optimized Data Clustering by Competitive Neural Networks

Neural Computation ◽

10.1162/neco.1993.5.1.75 ◽

1993 ◽

Vol 5 (1) ◽

pp. 75-88 ◽

Cited By ~ 48

Author(s):

Joachim Buhmann ◽

Hans Kühnel

Keyword(s):

Neural Networks ◽

Data Storage ◽

Speech Processing ◽

Data Clustering ◽

Clustering Algorithm ◽

Data Representation ◽

Optimal Number ◽

Clustering Methods ◽

Feature Maps ◽

Maximum Entropy Estimation

Data clustering is a complex optimization problem with applications ranging from vision and speech processing to data transmission and data storage in technical as well as in biological systems. We discuss a clustering strategy that explicitly reflects the tradeoff between simplicity and precision of a data representation. The resulting clustering algorithm jointly optimizes distortion errors and complexity costs. A maximum entropy estimation of the clustering cost function yields an optimal number of clusters, their positions, and their cluster probabilities. Our approach establishes a unifying framework for different clustering methods like K-means clustering, fuzzy clustering, entropy constrained vector quantization, or topological feature maps and competitive neural networks.
