An Algorithm of Web Text Clustering Analysis Based on Fuzzy Set

Author(s):  
Yun Peng ◽  
Shu-liang Ding
2014 ◽  
Vol 678 ◽  
pp. 19-22
Author(s):  
Hong Xin Wan ◽  
Yun Peng

Web text contains uncertain and unstructured content, which makes it difficult to cluster with conventional classification methods. We propose a web text clustering algorithm based on fuzzy sets to improve the accuracy of computations on web text. After extracting the keywords of a text, we treat them as attributes and design a fuzzy algorithm to determine the membership of each word. The algorithm reduces time and space complexity and is more robust than the conventional algorithm. To test its accuracy and efficiency, we ran a comparative experiment between pattern clustering and our algorithm. The experiment shows that our method gives better results.
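The abstract does not give the membership function itself, so the following is only a minimal sketch of how graded membership over keyword attributes can be computed. It uses the standard fuzzy c-means membership formula as a stand-in, not the authors' algorithm. The keyword weights, cluster centers and the fuzzifier `m` are all hypothetical.

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means membership of `point` in each cluster center.

    `m` > 1 is the fuzzifier; larger values give softer memberships.
    This is a generic stand-in, not the method from the paper.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point that coincides with one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]

# Hypothetical keyword-weight vectors: (weight of "football", weight of "stock").
centers = [(1.0, 0.0), (0.0, 1.0)]   # assumed "sports" and "finance" clusters
doc = (0.8, 0.2)                     # assumed weights for one web page
memberships = fuzzy_memberships(doc, centers)
print(memberships)
```

The memberships sum to 1, so a page that mixes topics is assigned partly to each cluster instead of being forced into one.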


2013 ◽  
Vol 278-280 ◽  
pp. 1287-1291 ◽  
Author(s):  
Hong Xin Wan ◽  
Yun Peng

A fuzzy algorithm for customer evaluation based on attribute reduction is presented. Evaluating data objects on their key attributes reduces both the data size and the algorithm complexity. Customers are first clustered, and the evaluation is then applied to the clustered data. Customer clusters contain a lot of uncertain data, so traditional classification and evaluation methods struggle with the incomplete data. A superposition evaluation algorithm based on fuzzy sets can improve the reliability and accuracy of e-commerce customer evaluation. Evaluating e-commerce customers can also improve the efficiency, service quality and profitability of e-commerce businesses.


2018 ◽  
Vol 7 (4.11) ◽  
pp. 246
Author(s):  
N. M. Ariff ◽  
M. A. A. Bakar ◽  
M. I. Rahmad

Text clustering is a data mining technique that is becoming more important in present studies. Document clustering makes use of text clustering to divide documents according to their topics. The choice of words in document clustering is important to ensure that each document is classified correctly. Three clustering methods, namely hierarchical clustering, k-means and k-medoids, are used and compared in this study in order to identify the method that produces the best result in document clustering. The three methods are applied to 60 sports articles covering four different sports. The k-medoids clustering produced the worst result, while k-means clustering is found to be more sensitive to general words. Therefore, hierarchical clustering is deemed more stable in producing a meaningful result in document clustering analysis.
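As a rough illustration of the hierarchical method favoured in this abstract, the sketch below runs average-linkage agglomerative clustering on bag-of-words vectors with cosine distance. The linkage rule, the distance measure and the four toy documents are assumptions made for the example. The abstract does not state which settings the study used.

```python
import math
from collections import Counter

def cosine_distance(a, b):
    """1 - cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - (dot / norm if norm else 0.0)

def agglomerative(docs, k):
    """Merge the two closest clusters (average linkage) until k remain."""
    vectors = [Counter(d.lower().split()) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        a, b = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical miniature corpus with two sports.
docs = [
    "football goal striker",
    "football goal keeper",
    "tennis serve racket",
    "tennis serve court",
]
result = agglomerative(docs, k=2)
print(result)
```

Stopping at `k` clusters is one way to cut the hierarchy. Cutting at a distance threshold is an equally common choice.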


2011 ◽  
Vol 88-89 ◽  
pp. 763-766
Author(s):  
Fu Gui Fang

Fuzzy clustering analysis is an important branch of unsupervised pattern recognition. Studying the algorithms and applications of fuzzy clustering is of great significance. This paper first introduces the basic knowledge of fuzzy set theory, including the definition of a fuzzy set, its theorems, fuzzy relations and so on. It then describes how to use the fuzzy clustering analysis method for data classification, and the steps of fuzzy clustering analysis.
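The abstract does not list the steps, but one common textbook procedure based on fuzzy relations is: build a fuzzy similarity matrix, take its max-min transitive closure to obtain a fuzzy equivalence relation, then cut it at a threshold lambda. The sketch below assumes that procedure. The similarity values are made up for illustration.

```python
def max_min_compose(r, s):
    """Max-min composition of two square fuzzy relations."""
    n = len(r)
    return [[max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]

def transitive_closure(r):
    """Square a reflexive fuzzy relation repeatedly until it stops changing."""
    while True:
        squared = max_min_compose(r, r)
        if squared == r:
            return r
        r = squared

def lambda_cut_clusters(closure, lam):
    """Group objects whose closure value is at least `lam`."""
    clusters, assigned = [], set()
    for i in range(len(closure)):
        if i in assigned:
            continue
        group = [j for j in range(len(closure)) if closure[i][j] >= lam]
        clusters.append(group)
        assigned.update(group)
    return clusters

# Hypothetical reflexive, symmetric similarity matrix for four objects.
similarity = [
    [1.0, 0.8, 0.2, 0.1],
    [0.8, 1.0, 0.4, 0.1],
    [0.2, 0.4, 1.0, 0.6],
    [0.1, 0.1, 0.6, 1.0],
]
closure = transitive_closure(similarity)
for lam in (0.7, 0.5, 0.4):
    print(lam, lambda_cut_clusters(closure, lam))
```

Lowering lambda merges clusters, so sweeping it from 1 down to 0 yields a hierarchy of partitions.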


2015 ◽  
Vol 115 (4) ◽  
pp. 718-739 ◽  
Author(s):  
Jungwoo Suh ◽  
So Young Sohn

Purpose – The purpose of this paper is to provide a framework for understanding core technological competencies and identifying trends in the technological convergence of a business ecosystem using the patent information of leading firms in the system. Design/methodology/approach – The proposed framework is composed of two steps: time-sequential text clustering analysis for comprehending changes in general technological fields, and association rule analysis for identifying the trends of convergence in each field. The authors applied the proposed framework to the patents filed with the United States Patent and Trademark Office by Samsung Electronics, a market leader in the electronics industry, during the period from 2000 to 2011. Findings – In the sequential text clustering analysis, trends in 14 technological fields, such as data storage medium and data processing, mobile, lights and heats, and memory, are identified. Moreover, changes in technological convergence in each field are identified using association rule analysis. For instance, in the case of technologies related to lights and heats, convergence occurred between radio transmission systems and modulated-carrier systems during the period from 2000 to 2001. Since 2008, however, convergence has appeared between technologies for controlling lights and liquid crystal materials. Originality/value – Utilization of the framework will suggest new business opportunities to SMEs in a business ecosystem by identifying the trends of technological convergence.
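Association rule analysis of this kind rests on two basic measures, support and confidence. The sketch below computes both over a handful of invented patents, each tagged with a set of technology labels. The labels and data are hypothetical and are not taken from the study.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    base = support(transactions, antecedent)
    if base == 0.0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical patents, each described by the technology labels it carries.
patents = [
    {"lighting control", "liquid crystal"},
    {"lighting control", "liquid crystal", "memory"},
    {"lighting control"},
    {"memory"},
    {"liquid crystal", "memory"},
]
pair_support = support(patents, {"lighting control", "liquid crystal"})
rule_confidence = confidence(patents, {"lighting control"}, {"liquid crystal"})
print(pair_support, rule_confidence)
```

Running the same calculation on patents grouped by filing period would show how the strength of a pairing changes over time.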


2013 ◽  
Vol 443 ◽  
pp. 707-710
Author(s):  
Ya Feng Yang ◽  
Ai Min Yang ◽  
Huan Cheng Zhang

Based on Set Pair Analysis theory and fuzzy sets, traditional clustering methods were examined in terms of forward, reverse and uncertain factors. A new cluster analysis method was proposed that allows more flexible clustering analysis. A case study in materials clustering was carried out to help consumers find the greatest benefit in line with their own requirements.


2014 ◽  
Vol 556-562 ◽  
pp. 3536-3540
Author(s):  
Ya Xiong Li ◽  
Deng Pan

One key step in text mining is the categorization of texts, i.e., putting texts of the same or similar content into one group so as to distinguish texts of different content. However, traditional word-frequency-based statistical approaches, such as the VSM model, fail to reflect the complicated meaning in texts. This paper introduces domain ontology and constructs a new conceptual vector space model in the pre-processing stage of text clustering, substituting the initial matrix (lexicon-text matrix) in latent semantic analysis with a concept-text matrix. In the clustering analysis stage, the model adopts semantic similarity, partially overcoming the difficulty of accurately and effectively evaluating the similarity of texts when only the frequency of words and/or phrases is taken into account. Experimental results indicate that this method helps improve the result of text clustering.
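To make the substitution of a lexicon-text matrix by a concept-text matrix concrete, the sketch below maps words to concepts through a toy ontology and compares documents by cosine similarity before and after the mapping. The `ONTOLOGY` dictionary and the documents are hypothetical, and the sketch leaves out the latent semantic analysis step (the SVD) entirely.

```python
import math
from collections import Counter

# Hypothetical domain ontology: surface word -> concept.
ONTOLOGY = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "apple": "fruit", "banana": "fruit",
}

def term_vector(text):
    """Column of a lexicon-text matrix: raw word counts."""
    return Counter(text.lower().split())

def concept_vector(text):
    """Column of a concept-text matrix: counts after mapping words to concepts."""
    return Counter(ONTOLOGY.get(tok, tok) for tok in text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

doc_a, doc_b, doc_c = "car truck", "automobile", "apple banana"
word_level = cosine(term_vector(doc_a), term_vector(doc_b))
concept_level = cosine(concept_vector(doc_a), concept_vector(doc_b))
unrelated = cosine(concept_vector(doc_a), concept_vector(doc_c))
print(word_level, concept_level, unrelated)
```

The two vehicle documents share no words, so their word-level similarity is zero, yet they match once both are expressed in concepts.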

