An Incremental Document Clustering Algorithm Based on a Hierarchical Agglomerative Approach

Biomedical Document Clustering Based on Accelerated Symbiotic Organisms Search Algorithm

International Journal of Swarm Intelligence Research ◽

10.4018/ijsir.2021100109 ◽

2021 ◽

Vol 12 (4) ◽

pp. 169-185

Author(s):

Saida Ishak Boushaki ◽

Omar Bendjeghaba ◽

Nadjet Kamel

Keyword(s):

Clustering Algorithm ◽

Search Algorithm ◽

Clustering Algorithms ◽

Document Clustering ◽

Latent Semantic Indexing ◽

Research Area ◽

Semantic Indexing ◽

Local Optima ◽

Symbiotic Organisms Search ◽

Symbiotic Organisms

Clustering is an important unsupervised analysis technique for big data mining. It is applied in several domains, including biomedical documents from the MEDLINE database. Document clustering based on metaheuristics is an active research area. However, these algorithms tend to get trapped in local optima, require many parameters to be tuned, and index documents with a high-dimensional matrix under the traditional vector space model. To overcome these limitations, this paper proposes a new parameter-free document clustering algorithm (ASOS-LSI). It is based on the recent symbiotic organisms search (SOS) metaheuristic and is enhanced by an acceleration technique. Furthermore, the documents are represented semantically using the well-known latent semantic indexing (LSI) technique. Experiments on well-known biomedical document datasets show the significant superiority of ASOS-LSI over five well-known algorithms in terms of compactness, F-measure, purity, misclassified documents, entropy, and runtime.

A document clustering algorithm based on improved landmark semidefinite embedding

The 2nd International Conference on Information Science and Engineering ◽

10.1109/icise.2010.5690075 ◽

2010 ◽

Author(s):

Hui Wang ◽

Hua Qin ◽

Li-duo Ding ◽

Gang-gang Hui

Keyword(s):

Clustering Algorithm ◽

Document Clustering

An efficient hybrid distributed document clustering algorithm

Scientific Research and Essays ◽

10.5897/sre2014.6107 ◽

2015 ◽

Vol 10 (1) ◽

pp. 14-22 ◽

Cited By ~ 1

Author(s):

E Judith J ◽

Jayakumari J

Keyword(s):

Clustering Algorithm ◽

Document Clustering

Document Clustering Algorithm Based on Tree-Structured Growing Self-Organizing Feature Map

Advances in Neural Networks – ISNN 2004 - Lecture Notes in Computer Science ◽

10.1007/978-3-540-28647-9_138 ◽

2004 ◽

pp. 840-845 ◽

Cited By ~ 2

Author(s):

Xiaoshen Zheng ◽

Wenling Liu ◽

Pilian He ◽

Weidi Dai

Keyword(s):

Clustering Algorithm ◽

Document Clustering ◽

Feature Map ◽

Self Organizing

An Ontology Based Model for Document Clustering

International Journal of Intelligent Information Technologies ◽

10.4018/jiit.2011070105 ◽

2011 ◽

Vol 7 (3) ◽

pp. 54-69 ◽

Cited By ~ 13

Author(s):

U. K. Sridevi ◽

N. Nagaveni

Keyword(s):

Vector Space ◽

Domain Knowledge ◽

Clustering Algorithm ◽

Document Clustering ◽

Vector Space Model ◽

Search Space ◽

Space Model ◽

Clustering Model ◽

Document Collection ◽

Improved Performance

Clustering is an important technique for finding relevant content in a document collection, and it also reduces the search space. Current clustering research emphasizes the development of more efficient clustering methods without considering domain knowledge or the user's needs. In recent years, the semantics of documents have been utilized in document clustering. This work focuses on a clustering model in which an ontology approach is applied. The major challenge is to use background knowledge in the similarity measure. This paper presents an ontology-based document annotation and clustering system. A semi-automatic document annotation and concept weighting scheme is used to create an ontology-based knowledge base. The Particle Swarm Optimization (PSO) clustering algorithm can be applied to obtain the clustering solution. The accuracy of clustering has been computed before and after combining the ontology with the Vector Space Model (VSM). The proposed ontology-based framework gives improved performance and better clustering compared with the traditional vector space model. The results obtained with the ontology were significant and promising.
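For readers unfamiliar with the baseline mentioned in this abstract, the following is a minimal sketch of document similarity under a plain vector space model. It is not the authors' method: the whitespace tokenizer, the raw term-frequency weighting, and the function names `term_frequencies` and `cosine_similarity` are all assumptions made for illustration, and no ontology or concept weighting is involved.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Build a sparse term-frequency vector (assumed tokenizer: lowercase + whitespace split)."""
    return Counter(text.lower().split())


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    # Only terms present in both documents contribute to the dot product.
    shared_terms = vec_a.keys() & vec_b.keys()
    dot = sum(vec_a[t] * vec_b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc_a = term_frequencies("gene expression in tumor cells")
    doc_b = term_frequencies("tumor gene therapy")
    print(cosine_similarity(doc_a, doc_b))
```

Because this measure only matches identical surface terms, two documents that use different words for the same concept score zero, which is the kind of gap that background knowledge in the similarity measure is meant to address.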

A Document Clustering Algorithm Based on Semi-constrained Hierarchical Latent Dirichlet Allocation

Knowledge Science, Engineering and Management - Lecture Notes in Computer Science ◽

10.1007/978-3-319-12096-6_5 ◽

2014 ◽

pp. 49-60 ◽

Cited By ~ 2

Author(s):

Jungang Xu ◽

Shilong Zhou ◽

Lin Qiu ◽

Shengyuan Liu ◽

Pengfei Li

Keyword(s):

Clustering Algorithm ◽

Latent Dirichlet Allocation ◽

Document Clustering ◽

Dirichlet Allocation

Fine-Tuning an Algorithm for Semantic Document Clustering Using a Similarity Graph

International Journal of Semantic Computing ◽

10.1142/s1793351x16400195 ◽

2016 ◽

Vol 10 (04) ◽

pp. 527-555

Author(s):

Lubomir Stanchev

Keyword(s):

English Language ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Document Clustering ◽

Fine Tuning ◽

Human Judgment ◽

Multiple Parameters ◽

Similarity Graph ◽

Multiple Metrics

In this article, we examine an algorithm for document clustering using a similarity graph. The graph stores words and common phrases from the English language as nodes, and it can be used to compute the degree of semantic similarity between any two phrases. One application of the similarity graph is semantic document clustering, that is, grouping documents based on the meaning of the words in them. Since our algorithm for semantic document clustering relies on multiple parameters, we examine how fine-tuning these values affects the quality of the result. Specifically, we use the Reuters-21578 benchmark, which contains [Formula: see text] newswire stories that are grouped into 82 categories using human judgment. We apply the k-means clustering algorithm to group the documents using two similarity metrics: one based on keyword matching and one that uses the similarity graph. We evaluate the results of the clustering algorithms using multiple metrics, such as precision, recall, F-score, entropy, and purity.
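Two of the evaluation metrics named in this abstract, purity and entropy, can be sketched as follows. This is an illustrative sketch under common textbook definitions, not the article's evaluation code: entropy here is the cluster-size-weighted average of per-cluster class entropy in bits, without the normalization by the number of classes that some papers apply, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def _labels_by_cluster(cluster_ids, class_labels):
    """Group the reference class labels by the cluster each document was assigned to."""
    groups = defaultdict(list)
    for cluster, label in zip(cluster_ids, class_labels):
        groups[cluster].append(label)
    return groups


def purity(cluster_ids, class_labels):
    """Fraction of documents that belong to the majority class of their cluster (higher is better)."""
    groups = _labels_by_cluster(cluster_ids, class_labels)
    majority_total = sum(max(Counter(labels).values()) for labels in groups.values())
    return majority_total / len(class_labels)


def entropy(cluster_ids, class_labels):
    """Size-weighted mean of per-cluster class entropy in bits (lower is better)."""
    groups = _labels_by_cluster(cluster_ids, class_labels)
    n = len(class_labels)
    total = 0.0
    for labels in groups.values():
        size = len(labels)
        # Entropy of the class distribution inside this cluster.
        cluster_entropy = -sum(
            (count / size) * math.log2(count / size)
            for count in Counter(labels).values()
        )
        total += (size / n) * cluster_entropy
    return total


if __name__ == "__main__":
    clusters = [0, 0, 1, 1, 1]
    classes = ["earn", "earn", "grain", "grain", "earn"]
    print(purity(clusters, classes), entropy(clusters, classes))
```

Both functions expect non-empty, equal-length sequences: one cluster assignment and one human-assigned category per document.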

Document Clustering and Automatic Labeling for Forensic Analysis Using High Performance Clustering Algorithm

International Journal Of Engineering And Computer Science ◽

10.18535/ijecs/v4i9.29 ◽

2015 ◽

Author(s):

Asmita V. Mane ◽

Prof. Gitanjali Shinde

Keyword(s):

High Performance ◽

Clustering Algorithm ◽

Document Clustering ◽

Forensic Analysis

Survey on Clustering Algorithm for Document Clustering

International Journal of Computer Trends and Technology ◽

10.14445/22312803/ijctt-v11p127 ◽

2014 ◽

Vol 11 (3) ◽

pp. 128-130

Author(s):

Priyanka Khadse ◽

Harshal Chowhan

Keyword(s):

Clustering Algorithm ◽

Document Clustering

A Parallel Hybrid Web Document Clustering Algorithm and its Performance Study

The Journal of Supercomputing ◽

10.1023/b:supe.0000040611.25862.d9 ◽

2004 ◽

Vol 30 (2) ◽

pp. 117-131 ◽

Cited By ~ 19

Author(s):

Shuting Xu ◽

Jun Zhang

Keyword(s):

Clustering Algorithm ◽

Document Clustering ◽

Performance Study ◽

Web Document ◽

Parallel Hybrid ◽

Web Document Clustering
