Research on the text clustering algorithm based on latent semantic analysis and optimization

Online discussion forums have rapidly gained usage in e-learning systems, which has placed a heavy burden on course instructors who must moderate student discussions. Previous methods of assessing student participation in online discussions followed strictly quantitative approaches that did not necessarily capture students’ effort. Along with this growth in usage, there is a need for faster knowledge-extraction tools that analyse and present online messages in a useful and meaningful manner. This article discusses a qualitative approach that involves content analysis of the discussions and the generation of clustered keywords, which can be used to identify topics of discussion. The authors applied a new k-means++ clustering algorithm with latent semantic analysis to assess the topics expressed by students in online discussion forums, and compared it with the standard k-means++ algorithm. Using a forum from the Moodle course management system to validate the proposed algorithm, the authors show that k-means++ clustering with latent semantic analysis performs better than stand-alone k-means++.
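The abstract does not say what makes the authors' k-means++ variant new, so the sketch below shows only the generic pipeline it builds on: TF-IDF weighting of a term-document matrix, latent semantic analysis via truncated SVD, and standard k-means++ seeding followed by Lloyd iterations. The whitespace tokenizer, the smoothed IDF, the choice of two latent dimensions, and the sample posts are all assumptions for illustration, not details from the study.

```python
import numpy as np

def tfidf_matrix(docs):
    """Term-document TF-IDF matrix (rows = terms, columns = documents)."""
    tokenized = [d.lower().split() for d in docs]          # assumed: naive whitespace tokenizer
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokenized):
        for w in toks:
            counts[index[w], j] += 1
    df = np.count_nonzero(counts, axis=1)                  # documents containing each term
    idf = np.log(len(docs) / df) + 1.0                     # +1 keeps terms found in every doc
    return counts * idf[:, None], vocab

def lsa(X, k):
    """Project documents onto the top-k latent dimensions of a truncated SVD."""
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k]).T                     # shape: (n_docs, k)

def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new centre is drawn with probability proportional to D(x)^2."""
    centres = [points[rng.integers(len(points))]]
    for _ in range(1, k):
        d2 = np.min([((points - c) ** 2).sum(axis=1) for c in centres], axis=0)
        p = d2 / d2.sum() if d2.sum() > 0 else None        # None = uniform if all points coincide
        centres.append(points[rng.choice(len(points), p=p)])
    return np.array(centres)

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm started from k-means++ seeds."""
    rng = np.random.default_rng(seed)
    centres = kmeans_pp_init(points, k, rng)
    for _ in range(iters):
        dists = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([points[labels == c].mean(axis=0) if np.any(labels == c) else centres[c]
                        for c in range(k)])                # keep old centre if a cluster empties
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres

# Hypothetical forum posts (not data from the study).
posts = [
    "deadline for assignment two submission",
    "assignment two submission deadline extended",
    "moodle quiz login error",
    "cannot login to moodle quiz",
]
X, vocab = tfidf_matrix(posts)
Z = lsa(X, k=2)
labels, _ = kmeans(Z, k=2)
print(labels)  # posts 0-1 and posts 2-3 should land in different clusters
```

Running k-means++ on the LSA coordinates rather than on the raw TF-IDF columns is the comparison the abstract describes; here the SVD collapses each group of posts onto its own latent dimension before clustering.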

Improved text clustering algorithm of probabilistic latent with semantic analysis

Journal of Computer Applications, Vol 31 (3), pp. 674-676, 2011

DOI: 10.3724/sp.j.1087.2011.00674

Author(s): Yu-fang ZHANG, Jun ZHU, Zhong-yang XIONG

Keyword(s): Clustering Algorithm, Semantic Analysis, Text Clustering

Enhancing GSOM text clustering with Latent Semantic Analysis

2010 Fifth International Conference on Information and Automation for Sustainability, 2010

DOI: 10.1109/iciafs.2010.5715702

Cited by ~2

Author(s): S Matharage, D Alahakoon

Keyword(s): Latent Semantic Analysis, Semantic Analysis, Text Clustering

Tag clustering algorithm LMMSK: improved K-means algorithm based on latent semantic analysis

Journal of Systems Engineering and Electronics, Vol 28 (2), pp. 374, 2017

DOI: 10.21629/jsee.2017.02.18

Cited by ~5

Keyword(s): Latent Semantic Analysis, Clustering Algorithm, Semantic Analysis, Tag Clustering

Bert-Based Latent Semantic Analysis (Bert-LSA): A Case Study on Geospatial Data Technology and Application Trend Analysis

Applied Sciences, Vol 11 (24), pp. 11897, 2021

DOI: 10.3390/app112411897

Author(s): Quanying Cheng, Yunqiang Zhu, Jia Song, Hongyun Zeng, Shu Wang, ...

Keyword(s): Trend Analysis, Latent Semantic Analysis, Clustering Algorithm, Semantic Analysis, Relevant Literature, Geospatial Data, Probabilistic Latent Semantic Analysis, Data Resource, Topic Analysis, Text Content

Geospatial data is an indispensable data resource for research and applications in many fields. The technologies and applications related to geospatial data are constantly advancing, so identifying them will help foster and fund further innovation. Through topic analysis, new research hotspots can be discovered by understanding the whole development process of a topic. At present, the main methods for determining topics are peer review and bibliometrics; however, these only review the relevant literature or perform simple frequency analysis. This paper proposes a new topic discovery method that combines word embeddings from the pre-trained BERT model with a spherical k-means clustering algorithm, and uses the similarity between publications and topics to assign each publication to a topic. The proposed method was applied to 266 publications related to geospatial data from the past five years. First, trends in the technologies and applications related to geospatial data in several leading countries were analysed by number of publications. Then, the proposed method and an existing method, PLSA (Probabilistic Latent Semantic Analysis), were compared using two similar topic coherence indicators, U-Mass and NPMI. The results show that the proposed method reveals text content well, identifies development trends, and produces more coherent topics, and that the overall performance of Bert-LSA is better than that of PLSA on both NPMI and U-Mass. The method is not limited to trend analysis of the data used in this paper; it can also be applied to topic analysis of other types of text.
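Spherical k-means differs from ordinary k-means in that vectors and centroids are kept at unit length, so assignment uses cosine similarity, which matches the abstract's "similarity between literature and topics". The sketch below is a minimal version under stated assumptions: the toy 3-dimensional vectors are hypothetical stand-ins for BERT document embeddings, and the random-sample initialization and iteration limit are assumptions, since the abstract does not give Bert-LSA's exact settings.

```python
import numpy as np

def normalize(rows):
    """Scale each row to unit L2 norm so that dot products equal cosine similarity."""
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    return rows / np.where(norms == 0, 1.0, norms)

def spherical_kmeans(embeddings, k, iters=50, seed=0):
    """Cluster unit-length vectors by cosine similarity; centroids stay on the unit sphere."""
    X = normalize(np.asarray(embeddings, dtype=float))
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # assumed: random-sample init
    for _ in range(iters):
        sims = X @ centroids.T                  # document-topic cosine similarities
        labels = sims.argmax(axis=1)            # assign each document to its most similar topic
        summed = np.array([X[labels == c].sum(axis=0) if np.any(labels == c) else centroids[c]
                           for c in range(k)])
        new = normalize(summed)                 # mean direction, projected back onto the sphere
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Toy vectors standing in for BERT document embeddings (hypothetical values).
embeddings = np.array([
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 0.8],
])
labels, centroids = spherical_kmeans(embeddings, k=2)
print(labels)
```

Renormalizing the summed vectors, rather than averaging them, is what keeps each topic centroid a direction; vector magnitude, which for embeddings often reflects text length more than content, never affects the assignment.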

Design and evaluation of a parallel document clustering algorithm based on hierarchical latent semantic analysis

Concurrency and Computation: Practice and Experience, Vol 31 (13), pp. e5094, 2018

DOI: 10.1002/cpe.5094

Cited by ~2

Author(s): Karthick Seshadri, K. Viswanathan Iyer, Mercy Shalinie S

Keyword(s): Latent Semantic Analysis, Clustering Algorithm, Semantic Analysis, Document Clustering

Text Clustering Based on Domain Ontology and Latent Semantic Analysis

2010 International Conference on Asian Language Processing, 2010

DOI: 10.1109/ialp.2010.55

Cited by ~3

Author(s): Yaxiong Li, Jianqiang Zhang, Dan Hu

Keyword(s): Latent Semantic Analysis, Semantic Analysis, Text Clustering, Domain Ontology

Text Clustering Based on Domain Ontology and Latent Semantic Analysis

Applied Mechanics and Materials, Vol 556-562, pp. 3536-3540, 2014

DOI: 10.4028/www.scientific.net/amm.556-562.3536

Author(s): Ya Xiong Li, Deng Pan

Keyword(s): Latent Semantic Analysis, Clustering Analysis, Semantic Analysis, Vector Space Model, Text Clustering, Domain Ontology, Initial Matrix, Processing Stage, Space Model, Degree Of Similarity

One key step in text mining is the categorization of texts, i.e., putting texts with the same or similar content into one group so as to distinguish them from texts with different content. However, traditional statistical approaches based on word frequency, such as the vector space model (VSM), fail to reflect the complex meaning of texts. This paper introduces a domain ontology and constructs a new conceptual vector space model in the pre-processing stage of text clustering, replacing the initial matrix of latent semantic analysis (the lexicon-text matrix) with a concept-text matrix. In the clustering analysis stage, the model adopts semantic similarity, partially overcoming the difficulty of accurately evaluating the similarity between texts when only the frequency of words and phrases is taken into account. Experimental results indicate that this method helps improve text clustering results.
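The abstract does not specify the ontology or the exact semantic similarity measure, so the following is a minimal sketch of the general idea under two assumptions: a hand-written dictionary (`ONTOLOGY`, hypothetical) maps surface terms to concepts, and similarity is measured as the cosine between document vectors in the LSA space. It contrasts a lexicon-text matrix, where "car" and "automobile" are unrelated rows, with a concept-text matrix, where both collapse onto one concept before the SVD.

```python
import numpy as np

# Hypothetical toy ontology mapping surface terms to concepts (not the paper's ontology).
ONTOLOGY = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "doctor": "physician", "surgeon": "physician",
}

def concept_text_matrix(docs, ontology):
    """Rows = concepts, columns = documents; terms missing from the ontology keep their own row."""
    concept_docs = [[ontology.get(w, w) for w in d.lower().split()] for d in docs]
    concepts = sorted({c for cd in concept_docs for c in cd})
    index = {c: i for i, c in enumerate(concepts)}
    M = np.zeros((len(concepts), len(docs)))
    for j, cd in enumerate(concept_docs):
        for c in cd:
            M[index[c], j] += 1
    return M, concepts

def latent_doc_vectors(M, k):
    """LSA step: document coordinates in the top-k singular directions."""
    _, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k]).T

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

docs = ["car repair", "automobile repair", "surgeon training", "physician training"]

# Lexicon-text baseline (empty ontology): "car" and "automobile" are separate dimensions.
T, _ = concept_text_matrix(docs, {})
print(cosine(T[:, 0], T[:, 1]))                 # about 0.5: only "repair" is shared

# Concept-text matrix: both terms map to the concept "vehicle" before the SVD.
C, concepts = concept_text_matrix(docs, ONTOLOGY)
Z = latent_doc_vectors(C, k=2)
print(cosine(Z[0], Z[1]), cosine(Z[0], Z[2]))   # about 1.0 and 0.0
```

Terms absent from the ontology are kept as their own "concept" here, which is one possible design choice; a real system might instead drop them or map them through a thesaurus.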
