Notice of Retraction: Study on Method of Word Segmentation in Feature Selection in Chinese Text Categorization

To address Chinese text categorization, a fast algorithm is proposed. The basic idea of the algorithm is as follows. First, construct a dictionary of weighted keywords organized as a key tree. Then, using a hash function and the longest-match-first principle, map the strings in each document onto the dictionary. Next, sum the weights of the successfully matched keywords for each category. Finally, take the category with the maximum sum as the classification result. The algorithm avoids the difficulty of Chinese word segmentation and the influence of segmentation errors on classification accuracy. Theoretical analysis and experimental results indicate that the algorithm achieves high accuracy and time efficiency, and that its overall performance reaches the level of current mainstream techniques.
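The match-and-sum procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the weighted dictionary maps each keyword to per-category weights, a format the abstract does not specify. The key tree is built from nested Python dicts, which are hash tables, and the scan skips one character whenever no keyword starts at the current position. The names `build_trie`, `classify` and `_END` are hypothetical.

```python
from collections import defaultdict

# Hypothetical marker key stored in a trie node when a keyword ends there.
# It is longer than one character, so it never collides with a text character.
_END = "__weights__"


def build_trie(dictionary):
    """Build a key tree (character trie) from {keyword: {category: weight}}.

    Each node is a Python dict, so every child lookup is a hash-table access.
    """
    root = {}
    for keyword, weights in dictionary.items():
        node = root
        for ch in keyword:
            node = node.setdefault(ch, {})
        node[_END] = weights
    return root


def classify(text, trie):
    """Return (best_category, per-category scores) using longest-match-first."""
    scores = defaultdict(float)
    i = 0
    while i < len(text):
        node, j = trie, i
        best_len, best_weights = 0, None
        # Walk the trie as far as the text allows, remembering the longest
        # keyword that ends along the way.
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if _END in node:
                best_len, best_weights = j - i, node[_END]
        if best_weights is None:
            i += 1  # no keyword starts here: skip one character
        else:
            for category, weight in best_weights.items():
                scores[category] += weight
            i += best_len  # consume the whole longest match
    if not scores:
        return None, {}
    best = max(scores, key=scores.get)  # category with the largest weight sum
    return best, dict(scores)


# Usage with a small, made-up weighted dictionary (illustrative weights only).
dictionary = {
    "计算机": {"tech": 2.0},
    "计算机网络": {"tech": 3.0},
    "足球": {"sports": 2.5},
    "比赛": {"sports": 1.0},
}
trie = build_trie(dictionary)
label, scores = classify("计算机网络与足球比赛", trie)
print(label, scores)  # sports {'tech': 3.0, 'sports': 3.5}
```

Because the longest match is consumed as a single unit, 计算机网络 contributes its own weight once. Its prefix 计算机 is not counted a second time. This is how the longest-match-first principle sidesteps an explicit segmentation step.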


Research and implementation of Chinese text feature selection algorithm based on χ² statistics

Computational Intelligence and Industrial Engineering ◽

10.2495/ciie140191 ◽

2014 ◽

Author(s):

Weijiang Wu ◽

Shengkai Wen ◽

Dongmei Xia ◽

Guohe Li

Keyword(s):

Feature Selection ◽

Chinese Text ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Text Feature


A lazy feature selection method for multi-label classification

Intelligent Data Analysis ◽

10.3233/ida-194878 ◽

2021 ◽

Vol 25 (1) ◽

pp. 21-34

Author(s):

Rafael B. Pereira ◽

Alexandre Plastino ◽

Bianca Zadrozny ◽

Luiz H.C. Merschmann

Keyword(s):

Feature Selection ◽

Text Categorization ◽

Feature Selection Method ◽

Selection Method ◽

Video Classification ◽

Classification Problems ◽

Class Label ◽

New Feature ◽

Feature Selection Techniques ◽

Biomolecular Analysis

In many important application domains, such as text categorization, biomolecular analysis, scene or video classification and medical diagnosis, instances are naturally associated with more than one class label, giving rise to multi-label classification problems. This has led, in recent years, to a substantial amount of research in multi-label classification. More specifically, feature selection methods have been developed to identify relevant and informative features for multi-label classification. This work presents a new feature selection method based on the lazy feature selection paradigm and designed specifically for the multi-label context. Experimental results show that the proposed technique is competitive with multi-label feature selection techniques currently used in the literature, and is clearly more scalable as the amount of data grows.
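The abstract does not describe the method's internals. The sketch below is only a rough illustration of the lazy paradigm in general, where feature selection is deferred until a test instance is known and is driven by that instance's own feature values. It scores each feature by the mean binary entropy of each label's presence among the training instances that share the test instance's value for that feature. The entropy criterion, the restriction to discrete features and the function names are assumptions chosen for illustration. This is not the technique proposed by Pereira et al.

```python
import math


def binary_entropy(p):
    """Entropy (bits) of a Bernoulli variable with success probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def lazy_select(x, X_train, Y_train, labels, k):
    """Pick k feature indices for ONE test instance x (the lazy step).

    Assumed criterion (illustrative only): for each feature f, keep the
    training rows whose value of f equals x[f], then average the binary
    entropy of every label's presence in that subset. Lower is better.
    """
    scored = []
    for f, value in enumerate(x):
        label_sets = [y for row, y in zip(X_train, Y_train) if row[f] == value]
        if not label_sets:
            score = 1.0  # value never seen in training: treat as uninformative
        else:
            score = sum(
                binary_entropy(sum(lbl in y for y in label_sets) / len(label_sets))
                for lbl in labels
            ) / len(labels)
        scored.append((score, f))
    scored.sort()  # ascending entropy; ties broken by feature index
    return [f for _, f in scored[:k]]


# Toy multi-label data: each instance has discrete features and a set of labels.
X_train = [(1, 0, "a"), (1, 1, "b"), (0, 0, "a"), (0, 1, "b")]
Y_train = [{"sports"}, {"sports"}, {"tech"}, {"tech"}]
selected = lazy_select((1, 1, "a"), X_train, Y_train, ["sports", "tech"], k=1)
print(selected)  # [0]
```

The selection is recomputed for every test instance. A different instance, for example one with value 0 for feature 0, may therefore receive a different feature subset. This per-instance behavior is what distinguishes lazy selection from selecting one global subset before training.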


Applying the Bell’s Test to Chinese Texts

Entropy ◽

10.3390/e22030275 ◽

2020 ◽

Vol 22 (3) ◽

pp. 275

Author(s):

Igor A. Bessmertny ◽

Xiaoxi Huang ◽

Aleksei V. Platonov ◽

Chuqiao Yu ◽

Julia A. Koroleva

Keyword(s):

Quantum Entanglement ◽

Chinese Text ◽

Search Engines ◽

Text Processing ◽

Word Segmentation ◽

Significant Problem ◽

Text Segmentation ◽

Text Documents ◽

Segmentation Algorithms ◽

Chinese Texts

Search engines are able to find documents containing patterns from a query. This approach works for alphabetic languages such as English, where words are separated by blanks. Chinese, however, is highly dependent on context. A significant problem in Chinese text processing is the absence of blanks between words, so the text must be segmented into words before any other processing. Algorithms for Chinese text segmentation should consider context; that is, the segmentation of a word depends on the surrounding ideograms. Because existing segmentation algorithms are imperfect, we consider an approach that builds the context from all possible n-grams surrounding the query words. This paper proposes a quantum-inspired approach to rank Chinese text documents by their relevance to the query. In particular, this approach uses Bell's test, which measures the quantum entanglement of two words within the context. The contexts of words are built using the hyperspace analogue to language (HAL) algorithm. Experiments conducted in three domains demonstrated that the proposed approach provides acceptable results.
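The HAL context-building step can be sketched as below. The sketch assumes the common formulation of HAL. In that formulation, a token at distance d (1 ≤ d ≤ K) before a word, inside a sliding window of size K, contributes weight K − d + 1. A word's context vector joins its preceding context (its row) with its following context (its column). Treating single Chinese characters as tokens is an assumption made here to avoid segmentation. The paper builds contexts from n-grams, and its Bell's-test ranking step is not shown. The names `hal_matrix` and `context_vector` are hypothetical.

```python
from collections import defaultdict


def hal_matrix(tokens, window=5):
    """HAL weights m[w][c]: how strongly token c precedes token w.

    A token c at distance d (1 <= d <= window) before w adds window - d + 1,
    so closer neighbours contribute more.
    """
    m = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for d in range(1, min(window, i) + 1):
            m[w][tokens[i - d]] += window - d + 1
    return m


def context_vector(word, m):
    """Join a word's preceding context (its row) and following context (its column)."""
    vec = {("L", c): v for c, v in m.get(word, {}).items()}
    for w, row in m.items():
        if word in row:  # word occurs before w, so w is in word's right context
            vec[("R", w)] = row[word]
    return vec


# Assumption: single Chinese characters serve as tokens (no segmentation step).
tokens = list("我爱北京")
m = hal_matrix(tokens, window=2)
print(context_vector("北", m))
# {('L', '爱'): 2.0, ('L', '我'): 1.0, ('R', '京'): 2.0}
```

Context vectors built this way can then be compared, for example by cosine similarity, or passed to a later scoring stage. In the paper, that later stage is the Bell's-test computation.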
