Using clustering to aid text classification of single-labelled datasets

2009 ◽  
Author(s):  
Αντωνία Κυριακοπούλου

Supervised and unsupervised learning have been the focus of critical research in the areas of machine learning and artificial intelligence. In the literature, these two streams flow independently of each other, despite their close conceptual and practical connections. This dissertation demonstrates that unsupervised learning algorithms, i.e. clustering, can provide us with valuable information about the data and help in the creation of high-accuracy text classifiers. In the case of clustering, the aim is to extract a kind of "structure" from a given sample of objects. The reasoning behind this is that if some structure exists in the objects, it is possible to take advantage of this information and find a short description of the data, exploiting the dependence or association between index terms and documents. This concise representation of the whole dataset can be properly incorporated in the existing data representation. The use of prior knowledge about the nature of the dataset helps in building a more efficient classifier for this set. This approach does not capture all the intricacies of text; however, in some domains this technique substantially improves text classification accuracy. In this vein, a study of the interaction between supervised and unsupervised learning has been carried out. We have studied and implemented models that apply clustering in multiple ways and in conjunction with classification to construct robust text classifiers. The extensive experimentation has shown the effectiveness of using clustering to boost text classification performance. Additionally, preliminary experiments on some of the most important applications of text classification, such as Spam Mail Filtering, Spam Detection in Social Bookmarking Systems, and Sentence Boundary Disambiguation, have shown promising enhancements by exploiting the proposed models.
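As a rough illustration of the idea in the abstract above, the sketch below clusters a handful of documents and appends each document's cluster membership to its term-count vector, so that a downstream classifier could see the cluster structure as extra features. This is only one possible reading: the one-hot cluster feature, the toy k-means with fixed seeding, the sample documents, and all function names are assumptions made for illustration, not the models implemented in the dissertation.

```python
from collections import Counter


def bag_of_words(docs):
    """Map each document to a term-count vector over a shared, sorted vocabulary."""
    vocab = sorted({tok for d in docs for tok in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        vectors.append([float(counts[t]) for t in vocab])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    """Plain k-means; the first k vectors seed the centroids (an assumption
    made to keep the example deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def add_cluster_features(vectors, labels, k):
    """Append a one-hot cluster indicator to every document vector."""
    return [v + [1.0 if labels[i] == c else 0.0 for c in range(k)]
            for i, v in enumerate(vectors)]


# Hypothetical toy corpus; in practice labelled and unlabelled documents
# could be clustered together before training a classifier.
docs = [
    "cheap pills buy now",
    "meeting agenda for monday",
    "buy cheap pills online",
    "monday meeting notes",
]
vocab, X = bag_of_words(docs)
cluster_ids = kmeans(X, k=2)
X_aug = add_cluster_features(X, cluster_ids, k=2)
```

The augmented vectors in `X_aug` could then be passed to any standard text classifier in place of the original term-count vectors.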

2019 ◽  
Vol 14 (1) ◽  
pp. 124-134 ◽  
Author(s):  
Shuai Zhang ◽  
Yong Chen ◽  
Xiaoling Huang ◽  
Yishuai Cai

Online feedback is an effective means of communication between government departments and citizens. However, the high daily volume of public feedback has increased the burden on government administrators. Deep learning methods are good at automatically analyzing and extracting deep features of data, which in turn improves the accuracy of classification prediction. In this study, we aim to use a text classification model to classify public feedback automatically and so reduce the workload of administrators. In particular, a convolutional neural network model combined with word embedding and optimized by a differential evolution algorithm is adopted. We compared it with seven common text classification models, and the results show that the model we explored has good classification performance under different evaluation metrics, including accuracy, precision, recall, and F1-score.


2019 ◽  
Vol 7 ◽  
pp. 139-155 ◽  
Author(s):  
Nikolaos Pappas ◽  
James Henderson

Neural text classification models typically treat output labels as categorical variables that lack description and semantics. This forces their parametrization to be dependent on the label set size, and, hence, they are unable to scale to large label sets and generalize to unseen ones. Existing joint input-label text models overcome these issues by exploiting label descriptions, but they are unable to capture complex label relationships, have rigid parametrization, and their gains on unseen labels happen often at the expense of weak performance on the labels seen during training. In this paper, we propose a new input-label model that generalizes over previous such models, addresses their limitations, and does not compromise performance on seen labels. The model consists of a joint nonlinear input-label embedding with controllable capacity and a joint-space-dependent classification unit that is trained with cross-entropy loss to optimize classification performance. We evaluate models on full-resource and low- or zero-resource text classification of multilingual news and biomedical text with a large label set. Our model outperforms monolingual and multilingual models that do not leverage label semantics and previous joint input-label space models in both scenarios.


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Jin Dai ◽  
Xin Liu

Similarity between objects is a core research area of data mining. To reduce the interference caused by the uncertainty of natural language, a similarity measure between normal cloud models is applied to text classification. On this basis, a novel text classifier based on cloud concept jumping up (CCJU-TC) is proposed. It can efficiently convert between qualitative concepts and quantitative data. After the text set is converted to a text information table based on the VSM model, the qualitative text concepts extracted from the same category are jumped up into a single category concept. According to the cloud similarity between the test text and each category concept, the test text is assigned to the most similar category. A comparison among different text classifiers on different feature selection sets shows that CCJU-TC not only adapts well to different text features but also achieves better classification performance than the traditional classifiers.


1994 ◽  
Vol 6 (3) ◽  
pp. 491-508 ◽  
Author(s):  
J.-P. Nadal ◽  
N. Parga

We exhibit a duality between two perceptrons that allows us to compare the theoretical analysis of supervised and unsupervised learning tasks. The first perceptron has one output and is asked to learn a classification of p patterns. The second (dual) perceptron has p outputs and is asked to transmit as much information as possible on a distribution of inputs. We show in particular that the maximum information that can be stored in the couplings for the supervised learning task is equal to the maximum information that can be transmitted by the dual perceptron.


2020 ◽  
Vol 22 (45) ◽  
pp. 26340-26350
Author(s):  
QHwan Kim ◽  
Joon-Hyuk Ko ◽  
Sunghoon Kim ◽  
Wonho Jhe

We develop GCIceNet, which automatically generates machine-based order parameters for classifying the phases of water molecules via supervised and unsupervised learning with graph convolutional networks.

