Using clustering to aid text classification of single-labelled datasets

Mapping Intimacies ◽

10.12681/eadd/30839 ◽

2009 ◽

Author(s):

Αντωνία Κυριακοπούλου

Keyword(s):

Unsupervised Learning ◽

Text Classification ◽

Data Representation ◽

Classification Performance ◽

Critical Research ◽

Social Bookmarking ◽

Supervised And Unsupervised Learning ◽

Text Classifiers ◽

Concise Representation

Supervised and unsupervised learning have been the focus of critical research in the areas of machine learning and artificial intelligence. In the literature, these two streams flow independently of each other, despite their close conceptual and practical connections. This dissertation demonstrates that unsupervised learning algorithms, i.e. clustering, can provide valuable information about the data and help in the creation of high-accuracy text classifiers. In the case of clustering, the aim is to extract a kind of "structure" from a given sample of objects. The reasoning behind this is that if some structure exists in the objects, it is possible to take advantage of this information and find a short description of the data, exploiting the dependence or association between index terms and documents. This concise representation of the whole dataset can be properly incorporated in the existing data representation. The use of prior knowledge about the nature of the dataset helps in building a more efficient classifier for this set. This approach does not capture all the intricacies of text; however, on some domains this technique substantially improves text classification accuracy. In this vein, a study of the interaction between supervised and unsupervised learning has been carried out. We have studied and implemented models that apply clustering in multiple ways and in conjunction with classification to construct robust text classifiers. Extensive experimentation has shown the effectiveness of using clustering to boost text classification performance. Additionally, preliminary experiments on some of the most important applications of text classification, such as Spam Mail Filtering, Spam Detection in Social Bookmarking Systems, and Sentence Boundary Disambiguation, have shown promising enhancements by exploiting the proposed models.
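The idea of folding a cluster-derived summary into the existing data representation can be illustrated with a small sketch. This is not the dissertation's model, which the abstract does not specify. It assumes a bag-of-words representation, plain k-means with evenly spaced seed documents, and a one-hot cluster indicator appended to each vector; the function names and the toy documents are hypothetical.

```python
from collections import Counter

def bag_of_words(docs):
    """Turn each document into a term-count vector over a shared vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0.0] * len(vocab)
        for w, c in Counter(d.lower().split()).items():
            v[index[w]] = float(c)
        vectors.append(v)
    return vectors, vocab

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=20):
    """Plain k-means. Assumption: seeds are evenly spaced documents (needs k <= len(vectors))."""
    n = len(vectors)
    centroids = [list(vectors[(i * n) // k]) for i in range(k)]
    assign = [0] * n
    for _ in range(iters):
        # Assignment step: each document goes to its nearest centroid.
        assign = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, a in zip(vectors, assign) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids

def augment_with_clusters(vectors, assign, k):
    """Append a one-hot cluster indicator to every document vector."""
    return [v + [1.0 if a == j else 0.0 for j in range(k)]
            for v, a in zip(vectors, assign)]

# Hypothetical toy corpus: two spam-like and two work-like messages.
docs = [
    "cheap pills buy now",
    "buy cheap pills online",
    "meeting agenda for monday",
    "monday meeting notes agenda",
]
vectors, vocab = bag_of_words(docs)
assign, _ = kmeans(vectors, k=2)
augmented = augment_with_clusters(vectors, assign, k=2)
```

The augmented vectors can then be passed to any supervised classifier in place of the original ones; the extra columns carry the structure that clustering found across the whole dataset, labelled or not.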

Text Classification of Public Feedbacks using Convolutional Neural Network Based on Differential Evolution Algorithm

International Journal of Computers Communications & Control ◽

10.15837/ijccc.2019.1.3420 ◽

2019 ◽

Vol 14 (1) ◽

pp. 124-134 ◽

Cited By ~ 2

Author(s):

Shuai Zhang ◽

Yong Chen ◽

Xiaoling Huang ◽

Yishuai Cai

Keyword(s):

Neural Network ◽

Differential Evolution ◽

Convolutional Neural Network ◽

Text Classification ◽

Differential Evolution Algorithm ◽

Classification Performance ◽

Classification Model ◽

Evolution Algorithm ◽

Classification Prediction

Online feedback is an effective channel of communication between government departments and citizens. However, the high daily volume of public feedback has increased the burden on government administrators. Deep learning methods are good at automatically analyzing and extracting deep features of data, which improves the accuracy of classification prediction. In this study, we aim to use a text classification model to automatically classify public feedback and reduce the workload of administrators. In particular, a convolutional neural network model combined with word embedding and optimized by a differential evolution algorithm is adopted. We compared it with seven common text classification models, and the results show that the model we explored has good classification performance under different evaluation metrics, including accuracy, precision, recall, and F1-score.
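The abstract does not say which quantities the differential evolution algorithm tunes in the CNN. As a rough illustration of the optimizer itself, the sketch below implements the classic DE/rand/1/bin scheme and applies it to a toy two-parameter objective that merely stands in for a validation loss. The objective, the bounds, and the settings `F`, `CR`, `pop_size`, and `generations` are assumptions chosen for the example.

```python
import random

def differential_evolution(objective, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=100, seed=0):
    """Minimize `objective` over box `bounds` with DE/rand/1/bin (needs pop_size >= 4)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for d in range(dim):
                if rng.random() < CR or d == j_rand:
                    # Mutation: base vector plus scaled difference vector.
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    v = min(max(v, lo), hi)  # clip back into the search box
                else:
                    v = pop[i][d]  # binomial crossover keeps the target's gene
                trial.append(v)
            f = objective(trial)
            if f <= fitness[i]:  # greedy selection
                pop[i], fitness[i] = trial, f
    best = min(range(pop_size), key=lambda i: fitness[i])
    return pop[best], fitness[best]

def toy_validation_loss(params):
    """Hypothetical stand-in for a CNN's validation loss; minimum at (0.3, 0.7)."""
    x, y = params
    return (x - 0.3) ** 2 + (y - 0.7) ** 2

best_params, best_loss = differential_evolution(
    toy_validation_loss, bounds=[(0.0, 1.0), (0.0, 1.0)]
)
```

In a real setting each call to the objective would train and evaluate a network, so the population size and generation count would have to be far smaller than in this toy run.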

Fusion of supervised and unsupervised learning for improved classification of hyperspectral images

Information Sciences ◽

10.1016/j.ins.2012.06.031 ◽

2012 ◽

Vol 217 ◽

pp. 39-55 ◽

Cited By ~ 48

Author(s):

Naif Alajlan ◽

Yakoub Bazi ◽

Farid Melgani ◽

Ronald R. Yager

Keyword(s):

Unsupervised Learning ◽

Hyperspectral Images ◽

Supervised And Unsupervised Learning

GILE: A Generalized Input-Label Embedding for Text Classification

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00259 ◽

2019 ◽

Vol 7 ◽

pp. 139-155 ◽

Cited By ~ 1

Author(s):

Nikolaos Pappas ◽

James Henderson

Keyword(s):

Text Classification ◽

Joint Space ◽

Classification Performance ◽

Cross Entropy ◽

Categorical Variables ◽

Classification Models ◽

Set Size ◽

Nonlinear Input ◽

Label Semantics

Neural text classification models typically treat output labels as categorical variables that lack description and semantics. This forces their parametrization to be dependent on the label set size, and, hence, they are unable to scale to large label sets and generalize to unseen ones. Existing joint input-label text models overcome these issues by exploiting label descriptions, but they are unable to capture complex label relationships, have rigid parametrization, and their gains on unseen labels happen often at the expense of weak performance on the labels seen during training. In this paper, we propose a new input-label model that generalizes over previous such models, addresses their limitations, and does not compromise performance on seen labels. The model consists of a joint nonlinear input-label embedding with controllable capacity and a joint-space-dependent classification unit that is trained with cross-entropy loss to optimize classification performance. We evaluate models on full-resource and low- or zero-resource text classification of multilingual news and biomedical text with a large label set. Our model outperforms monolingual and multilingual models that do not leverage label semantics and previous joint input-label space models in both scenarios.
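One way to read the central claim, that parametrization need not depend on the label set size, is sketched below. This is an illustrative reading of the abstract, not the published GILE architecture: it assumes the input encoding and each label description embedding are projected by `tanh` layers into a shared joint space, combined by elementwise product, and scored by a single weight vector. The class `JointScorer`, its dimensions, and the random initialization are hypothetical, and training with cross-entropy loss is omitted.

```python
import math
import random

def project(W, b, v):
    """Nonlinear projection into the joint space: tanh(W v + b)."""
    return [math.tanh(sum(w * x for w, x in zip(row, v)) + bi)
            for row, bi in zip(W, b)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class JointScorer:
    """Scores (input, label) pairs; no parameter is indexed by label identity."""

    def __init__(self, input_dim, label_dim, joint_dim, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.U = init(joint_dim, input_dim)   # input projection
        self.V = init(joint_dim, label_dim)   # label-description projection
        self.bu = [0.0] * joint_dim
        self.bv = [0.0] * joint_dim
        self.w = [rng.uniform(-0.5, 0.5) for _ in range(joint_dim)]  # joint-space scorer
        self.b = 0.0

    def score(self, h, label_embeddings):
        """Return one probability-like score per label embedding."""
        h_joint = project(self.U, self.bu, h)
        scores = []
        for e in label_embeddings:
            e_joint = project(self.V, self.bv, e)
            z = sum(wi * hi * ei for wi, hi, ei in zip(self.w, h_joint, e_joint)) + self.b
            scores.append(sigmoid(z))
        return scores

# Hypothetical input encoding and label description embeddings.
model = JointScorer(input_dim=3, label_dim=2, joint_dim=4)
h = [0.2, -0.1, 0.4]
seen_labels = [[1.0, 0.0], [0.0, 1.0]]
scores_two = model.score(h, seen_labels)
# A label never seen before is scored with the same parameters.
scores_three = model.score(h, seen_labels + [[0.5, 0.5]])
```

Because the parameters depend only on `joint_dim` and the embedding sizes, adding a label changes nothing in the model; `joint_dim` plays the role of the controllable capacity mentioned in the abstract.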

Approach for Text Classification Based on the Similarity Measurement between Normal Cloud Models

The Scientific World JOURNAL ◽

10.1155/2014/784392 ◽

2014 ◽

Vol 2014 ◽

pp. 1-9 ◽

Cited By ~ 2

Author(s):

Jin Dai ◽

Xin Liu

Keyword(s):

Text Classification ◽

Research Area ◽

Classification Performance ◽

Similarity Measurement ◽

The Core ◽

Cloud Models ◽

Text Classifiers ◽

Text Information ◽

Text Features ◽

Better Than

The similarity between objects is a core research area of data mining. In order to reduce the interference caused by the uncertainty of natural language, a similarity measurement between normal cloud models is applied to text classification. On this basis, a novel text classifier based on cloud concept jumping up (CCJU-TC) is proposed. It can efficiently accomplish conversion between qualitative concepts and quantitative data. Through the conversion from text set to text information table based on the VSM model, the qualitative text concepts extracted from the same category are jumped up into a single whole-category concept. According to the cloud similarity between the test text and each category concept, the test text is assigned to the most similar category. A comparison among different text classifiers on different feature selection sets shows that CCJU-TC not only adapts well to different text features but also achieves better classification performance than the traditional classifiers.

Duality Between Learning Machines: A Bridge Between Supervised and Unsupervised Learning

Neural Computation ◽

10.1162/neco.1994.6.3.491 ◽

1994 ◽

Vol 6 (3) ◽

pp. 491-508 ◽

Cited By ~ 7

Author(s):

J.-P. Nadal ◽

N. Parga

Keyword(s):

Theoretical Analysis ◽

Unsupervised Learning ◽

Supervised Learning ◽

Learning Task ◽

Maximum Information ◽

Learning Machines ◽

Learning Tasks ◽

Supervised And Unsupervised Learning ◽

Second Dual

We exhibit a duality between two perceptrons that allows us to compare the theoretical analysis of supervised and unsupervised learning tasks. The first perceptron has one output and is asked to learn a classification of p patterns. The second (dual) perceptron has p outputs and is asked to transmit as much information as possible on a distribution of inputs. We show in particular that the maximum information that can be stored in the couplings for the supervised learning task is equal to the maximum information that can be transmitted by the dual perceptron.

Robust classification of hyperspectral images based on the combination of supervised and unsupervised learning paradigms

2012 IEEE International Geoscience and Remote Sensing Symposium ◽

10.1109/igarss.2012.6351270 ◽

2012 ◽

Author(s):

Naif Alajlan ◽

Yakoub Bazi ◽

Haikel AlHichri ◽

Essam Othman

Keyword(s):

Unsupervised Learning ◽

Hyperspectral Images ◽

Robust Classification ◽

Supervised And Unsupervised Learning

Feature Extraction and Classification of Colon Cancer Using a Hybrid Approach of Supervised and Unsupervised Learning

Intelligent Systems Reference Library - Advanced Machine Learning Approaches in Cancer Prognosis ◽

10.1007/978-3-030-71975-3_7 ◽

2021 ◽

pp. 195-219

Author(s):

Joydev Ghosh ◽

Amitesh Kumar Sharma ◽

Sahil Tomar

Keyword(s):

Colon Cancer ◽

Feature Extraction ◽

Unsupervised Learning ◽

Hybrid Approach ◽

Supervised And Unsupervised Learning

GCIceNet: a graph convolutional network for accurate classification of water phases

Physical Chemistry Chemical Physics ◽

10.1039/d0cp03456h ◽

2020 ◽

Vol 22 (45) ◽

pp. 26340-26350

Author(s):

QHwan Kim ◽

Joon-Hyuk Ko ◽

Sunghoon Kim ◽

Wonho Jhe

Keyword(s):

Unsupervised Learning ◽

Order Parameters ◽

Water Molecules ◽

Convolutional Network ◽

Convolutional Networks ◽

Supervised And Unsupervised Learning

We develop GCIceNet, which automatically generates machine-based order parameters for classifying the phases of water molecules via supervised and unsupervised learning with graph convolutional networks.
