Mining and Tracking Massive Text Data: Classification, Construction of Tracking Statistics, and Inference Under Misclassification

The object of this research is fast classification methods for text data. The study is motivated by the rapid growth of textual data in both digital and printed form: such data must be processed by software, because human resources cannot process it in full. Many data classification approaches have been developed. This study applies three of them, the Bloom filter, the naive Bayes classifier, and neural networks, to a set of text data in order to classify the texts into categories. Each method has advantages and disadvantages, and the paper illustrates the strengths and weaknesses of each on a specific example. The algorithms were compared with one another in terms of speed and efficiency, that is, the accuracy with which a text is assigned to a class. Each method was run on the same data sets while varying the amount of training and test data and the number of classification groups. The dataset contains the following classes: world, business, sports, and science and technology. In real-world classification of such data, the number of categories is much larger than considered in this work, and categories may contain subcategories. Each method was analyzed with different parameter values to obtain the best result. Of the three methods, the neural network gave the best results for text data classification.
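
As an illustration of one of the three methods named in the abstract, the following is a minimal sketch of a multinomial naive Bayes text classifier with Laplace smoothing. It is not the authors' implementation: the class name `NaiveBayesTextClassifier`, the whitespace tokenizer, the smoothing parameter `alpha`, and the four toy training sentences are all assumptions made for this example. Only the class labels come from the abstract.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for a toy example.
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Hypothetical multinomial naive Bayes classifier with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing strength (assumed value)
        self.class_doc_counts = Counter()        # number of training texts per class
        self.word_counts = defaultdict(Counter)  # per-class token counts
        self.vocabulary = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_doc_counts[label] += 1
            for token in tokenize(text):
                self.word_counts[label][token] += 1
                self.vocabulary.add(token)
        return self

    def predict(self, text):
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_doc_counts.items():
            # Start from the log prior of the class.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocabulary:
                    continue  # ignore tokens never seen in training
                count = self.word_counts[label][token]
                # Add the smoothed log likelihood of the token given the class.
                score += math.log(
                    (count + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (invented for illustration; labels follow the abstract).
train_texts = [
    "government election summit treaty",
    "stocks market profit shares",
    "team match goal championship",
    "software chip research satellite",
]
train_labels = ["world", "business", "sports", "science and technology"]

classifier = NaiveBayesTextClassifier().fit(train_texts, train_labels)
prediction = classifier.predict("the team won the match")
```

Working in log space avoids numerical underflow when many token probabilities are multiplied, and the smoothing term keeps a token that is absent from one class from forcing that class's probability to zero.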

Modified Cosine Similarity Measure based Data Classification in Data Mining

International Journal of Engineering and Advanced Technology - Regular Issue ◽ 10.35940/ijeat.e9754.069520 ◽ 2020 ◽ Vol 9 (5) ◽ pp. 649-654

Keyword(s): Machine Learning ◽ Data Mining ◽ Similarity Measure ◽ Dominant Role ◽ Similarity Measures ◽ Data Classification ◽ Cosine Similarity ◽ Machine Learning Techniques ◽ Text Data ◽ Cosine Similarity Measure

Text data analytics has become an integral part of World Wide Web data management and of the Internet-based applications that are growing rapidly all over the world. E-commerce applications are growing exponentially in business, and e-commerce competitors are gradually increasing their use of machine learning techniques to predict business-related operations, with the aim of increasing product sales as much as possible. Similarity measures are indispensable in modern real-world applications. Cosine similarity plays a dominant role in text data mining applications such as text classification, clustering, querying, and searching. This paper proposes a modified clustering-based cosine similarity measure, called MCS, for data classification. The proposed method is verified experimentally on many UCI machine learning datasets involving categorical attributes. It produces more accurate classification results in the majority of the experiments conducted on these datasets.
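
For orientation, the sketch below shows the standard cosine similarity between term-frequency vectors, used in a simple nearest-neighbor classification. It does not implement the modified MCS measure, which the abstract does not define. The function names `term_frequencies`, `cosine_similarity`, and `nearest_label`, as well as the toy texts, are assumptions made for this example.

```python
import math
from collections import Counter


def term_frequencies(text):
    # Assumption: lowercase whitespace tokens serve as the vector dimensions.
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Standard cosine similarity between two sparse term-frequency vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # define similarity with an empty vector as zero
    return dot / (norm_a * norm_b)


def nearest_label(query, labeled_texts):
    """Return the label of the training text most similar to the query."""
    query_vector = term_frequencies(query)
    best_text, best_label = max(
        labeled_texts,
        key=lambda pair: cosine_similarity(query_vector, term_frequencies(pair[0])),
    )
    return best_label


# Toy labeled texts (invented for illustration).
labeled_texts = [
    ("cheap phone discount sale", "electronics"),
    ("fresh bread and cheese", "grocery"),
]

same = cosine_similarity(term_frequencies("a b"), term_frequencies("a b"))
disjoint = cosine_similarity(term_frequencies("a b"), term_frequencies("c d"))
label = nearest_label("phone sale today", labeled_texts)
```

Because cosine similarity depends only on the angle between vectors, a long and a short text with the same term proportions score as identical, which is one reason the measure is common in text mining.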

Using Correlation Based Subspace Clustering for Multi-label Text Data Classification

2010 22nd IEEE International Conference on Tools with Artificial Intelligence ◽ 10.1109/ictai.2010.115 ◽ 2010 ◽ Cited By ~ 3

Author(s): Mohammad Salim Ahmed ◽ Latifur Khan ◽ Mandava Rajeswari

Keyword(s): Subspace Clustering ◽ Data Classification ◽ Text Data

Hybrid Technique for Medical Data Classification using Multi-Layer Perceptron with NB Classifier

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽ 10.35940/ijitee.k2179.1081219 ◽ 2019 ◽ Vol 8 (12) ◽ pp. 2627-2632

Keyword(s): Deep Learning ◽ Data Analysis ◽ Image Data ◽ Numerical Data ◽ Data Classification ◽ Heterogeneous Data ◽ Medical Data ◽ Hybrid Technique ◽ Text Data ◽ Medical Data Classification

Medical data analysis has gained increasing interest over the last decade because of its significant advantages. Medical data is heterogeneous: it combines text data, numeric data, and image data. Traditional data analysis mechanisms are inefficient for analyzing such heterogeneous data, and deep learning is an obvious choice for handling it. Deep learning is able to handle text, numeric, and image data more efficiently than traditional data mining techniques. In this paper we propose a deep learning based multilayer perceptron to analyze medical data. The method addresses the text data, image data, and numerical data independently and combines the results to classify the medical data.

An Innovative Research Framework on Intelligent Text Data Classification System Using Genetic Algorithm

International Journal of Artificial Intelligence & Applications ◽ 10.5121/ijaia.2016.7605 ◽ 2016 ◽ Vol 7 (6) ◽ pp. 57-73

Author(s): Maheswara Rao V V R ◽ Silpa N ◽ Gadiraju Mahesh

Keyword(s): Genetic Algorithm ◽ Classification System ◽ Data Classification ◽ Text Data ◽ Research Framework ◽ Innovative Research

Union model performance improving for text data classification

10.1063/5.0071389 ◽ 2021

Author(s): A. S. Surkova ◽ S. S. Skorynin ◽ V. V. Kondratiev ◽ V. F. Zharinov

Keyword(s): Model Performance ◽ Data Classification ◽ Text Data ◽ Union Model

KAJIAN MORALITAS DALAM NOVEL TUHAN IJINKAN AKU MENJADI PELACUR KARYA MUHIDIN M DAHLAN [A Study of Morality in the Novel Tuhan Ijinkan Aku Menjadi Pelacur by Muhidin M Dahlan]

Buana Bastra ◽ 10.36456/bastra.vol5.no1.a3580 ◽ 2021 ◽ Vol 5 (1) ◽ pp. 39-48

Author(s): Nilatul Izzah ◽ Sunu Catur Budiyono

Keyword(s): Data Classification ◽ Moral Knowledge ◽ Self Control ◽ Research Purpose ◽ Moral Awareness ◽ Moral Value ◽ Text Data ◽ Self Knowledge ◽ Moral Feeling

This research aims to describe the moral values of the main character and of the community in the novel Tuhan Ijinkan Aku Menjadi Pelacur. Morality is the quality of human behavior that shows whether one's behavior is right or wrong, good or bad. The analysis uses the theory of Thomas Lickona, which emphasizes three things: knowing moral values, moral feeling, and moral behavior. The research method is hermeneutic (text interpretation). The data and data source are the moral values in the novel Tuhan Ijinkan Aku Menjadi Pelacur by Muhidin M. Dahlan. Data were collected by repeated reading, note-taking, and data classification. Data were analyzed by interpretation, explanation, description, and drawing conclusions. The data are divided into subchapters based on the research problem and purpose. The interpreted data are then described in an essay as the result. The conclusions of this research are: 1) moral knowledge comprises moral awareness, knowing moral values, perspective-taking, moral reasoning, and self-knowledge; 2) moral attitude comprises moral feeling, conscience, self-regard, empathy, love of the good, self-control, and humility; 3) moral action comprises interest, desire, and the habits of the main character and the community. All of these conclusions were found in the main character and the community of the novel “Tuhan Ijinkan Aku Menjadi Pelacur” by Muhidin M Dahlan.
