Mining and Tracking Massive Text Data: Classification, Construction of Tracking Statistics, and Inference Under Misclassification

Technometrics ◽  
2007 ◽  
Vol 49 (2) ◽  
pp. 116-128 ◽  
Author(s):  
Daniel R Jeske ◽  
Regina Y Liu
2021 ◽  
Vol 5 (2(61)) ◽  
pp. 6-8
Author(s):  
Olena Hryshchenko ◽  
Vadym Yaremenko

The object of research is methods of fast classification for text data. The need for this study stems from the rapid growth of textual data, in both digital and printed form. Such data must be processed by software, since human resources cannot process this volume of data in full. A large number of data classification approaches have been developed. The research applies three text classification methods, a Bloom filter, a naive Bayes classifier, and neural networks, to a set of text data in order to classify the texts into categories. Each method has both advantages and disadvantages. This paper illustrates the strengths and weaknesses of each method on a specific example. The algorithms were compared with one another in terms of speed and efficiency, that is, the accuracy with which a text is assigned to a class. Each method was evaluated on the same data sets while varying the amount of training and test data and the number of classification groups. The dataset used contains the following classes: world, business, sports, and science and technology. In real-world classification of such data, the number of categories is much larger than that considered in this work, and categories may have subcategories. In the course of the study, each method was analyzed with different parameter values to obtain the best result. Of the three methods, the neural network produced the best text classification results.
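As a rough illustration of one of the three methods named in the abstract, the following is a minimal sketch of a multinomial naive Bayes text classifier with add-one smoothing. It is not the authors' implementation: the class name `NaiveBayesText`, the whitespace tokenizer, and the four toy training sentences are assumptions made up for this example; only the four category names come from the abstract.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial naive Bayes with add-one (Laplace) smoothing (hypothetical helper)."""

    def __init__(self):
        self.class_docs = Counter()              # number of training documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()                       # all tokens seen during training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = text.lower().split()        # assumption: naive whitespace tokenizer
            self.class_docs[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label in self.class_docs:
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(self.class_docs[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for illustration only
train_texts = [
    "election talks between governments",
    "stocks and markets rally on earnings",
    "team wins the championship match",
    "new chip speeds up software research",
]
train_labels = ["world", "business", "sports", "science and technology"]

model = NaiveBayesText().fit(train_texts, train_labels)
print(model.predict("markets react to earnings"))
```

Working in log space avoids numeric underflow on longer texts, and the add-one smoothing keeps a single unseen word from zeroing out a class.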


Text data analytics has become an integral part of World Wide Web data management and of Internet-based applications, which are growing rapidly all over the world. E-commerce applications are growing exponentially in the business field, and competitors in e-commerce are increasingly adopting machine learning techniques to predict business-related operations, with the aim of increasing product sales as much as possible. Similarity measures are indispensable in modern day-to-day applications. Cosine similarity plays a dominant role in text data mining applications such as text classification, clustering, querying, and searching. This paper proposes a modified clustering-based cosine similarity measure, called MCS, for data classification. The proposed method is verified experimentally on many UCI machine learning datasets involving categorical attributes. It produces more accurate classification results in the majority of the experiments conducted on these datasets.
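For reference, the standard cosine similarity that the abstract builds on can be sketched as follows. This shows only the plain measure over term-frequency vectors, not the proposed MCS variant, whose details the abstract does not give; the function name `cosine_similarity` and the whitespace tokenization are assumptions for this example.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-frequency vectors of two texts."""
    va = Counter(text_a.lower().split())  # assumption: whitespace tokenizer
    vb = Counter(text_b.lower().split())
    # Dot product over the terms the two texts share
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0                        # an empty text is similar to nothing
    return dot / (norm_a * norm_b)


print(round(cosine_similarity("text data mining", "text data clustering"), 3))
print(round(cosine_similarity("text data mining", "sports results"), 3))
```

The result is 1 for texts with identical term distributions and 0 for texts that share no terms, which is what makes the measure convenient for nearest-neighbour style classification and clustering.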


Medical data analysis has gained increasing interest over the last decade because of its significant advantages. Medical data is heterogeneous: it combines text data, numeric data, and image data. Traditional data analysis mechanisms are inefficient for analyzing such heterogeneous data. Deep learning is an obvious choice for handling it, since it can handle text, numeric, and image data more efficiently than traditional data mining techniques. In this paper we propose a deep learning based multilayer perceptron to analyze medical data. The method addresses the text data, image data, and numerical data independently and combines them to perform medical data classification.


2021 ◽  
Author(s):  
A. S. Surkova ◽  
S. S. Skorynin ◽  
V. V. Kondratiev ◽  
V. F. Zharinov

Buana Bastra ◽  
2021 ◽  
Vol 5 (1) ◽  
pp. 39-48
Author(s):  
Nilatul Izzah ◽  
Sunu Catur Budiyono

This research aims to describe the moral values of the main character and of the community in the novel Tuhan Ijinkan Aku Menjadi Pelacur. Morality is the quality of human behavior that shows whether one's behavior is right or wrong, good or bad. The analysis uses the theory of Thomas Lickona, which emphasizes three things: knowing moral values, moral feeling, and moral behavior. The research method is the hermeneutic method (interpreting text). The data and data source are the moral values in the novel Tuhan Ijinkan Aku Menjadi Pelacur by Muhidin M. Dahlan. Data were collected through repeated reading, note-taking, and data classification. Data were analyzed through interpretation, explanation, description, and drawing conclusions. The data are divided into subchapters based on the research problem and purpose. The interpreted data are then described in an essay as the result. The conclusions of this research are: 1) moral knowledge comprises moral awareness, knowing moral values, perspective-taking, moral reasoning, and self-knowledge; 2) moral attitude comprises moral feeling, conscience, self-regard, empathy, love of the good, self-control, and humility; 3) moral action comprises interest, desire, and the habits of the character and the community. All of these were found in the main character and the community of the novel “Tuhan Ijinkan Aku Menjadi Pelacur” by Muhidin M. Dahlan.


2021 ◽  
Vol 8 (2) ◽  
pp. 33-45
Author(s):  
A.V. Pchelin ◽  
N.A. Kononov ◽  
V.S. Serova ◽  
E.V. Bunova ◽  
...  
