The construction of syntax trees using external data for partially formalized text documents

Author(s):  
Kirill Chuvilin
Author(s):  
E. D. Avedyan ◽  
I. V. Voronkov

Summary: The article proposes a new software platform for automating the preprocessing and labeling of datasets, with the aim of subsequently solving analytical problems such as image classification and the processing of textual and parametric information using neural network technologies. The platform is modular: it combines a large number of methods built on modern technologies and can be extended as analytical data processing tasks become more complex. The need for such a platform arises primarily because, at the current rate of data volume growth, a real transition to deep data analytics remains unattainable without it, given the requirements of confidentiality, access to information, and the use of external data processing resources.


2019 ◽  
Vol 8 (3) ◽  
pp. 6634-6643 ◽  

Opinion mining and sentiment analysis are valuable for extracting useful subjective information from text documents. Predicting customers' opinions on Amazon products has several benefits, such as reducing customer churn, agent monitoring, handling multiple customers, tracking overall customer satisfaction, quick escalations, and upselling opportunities. However, finding users' sentiments in large datasets is challenging because the data are unstructured and contain slang, misspellings, and abbreviations. To address this problem, this study develops a new system comprising four major phases: data collection, pre-processing, keyword extraction, and classification. First, the input data were collected from the Amazon customer review dataset. The collected data were then pre-processed to improve their quality. The pre-processing phase comprises three steps: lemmatization, review spam detection, and removal of stop-words and URLs. Next, a topic modelling approach, Latent Dirichlet Allocation (LDA), was applied together with modified Possibilistic Fuzzy C-Means (PFCM) to extract keywords and help identify the relevant topics. The extracted keywords were classified into three classes (positive, negative, and neutral) using a Convolutional Neural Network (CNN) classifier. The experimental results showed that the proposed system improved sentiment analysis accuracy by 6-20% compared with existing systems.
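
The stop-word and URL removal step of the pre-processing phase described above could be sketched roughly as follows. This is only an illustration, not the authors' implementation: the stop-word list, the regular expressions, and the function name `preprocess` are assumptions, and lemmatization and review spam detection are left out.

```python
import re

# Illustrative stop-word list (an assumption; a real system would use a fuller list).
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "and", "or", "to", "of", "at", "i"}

# Simple URL matcher: anything starting with http(s):// or www. up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")

# Tokens are runs of lowercase letters and apostrophes.
TOKEN_PATTERN = re.compile(r"[a-z']+")


def preprocess(review):
    """Lowercase a review, strip URLs, tokenize, and drop stop-words."""
    text = URL_PATTERN.sub(" ", review.lower())
    tokens = TOKEN_PATTERN.findall(text)
    return [t for t in tokens if t not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("This is the BEST phone, see https://example.com/item"))
```

The resulting token lists would then be the input to the keyword extraction phase.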


2007 ◽  
Vol 2 (3) ◽  
pp. 3-27 ◽  
Author(s):  
Dominik Lambrigger ◽  
Pavel Shevchenko ◽  
Mario Wüthrich

2014 ◽  
Vol 9 (4) ◽  
pp. 83-103 ◽  
Author(s):  
Giuseppe Galloppo ◽  
Daniele Previati

Author(s):  
Laith Mohammad Abualigah ◽  
Essam Said Hanandeh ◽  
Ahamad Tajudin Khader ◽  
Mohammed Abdallh Otair ◽  
Shishir Kumar Shandilya

Background: Given the increasing volume of text documents on Internet pages, handling such a large amount of information becomes highly complex. Text clustering is a common optimization problem in which a large amount of text information is organized into a set of comparable and coherent clusters. Aims: This paper presents a novel local clustering technique, namely β-hill climbing, which solves the text document clustering problem by partitioning similar documents into the same cluster. Methods: The β parameter is the primary innovation of the β-hill climbing technique. It was introduced to balance local and global search. Local search methods, such as the k-medoid and k-means techniques, have been successfully applied to the text document clustering problem. Results: Experiments were conducted on eight standard benchmark text datasets with different characteristics taken from the Laboratory of Computational Intelligence (LABIC). The results showed that the proposed β-hill climbing achieved better results than the original hill climbing technique in solving the text clustering problem. Conclusion: Adding the β operator to hill climbing improves text clustering performance.
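
The idea of balancing local and global search with a β parameter can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's implementation: documents are represented as small numeric vectors, the objective (negative sum of squared distances to cluster centroids) is a stand-in, and the names `cohesion` and `beta_hill_climbing` and the default values of `beta` and `iterations` are hypothetical. Each iteration makes one local move, then resets each assignment at random with probability β, and keeps the candidate only if it is not worse.

```python
import random


def cohesion(assign, docs, k):
    """Negative sum of squared distances to cluster centroids (higher is better)."""
    dim = len(docs[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for a, d in zip(assign, docs):
        counts[a] += 1
        for j, x in enumerate(d):
            sums[a][j] += x
    total = 0.0
    for a, d in zip(assign, docs):
        centroid = [s / counts[a] for s in sums[a]]
        total += sum((x - c) ** 2 for x, c in zip(d, centroid))
    return -total


def beta_hill_climbing(docs, k, beta=0.05, iterations=2000, seed=0):
    """Hill climbing over cluster assignments with a beta random-reset step."""
    rng = random.Random(seed)
    n = len(docs)
    current = [rng.randrange(k) for _ in range(n)]
    current_score = cohesion(current, docs, k)
    for _ in range(iterations):
        candidate = current[:]
        # Local move: reassign one randomly chosen document.
        candidate[rng.randrange(n)] = rng.randrange(k)
        # Beta step: each assignment is reset at random with probability beta.
        for i in range(n):
            if rng.random() < beta:
                candidate[i] = rng.randrange(k)
        score = cohesion(candidate, docs, k)
        # Selection: keep the candidate only if it is at least as good.
        if score >= current_score:
            current, current_score = candidate, score
    return current, current_score


if __name__ == "__main__":
    docs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, score = beta_hill_climbing(docs, k=2)
    print(labels, score)
```

Setting `beta=0` reduces the sketch to plain hill climbing, which makes the two variants easy to compare on the same data.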


2020 ◽  
Vol 87 ◽  
pp. 106002 ◽  
Author(s):  
Ammar Kamal Abasi ◽  
Ahamad Tajudin Khader ◽  
Mohammed Azmi Al-Betar ◽  
Syibrah Naim ◽  
Sharif Naser Makhadmeh ◽  
...  
