Topic Word Embedding-Based Methods for Automatically Extracting Main Aspects from Product Reviews

2020 ◽  
Vol 10 (11) ◽  
pp. 3831 ◽  
Author(s):  
Sang-Min Park ◽  
Sung Joon Lee ◽  
Byung-Won On

Detecting the main aspects of a particular product from a collection of review documents is challenging in real applications. To address this problem, we focus on utilizing existing topic models that can briefly summarize large text documents. Unlike existing approaches, which are limited because they modify a particular topic model or rely on seed opinion words as prior knowledge, we propose a novel approach of (1) identifying starting points for learning, (2) cleaning dirty topic results through word embedding and unsupervised clustering, and (3) automatically generating the right aspects using topic and head word embedding. Experimental results show that the proposed methods create cleaner topics, improving Rouge-1 by about 25% compared to the baseline method. In addition, through the three proposed methods, the main aspects suitable for the given data are detected automatically.
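As a minimal sketch of step (2), cleaning dirty topics with word embeddings, the toy example below uses hypothetical two-dimensional vectors in place of trained embeddings and drops any topic word whose average cosine similarity to the other topic words falls below a threshold. The vectors, vocabulary, and threshold are illustrative assumptions; the paper's actual embedding model and clustering algorithm are not reproduced here.

```python
import math

# Toy 2-D word vectors standing in for trained embeddings (hypothetical values).
EMBEDDINGS = {
    "battery": [0.90, 0.10], "charge": [0.80, 0.20], "power": [0.85, 0.15],
    "screen": [0.10, 0.90], "display": [0.15, 0.85],
    "the": [0.50, 0.50],  # noise word polluting a topic
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def clean_topic(words, threshold=0.9):
    """Keep only words whose mean similarity to the rest of the topic is high."""
    kept = []
    for w in words:
        others = [x for x in words if x != w]
        mean_sim = sum(cosine(EMBEDDINGS[w], EMBEDDINGS[x]) for x in others) / len(others)
        if mean_sim >= threshold:
            kept.append(w)
    return kept
```

On the toy topic `["battery", "charge", "power", "the"]`, the function-word "the" sits far from the coherent cluster and is removed.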

Author(s):  
František Dařena ◽  
Jan Žižka

The chapter introduces clustering as a family of algorithms that can be successfully used to organize text documents into groups without prior knowledge of those groups. The chapter also demonstrates using unsupervised clustering to group a large amount of unlabeled textual data (customer reviews written informally in five natural languages) so that it can be used later for further analysis. Attention is paid to the process of selecting clustering algorithms and their parameters, to methods of data preprocessing, and to methods of evaluating the results by a human expert with computer assistance. Feasibility has been demonstrated by a number of experiments with external evaluation using known labels and computer-assisted expert validation. It has been found that the same procedures, including clustering, cluster validation, and detection of topics and significant words, can be applied to different natural languages with satisfactory results.
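The core idea of grouping unlabeled reviews without prior knowledge, plus detecting significant words per group, can be sketched as below. This is a deliberately simplified toy: Jaccard similarity over token sets and hand-picked seed documents stand in for the chapter's actual clustering algorithms, parameters, and multilingual data.

```python
from collections import Counter

REVIEWS = [
    "great food tasty meal",
    "tasty food nice meal",
    "late flight delayed departure",
    "flight delayed again late",
]

def tokens(doc):
    return set(doc.split())

def jaccard(a, b):
    """Set overlap similarity: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)

def cluster(docs, seeds):
    """Assign each document to the most similar seed document (toy clustering)."""
    groups = {s: [] for s in seeds}
    for d in docs:
        best = max(seeds, key=lambda s: jaccard(tokens(d), tokens(s)))
        groups[best].append(d)
    return groups

def significant_words(docs, n=2):
    """Most frequent tokens in a cluster, a crude stand-in for topic detection."""
    counts = Counter(w for d in docs for w in d.split())
    return [w for w, _ in counts.most_common(n)]
```

Running `cluster(REVIEWS, [REVIEWS[0], REVIEWS[2]])` separates the food reviews from the flight reviews without any labels.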


2021 ◽  
Vol 33 (4) ◽  
pp. 147-162
Author(s):  
Aleksey Yur'evich Yakushev ◽  
Yury Vital'evich Markin ◽  
Stanislav Alexandrovich Fomin ◽  
Dmitry Olegovich Obydenkov ◽  
Boris Vladimirovich Kondrat’ev

One of the most common ways documents leak is by taking a picture of a document displayed on the screen. Data leakage prevention technologies, including screen watermarking, are used to investigate such cases. The article gives a short review of the screen-shooting watermarking problem and existing research results. A novel approach for watermarking text images displayed on the screen is proposed. The watermark is embedded as slight changes in luminance in the interline spacing of the marked text. The watermark is designed to be invisible to the human eye yet still detectable by a digital camera. An algorithm for extracting the watermark from a screen photo is presented. The extraction algorithm does not need the original image of the document for successful extraction. The experimental results show that the approach is robust against screen-cam attacks, meaning the watermark persists after a photo is taken of the document displayed on the screen. A criterion for watermark message extraction accuracy without knowledge of the original message is proposed. The criterion represents the probability that the watermark was extracted correctly.
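The embedding scheme described, slight luminance shifts in interline spacing that survive blind extraction, can be illustrated with the toy sketch below. The grayscale image model, the nominal background level `BG`, and the shift `DELTA` are assumptions for illustration only; the article's actual embedding and camera-side extraction pipeline is far more involved.

```python
BG = 250     # assumed nominal page luminance (8-bit grayscale)
DELTA = 2    # per-bit luminance shift, kept small to stay invisible

def embed(image, gap_rows, bits):
    """Shift each interline gap's luminance by +/-DELTA to encode one bit."""
    for rows, bit in zip(gap_rows, bits):
        off = DELTA if bit else -DELTA
        for r in rows:
            image[r] = [max(0, min(255, p + off)) for p in image[r]]
    return image

def extract(image, gap_rows):
    """Blind extraction: only the nominal BG level is needed, not the original."""
    bits = []
    for rows in gap_rows:
        mean = sum(sum(image[r]) for r in rows) / (len(rows) * len(image[0]))
        bits.append(1 if mean > BG else 0)
    return bits
```

A gap whose mean luminance sits above the nominal background decodes as 1, below as 0, so no reference image is required.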


2017 ◽  
Vol 26 (2) ◽  
pp. 233-241
Author(s):  
Eman Ismail ◽  
Walaa Gad

Abstract: In this paper, we propose a novel approach called Classification Based on Enrichment Representation (CBER) for short text documents. The proposed approach extracts concepts occurring in short text documents and uses them to calculate the weight of the synonyms of each concept. Concepts with the same meanings increase the weights of their synonyms. However, short text documents rarely repeat concepts; therefore, we capture the semantic relationships among concepts and solve the disambiguation problem. The experimental results show that the proposed CBER is valuable in annotating short text documents with their best labels (classes). We used precision and recall measures to evaluate the proposed approach; CBER performance reached 93% precision and 94% recall.
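A minimal sketch of the enrichment idea, assuming a tiny hand-made concept lexicon in place of whatever knowledge source CBER actually uses: synonyms are mapped to a shared concept so that their occurrences accumulate into one weight, which helps when a short text never repeats a surface word.

```python
# Hypothetical concept lexicon standing in for an external ontology
# (the paper's actual knowledge source is not reproduced here).
CONCEPTS = {
    "car": "vehicle", "auto": "vehicle", "automobile": "vehicle",
    "film": "movie", "picture": "movie",
}

def enrich(short_text):
    """Map surface words to concepts so synonyms reinforce a single weight."""
    weights = {}
    for w in short_text.lower().split():
        concept = CONCEPTS.get(w, w)  # unknown words keep their surface form
        weights[concept] = weights.get(concept, 0) + 1
    return weights
```

In `"Auto and car sale"` the two synonyms both feed the concept weight of `vehicle`, even though neither surface word repeats.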


2021 ◽  
Vol 40 (1) ◽  
pp. 551-563
Author(s):  
Liqiong Lu ◽  
Dong Wu ◽  
Ziwei Tang ◽  
Yaohua Yi ◽  
Faliang Huang

This paper focuses on script identification in natural scene images. Traditional CNNs (convolutional neural networks) cannot solve this problem perfectly for two reasons: first, the arbitrary aspect ratios of scene images cause difficulty for traditional CNNs, which take a fixed-size image as input; second, some scripts with minor differences are easily confused because they share a subset of characters with the same shapes. We propose a novel approach combining a Score CNN, an Attention CNN, and patches. The Attention CNN determines whether a patch is discriminative and calculates the contribution weight of each discriminative patch to script identification of the whole image. The Score CNN takes a discriminative patch as input and predicts the score of each script type. First, patches of the same size are extracted from the scene images. Second, these patches are used as inputs to the Score CNN and Attention CNN to train two patch-level classifiers. Finally, the results of multiple discriminative patches extracted from the same image via the two classifiers are fused to obtain the script type of that image. Using same-size patches as CNN inputs avoids the problems caused by the arbitrary aspect ratios of scene images, and the trained classifiers can mine discriminative patches to accurately identify confusable scripts. Experimental results show the good performance of our approach on four public datasets.
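Two mechanical pieces of the pipeline, cutting fixed-size patches from an arbitrarily sized image and fusing per-patch script scores with attention weights, can be sketched as below. This is an illustrative skeleton with assumed score and attention values, not the trained CNNs themselves; the stride and non-overlapping grid are assumptions.

```python
def patches(width, height, size):
    """Top-left corners of non-overlapping size x size patches (stride = size),
    so any aspect ratio reduces to a list of fixed-size CNN inputs."""
    return [(x, y)
            for y in range(0, height - size + 1, size)
            for x in range(0, width - size + 1, size)]

def fuse(patch_scores, attention):
    """Attention-weighted sum of per-patch script scores; argmax is the script."""
    n_scripts = len(patch_scores[0])
    total = [0.0] * n_scripts
    for scores, weight in zip(patch_scores, attention):
        for i, s in enumerate(scores):
            total[i] += weight * s
    return max(range(n_scripts), key=lambda i: total[i])
```

A highly attended patch dominates the fused decision, which is how a few discriminative patches can disambiguate scripts that share character shapes.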


2016 ◽  
Vol 09 (03) ◽  
pp. 1650043 ◽  
Author(s):  
Haolin Wu ◽  
Jie Yang ◽  
Haibiao Chen ◽  
Feng Pan

Preferentially etching either carbon or silica from silicon oxycarbide (SiOC) created a porous network as an inverse image of the removed phase. The porous structure was analyzed by gas adsorption, and the experimental results verified the nanodomain structure of SiOC. This work demonstrated a novel approach for analyzing materials containing nanocomposite structures.


2014 ◽  
Vol 4 (1) ◽  
pp. 29-45 ◽  
Author(s):  
Rami Ayadi ◽  
Mohsen Maraoui ◽  
Mounir Zrigui

In this paper, the authors present a latent topic model to index and represent Arabic text documents with richer semantics. Text representation in a language with highly inflectional morphology such as Arabic is not a trivial task and requires special treatment. The authors describe their approach to analyzing and preprocessing Arabic text and then describe the stemming process. Finally, latent Dirichlet allocation (LDA) is adapted to extract Arabic latent topics: significant topics are extracted from all texts, each topic is described by a particular distribution of descriptors, and each text is then represented as a vector over these topics. The classification experiment is conducted on an in-house corpus; latent topics are learned with LDA for different topic numbers K (25, 50, 75, and 100), and the authors compare the results with classification in the full word space. The results show that classification performance in the reduced topic space, in terms of precision, recall, and F-measure, outperforms both classification in the full word space and classification using LSI reduction.
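Projecting a text into the reduced K-topic space can be sketched as follows, with hypothetical topic-word distributions standing in for what LDA would actually learn; the Arabic preprocessing and stemming steps are omitted, and the vocabulary is illustrative.

```python
# Hypothetical topic-word distributions (what LDA would learn), K = 2.
TOPICS = [
    {"economy": 0.5, "market": 0.3, "bank": 0.2},   # topic 0: finance
    {"match": 0.4, "team": 0.4, "goal": 0.2},       # topic 1: sport
]

def topic_vector(doc):
    """Represent a document as normalized weights over the K topics."""
    weights = [sum(t.get(w, 0.0) for w in doc.split()) for t in TOPICS]
    total = sum(weights) or 1.0  # guard against all-unknown documents
    return [w / total for w in weights]
```

A document is thus reduced from the full word space to a K-dimensional vector, the representation on which the classification experiments operate.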


2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Chun-Hui Wu ◽  
Chia-Wei Chen ◽  
Long-Sheng Kuo ◽  
Ping-Hei Chen

A novel approach was proposed to measure the hydraulic capacitance of a microfluidic membrane pump. Membrane deflection equations were modified from various studies to propose six theoretical equations to estimate the hydraulic capacitance of a microfluidic membrane pump. Thus, measuring the center deflection of the membrane allows the corresponding pressure and hydraulic capacitance of the pump to be determined. This study also investigated how membrane thickness affected the Young’s modulus of a polydimethylsiloxane (PDMS) membrane. Based on the experimental results, a linear correlation was proposed to estimate the hydraulic capacitance. The measured hydraulic capacitance data and the proposed equations in the linear and nonlinear regions qualitatively exhibited good agreement.
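For intuition, the linear-region hydraulic capacitance of a clamped circular membrane can be estimated from classical small-deflection plate theory, C = dV/dP = pi a^6 / (192 D) with flexural rigidity D = E h^3 / (12 (1 - nu^2)). This is textbook plate theory offered as a sketch, not one of the paper's six modified equations, and the PDMS parameter values used below are illustrative assumptions.

```python
import math

def flexural_rigidity(E, h, nu):
    """Plate flexural rigidity D = E h^3 / (12 (1 - nu^2))."""
    return E * h**3 / (12 * (1 - nu**2))

def hydraulic_capacitance(a, E, h, nu):
    """Linear-region capacitance C = dV/dP = pi a^6 / (192 D) for a clamped
    circular plate of radius a under uniform pressure (small deflections)."""
    return math.pi * a**6 / (192 * flexural_rigidity(E, h, nu))

# Illustrative PDMS-like values: E ~ 750 kPa, nu ~ 0.5, h = 100 um, a = 1 mm.
C = hydraulic_capacitance(1e-3, 750e3, 100e-6, 0.5)
```

Note the strong a^6 dependence: doubling the membrane radius multiplies the capacitance by 64, which is why chamber geometry dominates the pump's compliance.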


2013 ◽  
Vol 11 (06) ◽  
pp. 1343003 ◽  
Author(s):  
JING-DOO WANG

In this paper, three genomic materials, DNA sequences, protein sequences, and regions (domains), are used to compare methods of virus classification. Virus classes (categories) are divided by taxonomic level into three datasets of 6 orders, 42 families, and 33 genera. To increase the robustness and comparability of the experimental results, only classes containing at least 10 instances are selected, and each instance must contain at least one region name. Experimental results show that the approach using region names achieved the best accuracies, reaching 99.9%, 97.3%, and 99.0% for the 6 orders, 42 families, and 33 genera, respectively. This paper not only reports exhaustive experiments comparing virus classification using different genomic materials, but also proposes a novel approach to biological classification based on molecular biology instead of traditional morphology.
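The region-name approach can be caricatured as nearest-profile matching over sets of region (domain) names. The class profiles and region names below are illustrative stand-ins, not the paper's data, and the paper's actual classifier is not specified here.

```python
# Toy region-name profiles per family (illustrative names, not the paper's data).
CLASS_REGIONS = {
    "Coronaviridae": {"Spike_rec_bind", "Corona_S2", "Macro"},
    "Flaviviridae": {"Flavi_glycoprot", "Peptidase_S7"},
}

def classify(regions):
    """Predict the class whose region-name profile overlaps the instance most."""
    return max(CLASS_REGIONS, key=lambda c: len(CLASS_REGIONS[c] & regions))
```

Because region names summarize functional units rather than raw sequence, even a partial set of annotations can pin down the class, consistent with the high accuracies the paper reports for this feature type.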


2021 ◽  
pp. 1-56
Author(s):  
Brandon Prickett

Abstract Since Halle (1962), explicit algebraic variables (often called alpha notation) have been commonplace in phonological theory. However, Hayes and Wilson (2008) proposed a variable-free model of phonotactic learning, sparking a debate about whether such algebraic representations are necessary to capture human phonological acquisition. While past experimental work has found evidence that suggested a need for variables in models of phonology (Berent et al. 2012, Moreton 2012, Gallagher 2013), this paper presents a novel mechanism, Probabilistic Feature Attention (PFA), that allows a variable-free model of phonotactics to predict a number of these phenomena. Additionally, experimental results involving phonological generalization that cannot be explained by variables are captured by this novel approach. These results cast doubt on whether variables are necessary to capture human-like phonotactic learning and provide a useful alternative to such representations.

