Topic Word Embedding-Based Methods for Automatically Extracting Main Aspects from Product Reviews

2020 ◽  
Vol 10 (11) ◽  
pp. 3831 ◽  
Author(s):  
Sang-Min Park ◽  
Sung Joon Lee ◽  
Byung-Won On

Detecting the main aspects of a particular product from a collection of review documents is challenging in real applications. To address this problem, we focus on utilizing existing topic models that can briefly summarize large text documents. Unlike existing approaches, which are limited because they modify a particular topic model or rely on seed opinion words as prior knowledge, we propose a novel approach of (1) identifying starting points for learning, (2) cleaning dirty topic results through word embedding and unsupervised clustering, and (3) automatically generating the right aspects using topic and head word embedding. Experimental results show that the proposed methods create cleaner topics, improving Rouge-1 by about 25% compared to the baseline method. In addition, through the three proposed methods, the main aspects suitable for the given data are detected automatically.
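As a minimal sketch of step (2), cleaning dirty topics with word embeddings, the toy example below uses hypothetical two-dimensional vectors in place of trained embeddings and drops any topic word whose average cosine similarity to the other topic words falls below a threshold. The vectors, vocabulary, and threshold are illustrative assumptions; the paper's actual embedding model and clustering algorithm are not reproduced here.

```python
import math

# Toy 2-D word vectors standing in for trained embeddings (hypothetical values).
EMBEDDINGS = {
    "battery": [0.90, 0.10], "charge": [0.80, 0.20], "power": [0.85, 0.15],
    "screen": [0.10, 0.90], "display": [0.15, 0.85],
    "the": [0.50, 0.50],  # noise word polluting a topic
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def clean_topic(words, threshold=0.9):
    """Keep only words whose mean similarity to the rest of the topic is high."""
    kept = []
    for w in words:
        others = [x for x in words if x != w]
        mean_sim = sum(cosine(EMBEDDINGS[w], EMBEDDINGS[x]) for x in others) / len(others)
        if mean_sim >= threshold:
            kept.append(w)
    return kept
```

On the toy topic `["battery", "charge", "power", "the"]`, the function-word "the" sits far from the coherent cluster and is removed.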

Author(s):  
František Dařena ◽  
Jan Žižka

The chapter introduces clustering as a family of algorithms that can be successfully used to organize text documents into groups without prior knowledge of those groups. The chapter also demonstrates using unsupervised clustering to group a large amount of unlabeled textual data (customer reviews written informally in five natural languages) so that it can be used later for further analysis. Attention is paid to the process of selecting clustering algorithms and their parameters, to methods of data preprocessing, and to methods of evaluating the results by a human expert with computer assistance. Feasibility has been demonstrated by a number of experiments with external evaluation using known labels and computer-assisted expert validation. It has been found that the same procedures, including clustering, cluster validation, and detection of topics and significant words, can be applied to different natural languages with satisfactory results.
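The core idea of grouping unlabeled reviews without prior knowledge, plus detecting significant words per group, can be sketched as below. This is a deliberately simplified toy: Jaccard similarity over token sets and hand-picked seed documents stand in for the chapter's actual clustering algorithms, parameters, and multilingual data.

```python
from collections import Counter

REVIEWS = [
    "great food tasty meal",
    "tasty food nice meal",
    "late flight delayed departure",
    "flight delayed again late",
]

def tokens(doc):
    return set(doc.split())

def jaccard(a, b):
    """Set overlap similarity: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)

def cluster(docs, seeds):
    """Assign each document to the most similar seed document (toy clustering)."""
    groups = {s: [] for s in seeds}
    for d in docs:
        best = max(seeds, key=lambda s: jaccard(tokens(d), tokens(s)))
        groups[best].append(d)
    return groups

def significant_words(docs, n=2):
    """Most frequent tokens in a cluster, a crude stand-in for topic detection."""
    counts = Counter(w for d in docs for w in d.split())
    return [w for w, _ in counts.most_common(n)]
```

Running `cluster(REVIEWS, [REVIEWS[0], REVIEWS[2]])` separates the food reviews from the flight reviews without any labels.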


2021 ◽  
Vol 33 (4) ◽  
pp. 147-162
Author(s):  
Aleksey Yur'evich Yakushev ◽  
Yury Vital'evich Markin ◽  
Stanislav Alexandrovich Fomin ◽  
Dmitry Olegovich Obydenkov ◽  
Boris Vladimirovich Kondrat’ev

One of the most common ways documents leak is by taking a picture of a document displayed on the screen. Data leakage prevention technologies, including screen watermarking, are used to investigate such cases. The article gives a short review of the screen-shooting watermarking problem and existing research results. A novel approach for watermarking text images displayed on the screen is proposed. The watermark is embedded as slight changes in luminance in the interline spacing of the marked text. The watermark is designed to be invisible to the human eye yet still detectable by a digital camera. An algorithm for extracting the watermark from a screen photo is presented. The extraction algorithm does not need the original image of the document for successful extraction. The experimental results show that the approach is robust against screen-cam attacks, meaning the watermark persists after a photo is taken of the document displayed on the screen. A criterion for watermark message extraction accuracy without knowledge of the original message is proposed. The criterion represents the probability that the watermark was extracted correctly.
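The embedding scheme described, slight luminance shifts in interline spacing that survive blind extraction, can be illustrated with the toy sketch below. The grayscale image model, the nominal background level `BG`, and the shift `DELTA` are assumptions for illustration only; the article's actual embedding and camera-side extraction pipeline is far more involved.

```python
BG = 250     # assumed nominal page luminance (8-bit grayscale)
DELTA = 2    # per-bit luminance shift, kept small to stay invisible

def embed(image, gap_rows, bits):
    """Shift each interline gap's luminance by +/-DELTA to encode one bit."""
    for rows, bit in zip(gap_rows, bits):
        off = DELTA if bit else -DELTA
        for r in rows:
            image[r] = [max(0, min(255, p + off)) for p in image[r]]
    return image

def extract(image, gap_rows):
    """Blind extraction: only the nominal BG level is needed, not the original."""
    bits = []
    for rows in gap_rows:
        mean = sum(sum(image[r]) for r in rows) / (len(rows) * len(image[0]))
        bits.append(1 if mean > BG else 0)
    return bits
```

A gap whose mean luminance sits above the nominal background decodes as 1, below as 0, so no reference image is required.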


2017 ◽  
Vol 26 (2) ◽  
pp. 233-241
Author(s):  
Eman Ismail ◽  
Walaa Gad

Abstract: In this paper, we propose a novel approach called Classification Based on Enrichment Representation (CBER) for short text documents. The proposed approach extracts concepts occurring in short text documents and uses them to calculate the weight of the synonyms of each concept. Concepts with the same meanings increase the weights of their synonyms. However, short text documents rarely repeat concepts; therefore, we capture the semantic relationships among concepts and solve the disambiguation problem. The experimental results show that the proposed CBER is valuable in annotating short text documents with their best labels (classes). We used precision and recall measures to evaluate the proposed approach; CBER performance reached 93% precision and 94% recall.
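A minimal sketch of the enrichment idea, assuming a tiny hand-made concept lexicon in place of whatever knowledge source CBER actually uses: synonyms are mapped to a shared concept so that their occurrences accumulate into one weight, which helps when a short text never repeats a surface word.

```python
# Hypothetical concept lexicon standing in for an external ontology
# (the paper's actual knowledge source is not reproduced here).
CONCEPTS = {
    "car": "vehicle", "auto": "vehicle", "automobile": "vehicle",
    "film": "movie", "picture": "movie",
}

def enrich(short_text):
    """Map surface words to concepts so synonyms reinforce a single weight."""
    weights = {}
    for w in short_text.lower().split():
        concept = CONCEPTS.get(w, w)  # unknown words keep their surface form
        weights[concept] = weights.get(concept, 0) + 1
    return weights
```

In `"Auto and car sale"` the two synonyms both feed the concept weight of `vehicle`, even though neither surface word repeats.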


2021 ◽  
Vol 40 (1) ◽  
pp. 551-563
Author(s):  
Liqiong Lu ◽  
Dong Wu ◽  
Ziwei Tang ◽  
Yaohua Yi ◽  
Faliang Huang

This paper focuses on script identification in natural scene images. Traditional CNNs (convolutional neural networks) cannot solve this problem perfectly for two reasons: first, the arbitrary aspect ratios of scene images cause difficulty for traditional CNNs, which take a fixed-size image as input; second, some scripts with minor differences are easily confused because they share a subset of characters with the same shapes. We propose a novel approach combining a Score CNN, an Attention CNN, and patches. The Attention CNN determines whether a patch is discriminative and calculates the contribution weight of each discriminative patch to script identification of the whole image. The Score CNN takes a discriminative patch as input and predicts the score of each script type. First, patches of the same size are extracted from the scene images. Second, these patches are used as inputs to the Score CNN and Attention CNN to train two patch-level classifiers. Finally, the results of multiple discriminative patches extracted from the same image via the two classifiers are fused to obtain the script type of that image. Using same-size patches as CNN inputs avoids the problems caused by the arbitrary aspect ratios of scene images, and the trained classifiers can mine discriminative patches to accurately identify confusable scripts. Experimental results show the good performance of our approach on four public datasets.
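Two mechanical pieces of the pipeline, cutting fixed-size patches from an arbitrarily sized image and fusing per-patch script scores with attention weights, can be sketched as below. This is an illustrative skeleton with assumed score and attention values, not the trained CNNs themselves; the stride and non-overlapping grid are assumptions.

```python
def patches(width, height, size):
    """Top-left corners of non-overlapping size x size patches (stride = size),
    so any aspect ratio reduces to a list of fixed-size CNN inputs."""
    return [(x, y)
            for y in range(0, height - size + 1, size)
            for x in range(0, width - size + 1, size)]

def fuse(patch_scores, attention):
    """Attention-weighted sum of per-patch script scores; argmax is the script."""
    n_scripts = len(patch_scores[0])
    total = [0.0] * n_scripts
    for scores, weight in zip(patch_scores, attention):
        for i, s in enumerate(scores):
            total[i] += weight * s
    return max(range(n_scripts), key=lambda i: total[i])
```

A highly attended patch dominates the fused decision, which is how a few discriminative patches can disambiguate scripts that share character shapes.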


2016 ◽  
Vol 09 (03) ◽  
pp. 1650043 ◽  
Author(s):  
Haolin Wu ◽  
Jie Yang ◽  
Haibiao Chen ◽  
Feng Pan

Preferentially etching either carbon or silica from silicon oxycarbide (SiOC) created a porous network as an inverse image of the removed phase. The porous structure was analyzed by gas adsorption, and the experimental results verified the nanodomain structure of SiOC. This work demonstrated a novel approach for analyzing materials containing nanocomposite structures.


2014 ◽  
Vol 4 (1) ◽  
pp. 29-45 ◽  
Author(s):  
Rami Ayadi ◽  
Mohsen Maraoui ◽  
Mounir Zrigui

In this paper, the authors present a latent topic model to index and represent Arabic text documents with richer semantics. Text representation in a language with highly inflectional morphology such as Arabic is not a trivial task and requires special treatment. The authors describe their approach to analyzing and preprocessing Arabic text and then describe the stemming process. Finally, latent Dirichlet allocation (LDA) is adapted to extract Arabic latent topics: significant topics are extracted from all texts, each topic is described by a particular distribution of descriptors, and each text is then represented as a vector over these topics. The classification experiment is conducted on an in-house corpus; latent topics are learned with LDA for different topic numbers K (25, 50, 75, and 100), and the authors compare the results with classification in the full word space. The results show that classification performance in the reduced topic space, in terms of precision, recall, and F-measure, outperforms both classification in the full word space and classification using LSI reduction.
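Projecting a text into the reduced K-topic space can be sketched as follows, with hypothetical topic-word distributions standing in for what LDA would actually learn; the Arabic preprocessing and stemming steps are omitted, and the vocabulary is illustrative.

```python
# Hypothetical topic-word distributions (what LDA would learn), K = 2.
TOPICS = [
    {"economy": 0.5, "market": 0.3, "bank": 0.2},   # topic 0: finance
    {"match": 0.4, "team": 0.4, "goal": 0.2},       # topic 1: sport
]

def topic_vector(doc):
    """Represent a document as normalized weights over the K topics."""
    weights = [sum(t.get(w, 0.0) for w in doc.split()) for t in TOPICS]
    total = sum(weights) or 1.0  # guard against all-unknown documents
    return [w / total for w in weights]
```

A document is thus reduced from the full word space to a K-dimensional vector, the representation on which the classification experiments operate.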


2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Chun-Hui Wu ◽  
Chia-Wei Chen ◽  
Long-Sheng Kuo ◽  
Ping-Hei Chen

A novel approach was proposed to measure the hydraulic capacitance of a microfluidic membrane pump. Membrane deflection equations were modified from various studies to propose six theoretical equations to estimate the hydraulic capacitance of a microfluidic membrane pump. Thus, measuring the center deflection of the membrane allows the corresponding pressure and hydraulic capacitance of the pump to be determined. This study also investigated how membrane thickness affected the Young’s modulus of a polydimethylsiloxane (PDMS) membrane. Based on the experimental results, a linear correlation was proposed to estimate the hydraulic capacitance. The measured hydraulic capacitance data and the proposed equations in the linear and nonlinear regions qualitatively exhibited good agreement.
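For intuition, the linear-region hydraulic capacitance of a clamped circular membrane can be estimated from classical small-deflection plate theory, C = dV/dP = pi a^6 / (192 D) with flexural rigidity D = E h^3 / (12 (1 - nu^2)). This is textbook plate theory offered as a sketch, not one of the paper's six modified equations, and the PDMS parameter values used below are illustrative assumptions.

```python
import math

def flexural_rigidity(E, h, nu):
    """Plate flexural rigidity D = E h^3 / (12 (1 - nu^2))."""
    return E * h**3 / (12 * (1 - nu**2))

def hydraulic_capacitance(a, E, h, nu):
    """Linear-region capacitance C = dV/dP = pi a^6 / (192 D) for a clamped
    circular plate of radius a under uniform pressure (small deflections)."""
    return math.pi * a**6 / (192 * flexural_rigidity(E, h, nu))

# Illustrative PDMS-like values: E ~ 750 kPa, nu ~ 0.5, h = 100 um, a = 1 mm.
C = hydraulic_capacitance(1e-3, 750e3, 100e-6, 0.5)
```

Note the strong a^6 dependence: doubling the membrane radius multiplies the capacitance by 64, which is why chamber geometry dominates the pump's compliance.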


2013 ◽  
Vol 11 (06) ◽  
pp. 1343003 ◽  
Author(s):  
JING-DOO WANG

In this paper, three genomic materials, DNA sequences, protein sequences, and regions (domains), are used to compare methods of virus classification. Virus classes (categories) are divided by taxonomic level into three datasets of 6 orders, 42 families, and 33 genera. To increase the robustness and comparability of the experimental results, only classes containing at least 10 instances are selected, and each instance must contain at least one region name. Experimental results show that the approach using region names achieved the best accuracies, reaching 99.9%, 97.3%, and 99.0% for the 6 orders, 42 families, and 33 genera, respectively. This paper not only reports exhaustive experiments comparing virus classification using different genomic materials, but also proposes a novel approach to biological classification based on molecular biology instead of traditional morphology.
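The region-name approach can be caricatured as nearest-profile matching over sets of region (domain) names. The class profiles and region names below are illustrative stand-ins, not the paper's data, and the paper's actual classifier is not specified here.

```python
# Toy region-name profiles per family (illustrative names, not the paper's data).
CLASS_REGIONS = {
    "Coronaviridae": {"Spike_rec_bind", "Corona_S2", "Macro"},
    "Flaviviridae": {"Flavi_glycoprot", "Peptidase_S7"},
}

def classify(regions):
    """Predict the class whose region-name profile overlaps the instance most."""
    return max(CLASS_REGIONS, key=lambda c: len(CLASS_REGIONS[c] & regions))
```

Because region names summarize functional units rather than raw sequence, even a partial set of annotations can pin down the class, consistent with the high accuracies the paper reports for this feature type.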


2021 ◽  
pp. 1-56
Author(s):  
Brandon Prickett

Abstract Since Halle (1962), explicit algebraic variables (often called alpha notation) have been commonplace in phonological theory. However, Hayes and Wilson (2008) proposed a variable-free model of phonotactic learning, sparking a debate about whether such algebraic representations are necessary to capture human phonological acquisition. While past experimental work has found evidence that suggested a need for variables in models of phonology (Berent et al. 2012, Moreton 2012, Gallagher 2013), this paper presents a novel mechanism, Probabilistic Feature Attention (PFA), that allows a variable-free model of phonotactics to predict a number of these phenomena. Additionally, experimental results involving phonological generalization that cannot be explained by variables are captured by this novel approach. These results cast doubt on whether variables are necessary to capture human-like phonotactic learning and provide a useful alternative to such representations.

