Review Web Pages Collector Tool for Thematic Corpus Creation

10.29007/qcjn ◽  
2018 ◽  
Author(s):  
Lisa Medrouk ◽  
Anna Pappa ◽  
Jugurtha Hallou

We present a method for automatically extracting and gathering specific text data from web pages, creating a thematic corpus of reviews for opinion mining and sentiment analysis. The internet is an immense source of machine-readable texts \cite{mcenery1996} suitable for linguistic corpus studies\cite{Fletcher04}\cite{Kilgarriff2003}. However, the existing tools from web information extraction research and from NLP do not include an open-source system able to provide a thematic corpus in response to an end-user request\cite{Sharoff2006}.\\ The need to use natural texts as a data bank for opinion mining and sentiment analysis has grown with the expansion of digital interaction between users and blogs, forums and social networks.\\ The RevScrap system is designed to provide an intuitive, easy-to-use interface that extracts specific information from the relevant web pages returned by a search engine query and creates a corpus composed of comments, reviews and opinions, as expressed in users' experience and feedback. The corpus is structured in XML documents, reflecting Sinclair's design criteria\cite{sinclair01}.
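As a rough illustration of the last step described above (storing collected reviews as structured XML), the following minimal sketch uses Python's standard `xml.etree.ElementTree` module. The element and attribute names (`corpus`, `review`, `text`, `theme`, `source`) and the sample reviews are hypothetical; the abstract does not describe RevScrap's actual schema, so this should not be read as the authors' format.

```python
import xml.etree.ElementTree as ET


def build_corpus(reviews, theme):
    """Wrap a list of review dicts in a simple XML corpus tree.

    The tag and attribute names are illustrative assumptions,
    not the schema used by RevScrap.
    """
    root = ET.Element("corpus", attrib={"theme": theme})
    for i, review in enumerate(reviews, start=1):
        doc = ET.SubElement(
            root, "review", attrib={"id": str(i), "source": review["source"]}
        )
        text = ET.SubElement(doc, "text")
        # Strip surrounding whitespace left over from page extraction.
        text.text = review["text"].strip()
    return root


# Hypothetical reviews standing in for text extracted from web pages.
sample = [
    {"source": "http://example.com/a", "text": "  Great battery life. "},
    {"source": "http://example.com/b", "text": "Screen scratches easily."},
]

corpus = build_corpus(sample, theme="smartphones")
xml_string = ET.tostring(corpus, encoding="unicode")
print(xml_string)
```

Keeping one `review` element per extracted comment, with its source URL as an attribute, is one simple way to preserve provenance alongside the text.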

2012 ◽  
Vol 2 (3) ◽  
pp. 171-178 ◽  
Author(s):  
Mohammad Sadegh Hajmohammadi ◽  
Roliana Ibrahim ◽  
Zulaiha Ali Othman

In the past few years, web documents have received great attention as a new source of individual opinions and experience. This has produced increasing interest in methods for automatically extracting and analyzing individual opinions from web documents such as customer reviews, weblogs and comments on news. The increase is due to the easy accessibility of documents on the web, as well as the fact that they are already machine-readable when obtained. At the same time, Machine Learning methods in Natural Language Processing (NLP) and Information Retrieval have considerably advanced the development of practical methods for exploiting these widely available corpora. Recently, many researchers have focused on this area. They are trying to fetch opinion information and analyze it automatically with computers. This new research domain is usually called Opinion Mining and Sentiment Analysis. Until now, researchers have developed several techniques to solve the problem. This paper tries to cover some of the techniques and approaches used in this area.


2019 ◽  
Vol 8 (3) ◽  
pp. 6634-6643 ◽  

Opinion mining and sentiment analysis are valuable for extracting useful subjective information from text documents. Predicting customers' opinions on Amazon products has several benefits, such as reducing customer churn, agent monitoring, handling multiple customers, tracking overall customer satisfaction, quick escalations, and upselling opportunities. However, sentiment analysis is a challenging task for researchers, because finding users' sentiments in large datasets is complicated by their unstructured nature, slang, misspellings and abbreviations. To address this problem, a new system is proposed in this research study. The proposed system comprises four major phases: data collection, pre-processing, keyword extraction, and classification. Initially, the input data were collected from the Amazon customer review dataset. After data collection, pre-processing was carried out to enhance the quality of the collected data. The pre-processing phase comprises three steps: lemmatization, review spam detection, and removal of stop-words and URLs. Then, a topic modelling approach, Latent Dirichlet Allocation (LDA), along with modified Possibilistic Fuzzy C-Means (PFCM), was applied to extract the keywords and to help identify the topics concerned. The extracted keywords were classified into three classes (positive, negative and neutral) by applying a machine learning classifier: a Convolutional Neural Network (CNN). The experimental outcome showed that the proposed system improved the accuracy of sentiment analysis by 6-20% relative to the existing systems.
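One part of the pre-processing phase described above, the removal of stop-words and URLs, can be sketched with the Python standard library alone. The stop-word list, the URL pattern, and the function name `preprocess` are illustrative assumptions rather than the authors' implementation; lemmatization and review spam detection are deliberately left out.

```python
import re

# Simple URL pattern (an assumption; real review text may need a stricter one).
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "and", "or", "of", "to", "at", "i"}


def preprocess(review):
    """Lowercase a review, drop URLs, tokenize, and remove stop-words."""
    no_urls = URL_PATTERN.sub(" ", review.lower())
    tokens = re.findall(r"[a-z']+", no_urls)
    return [t for t in tokens if t not in STOP_WORDS]


tokens = preprocess("This phone is great, see https://example.com/review and the specs!")
print(tokens)
```

The resulting token lists are the kind of input that a topic model or classifier would typically consume in the later phases.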


Author(s):  
Mohammed N. Al-Kabi ◽  
Heider A. Wahsheh ◽  
Izzat M. Alsmadi

Sentiment Analysis/Opinion Mining is associated with social media and usually aims to automatically identify the polarities of the different points of view that social media users hold about different aspects of life. The polarity of a sentiment reflects the point of view of its author about a certain issue. This study aims to present a new method to identify the polarity of Arabic reviews and comments, whether they are written in Modern Standard Arabic (MSA) or one of the Arabic dialects, and/or include emoticons. The proposed method is called Detection of Arabic Sentiment Analysis Polarity (DASAP). A modest dataset of Arabic comments, posts, and reviews is collected from online social network websites (i.e. Facebook, Blogs, YouTube, and Twitter). This dataset is used to evaluate the effectiveness of the proposed method (DASAP). Receiver Operating Characteristic (ROC) prediction quality measurements are used to evaluate the effectiveness of DASAP based on the collected dataset.
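For readers unfamiliar with ROC-based evaluation, the following is a minimal sketch of how ROC points and the area under the curve can be computed for a binary polarity classifier. The labels and scores are toy values invented for illustration, and the function names are hypothetical; the abstract does not state which ROC measurements DASAP reports or how they were computed.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points from sweeping a threshold over the scores.

    labels: 1 for positive polarity, 0 for negative.
    scores: classifier confidence that the item is positive.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy data (not from the paper): three positive and three negative comments.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(labels, scores)
print(curve)
print(auc(curve))
```

An area close to 1 indicates that positive items are ranked above negative ones almost always, while 0.5 corresponds to random ranking.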


2016 ◽  
Vol 10 (1) ◽  
pp. 87-98 ◽  
Author(s):  
Victoria Uren ◽  
Daniel Wright ◽  
James Scott ◽  
Yulan He ◽  
Hassan Saif

Purpose – This paper aims to address the following challenge: the push to widen participation in public consultation suggests social media as an additional mechanism through which to engage the public. Bioenergy companies need to build their capacity to communicate in these new media and to monitor the attitudes of the public and opposition organizations towards energy development projects. Design/methodology/approach – This short paper outlines the planning issues bioenergy developments face and the main methods of communication used in the public consultation process in the UK. The potential role of social media in communication with stakeholders is identified. The capacity of sentiment analysis to mine opinions from social media is summarised and illustrated using a sample of tweets containing the term “bioenergy”. Findings – Social media have the potential to improve information flows between stakeholders and developers. Sentiment analysis is a viable methodology, which bioenergy companies should be using to measure public opinion in the consultation process. Preliminary analysis shows promising results. Research limitations/implications – Analysis is preliminary and based on a small dataset. It is intended only to illustrate the potential of sentiment analysis and not to draw general conclusions about the bioenergy sector. Social implications – Social media have the potential to open access to the consultation process and help bioenergy companies to make use of waste for energy developments. Originality/value – Opinion mining, though established in marketing and political analysis, is not yet systematically applied as a planning consultation tool. This is a missed opportunity.


2012 ◽  
Vol 157-158 ◽  
pp. 1079-1082
Author(s):  
Guo Shi Wu ◽  
Xiao Yin Wu ◽  
Jing Jing Wei

One of the most widely studied sub-problems of opinion mining is sentiment classification, which is studied at three levels: word, sentence and document. At the document level, most existing methods ignore comparative sentences, which have particular sentence patterns and may lower the precision of document-level analysis. This paper studies sentiment analysis of comparative sentences. The aim is to determine whether the opinions expressed in a comparative sentence are positive or negative. Experiments comparing the proposed method with document-level sentiment analysis based on simple sentences show its effectiveness.

