Predicting the Helpfulness of Online Restaurant Reviews Using Different Machine Learning Algorithms: A Case Study of Yelp

Yi Luo; Xiaowei Xu

doi:10.3390/su11195254

Sustainability ◽

10.3390/su11195254 ◽

2019 ◽

Vol 11 (19) ◽

pp. 5254 ◽

Cited By ~ 6

Author(s):

Yi Luo ◽

Xiaowei Xu

Keyword(s):

Latent Dirichlet Allocation ◽

Online Reviews ◽

Restaurant Industry ◽

Machine Learning Algorithms ◽

Support Vector ◽

Plain Text ◽

Sustainable Marketing ◽

Restaurant Reviews ◽

Negative Sentiment ◽

Sustainable Competitive Advantages

Helpful online reviews could be utilized to create sustainable marketing strategies in the restaurant industry, which contributes to national sustainable economic development. In this study, the main aspects (including food/taste, experience, location, and value) of 294,034 reviews on Yelp.com were extracted empirically using Latent Dirichlet Allocation (LDA), and positive and negative sentiments were assigned to each extracted aspect. Positive sentiments were associated with food/taste, while negative sentiments were associated with value. This study further shows that a robust classification algorithm based on a Support Vector Machine (SVM) with a Fuzzy Domain Ontology (FDO) algorithm outperforms other traditional classification algorithms, such as Naïve Bayes (NB) and SVM ontology, in predicting the helpfulness of online reviews. This study enriches the literature on managerial aspects of sustainability by analyzing a large amount of customer-generated plain text data. The results of this study could be used as a sustainable marketing strategy by review website developers to design sophisticated, intelligent review systems that enable customers to sort and filter helpful reviews based on their preferences. The extracted aspects and their assigned sentiments could also help restaurateurs better understand how to meet diverse customers’ needs and maintain sustainable competitive advantages.

Detection of misinformation on garlic and COVID-19 in Twitter: A machine learning-based approach (Preprint)

10.2196/preprints.33056 ◽

2021 ◽

Author(s):

Myeong Gyu Kim ◽

Jae Hyun Kim ◽

Kyungim Kim

Keyword(s):

Machine Learning ◽

Social Media ◽

Latent Dirichlet Allocation ◽

Predictive Performance ◽

Machine Learning Algorithms ◽

Training Dataset ◽

Polynomial Kernel ◽

Support Vector ◽

Accurate Information ◽

Probability Number

BACKGROUND Garlic-related misinformation is prevalent whenever a virus outbreak occurs. With the outbreak of coronavirus disease 2019 (COVID-19), garlic-related misinformation is again spreading through social media sites, including Twitter. Machine learning-based approaches can be used to detect misinformation in large volumes of tweets. OBJECTIVE This study aimed to develop machine learning algorithms for detecting misinformation on garlic and COVID-19 on Twitter. METHODS This study used 5,929 original tweets mentioning garlic and COVID-19. Tweets were manually labeled as misinformation, accurate information, or others. We tested the following algorithms: k-nearest neighbors; random forest; support vector machine (SVM) with linear, radial, and polynomial kernels; and neural network. Features for machine learning included user-based features (verified account, user type, number of followers, and follower rate) and text-based features (uniform resource locator, negation, sentiment score, Latent Dirichlet Allocation topic probability, number of retweets, and number of favorites). The model with the highest accuracy on the training dataset (70% of the overall dataset) was tested on a test dataset (30% of the overall dataset). Predictive performance was measured using overall accuracy, sensitivity, specificity, and balanced accuracy. RESULTS The SVM with the polynomial kernel showed the highest accuracy, 0.670. The model also showed a balanced accuracy of 0.757, sensitivity of 0.819, and specificity of 0.696 for misinformation. Important features in the misinformation and accurate information classes included topic 4 (common myths), topic 13 (garlic-specific myths), number of followers, topic 11 (misinformation on social media), and follower rate. Topic 3 (cooking recipes) was the most important feature in the others class. CONCLUSIONS Our SVM model showed good performance in detecting misinformation. The results of our study will help detect misinformation related to garlic and COVID-19. The approach could also be applied to prevent misinformation related to dietary supplements in the event of a future outbreak of a disease other than COVID-19.
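
The four performance measures named in the abstract above can be illustrated with a short sketch. It treats one class (here misinformation) as positive and the other two as negative, which is one common way to report sensitivity and specificity for a three-class problem; the abstract does not say exactly how the authors computed them, so this is an assumption. The function name, the label strings, and the eight toy tweets are hypothetical and are not taken from the study. As a consistency check on the reported figures, the mean of the reported sensitivity and specificity, (0.819 + 0.696) / 2 = 0.7575, agrees with the reported balanced accuracy of 0.757 up to rounding.

```python
from collections import Counter


def one_vs_rest_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity, specificity and balanced accuracy,
    treating `positive` as the class of interest and every other
    label as negative."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            counts["tp" if pred == positive else "fn"] += 1
        else:
            counts["fp" if pred == positive else "tn"] += 1
    sensitivity = counts["tp"] / (counts["tp"] + counts["fn"])
    specificity = counts["tn"] / (counts["tn"] + counts["fp"])
    # Overall accuracy is computed over all classes, not one-vs-rest.
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Made-up labels for eight tweets, for illustration only.
y_true = ["misinfo", "misinfo", "misinfo", "accurate",
          "accurate", "other", "other", "other"]
y_pred = ["misinfo", "misinfo", "accurate", "misinfo",
          "accurate", "other", "other", "accurate"]

metrics = one_vs_rest_metrics(y_true, y_pred, positive="misinfo")
print(metrics)
```

Balanced accuracy is useful when classes are imbalanced, because overall accuracy can be dominated by the largest class while balanced accuracy weights sensitivity and specificity equally.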

Seeking reward or avoiding risk from restaurant reviews: does distance matter?

International Journal of Contemporary Hospitality Management ◽

10.1108/ijchm-03-2018-0235 ◽

2019 ◽

Vol 31 (12) ◽

pp. 4482-4499 ◽

Cited By ~ 1

Author(s):

Esther L. Kim ◽

Sarah Tanford

Keyword(s):

Critical Role ◽

Online Reviews ◽

Restaurant Industry ◽

Risk Avoidance ◽

Content Type ◽

Customer Base ◽

Restaurant Reviews ◽

Gains And Losses ◽

Practical Implications

Purpose The purpose of this paper is to evaluate the extent to which consumers will exert more effort to avoid risk (negative reviews) versus seek reward (positive reviews) when making a restaurant decision. Design/methodology/approach This study investigates the influence of distance and review valence on restaurant decisions. A 2 (base restaurant review valence: negative, neutral) × 2 (target restaurant review valence: neutral, positive) × 2 (distance: 30 min, 60 min) between-subjects factorial design was used. Findings People exert more effort to seek a reward than to avoid a risk. People will drive any distance to dine at a restaurant with positive reviews. However, the tendency to avoid a restaurant with negative reviews declines as distance increases. Practical implications This study emphasizes the critical role of positive reviews in the restaurant industry. This research provides guidance to operators on managing online reviews effectively. A marketing strategy that takes review valence and distance into account allows a business to attract new customers and grow its customer base. Originality/value This research synthesizes asymmetry effects and prospect theory with the level of risk associated with the outcome. This research is theoretically noteworthy since the finding of a reverse asymmetry principle contrasts with the traditional belief in risk avoidance when comparing gains and losses.
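
The 2 × 2 × 2 between-subjects design described above yields eight experimental cells, each seen by a different group of participants. The sketch below enumerates those cells from the factor levels given in the abstract. The balanced random assignment in `assign` is a hypothetical illustration only; the abstract does not describe how participants were allocated, and the variable and function names are assumptions.

```python
import random
from itertools import product

# Factor levels as stated in the abstract.
base_valence = ["negative", "neutral"]
target_valence = ["neutral", "positive"]
distance_min = [30, 60]

# Every combination of levels is one between-subjects condition.
conditions = [
    {"base": b, "target": t, "distance_min": d}
    for b, t, d in product(base_valence, target_valence, distance_min)
]


def assign(participant_ids, seed=0):
    """Hypothetical balanced assignment: shuffle the participants,
    then deal them out to the conditions in round-robin order."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    return {pid: conditions[i % len(conditions)] for i, pid in enumerate(ids)}


print(len(conditions))  # 8 cells
for cell in conditions:
    print(cell)
```

Because the design is between-subjects, each participant contributes data to exactly one of the eight cells, so the total sample is split eight ways.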

Aspect Level Sentiment Analysis on Zoom Cloud Meetings App Review Using LDA

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v5i4.3143 ◽

2021 ◽

Vol 5 (4) ◽

pp. 631-638

Author(s):

Janu Akrama Wardhana ◽

Yuliant Sibaroni

Keyword(s):

Sentiment Analysis ◽

Latent Dirichlet Allocation ◽

Support Vector ◽

Video Conference ◽

Community Activities ◽

Performance Accuracy ◽

And Performance ◽

Almost All ◽

Negative Sentiment ◽

Google Play

During the Covid-19 pandemic, almost all community activities are conducted from home. Video conference technology is therefore needed for people to carry out their normal activities from home. One such video conference application is ZOOM Cloud Meetings. Users review applications, and these reviews serve as a reference for new users and help the companies behind the applications assess performance. However, the reviews are numerous and unstructured. A solution is therefore needed in the form of sentiment analysis, which organizes the application's reviews by classifying them as positive or negative. In this study, aspect-based sentiment analysis was conducted on ZOOM Cloud Meetings app reviews from the Google Play Store. Analysis of the review data yielded three aspects: usability, system, and appearance. Topic modeling was performed with the Latent Dirichlet Allocation (LDA) method and classification with the Support Vector Machine (SVM). With the best parameters, the model achieved an accuracy of 88.83% for the usability aspect, 91.2% for the system aspect, 94.78% for the appearance aspect, and 91.61% across all aspects.

Entity Profiling to Identify Actor Involvement in Topics of Social Media Content

IJCCS (Indonesian Journal of Computing and Cybernetics Systems) ◽

10.22146/ijccs.59869 ◽

2020 ◽

Vol 14 (4) ◽

pp. 417

Author(s):

Puji Winar Cahyo ◽

Muhammad Habibi

Keyword(s):

Social Media ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Online News ◽

Support Vector ◽

Media Content ◽

Positive Sentiment ◽

Negative Sentiment

The efficiency of social media has affected the nature and communication of modern society; people are more interested in talking through social media than in meeting in the real world. The number of conversations on social media depends on the topic being discussed: the more interesting the topic, the more data is generated on social media. The data can be analyzed to determine the influence of actors (account mentions) on the conversation. The power of an actor can be measured by how often the actor is mentioned in the conversation. This paper aims to conduct entity profiling on social media content to analyze an actor's influence on a discussion. Furthermore, sentiment analysis can determine the sentiment about an actor within a conversation topic. The Latent Dirichlet Allocation (LDA) method is used for topic modeling, while the Support Vector Machine (SVM) is used for sentiment analysis. This research shows that topics with positive sentiment are more likely to involve disaster management accounts, while topics with negative sentiment tend to involve politicians, critics, and online news.

Narrative framing of consumer sentiment in online restaurant reviews

First Monday ◽

10.5210/fm.v19i4.4944 ◽

2014 ◽

Cited By ~ 26

Author(s):

Dan Jurafsky ◽

Victor Chahuneau ◽

Bryan R. Routledge ◽

Noah A. Smith

Keyword(s):

Online Reviews ◽

Consumer Sentiment ◽

Social Psychological ◽

Psychological Variables ◽

The Past ◽

Third Person ◽

Restaurant Reviews ◽

Negative Sentiment ◽

Starchy Foods

The vast increase in online expressions of consumer sentiment offers a powerful new tool for studying consumer attitudes. To explore the narratives that consumers use to frame positive and negative sentiment online, we computationally investigate linguistic structure in 900,000 online restaurant reviews. Negative reviews, especially in expensive restaurants, were more likely to use features previously associated with narratives of trauma: negative emotional vocabulary, a focus on the past actions of third person actors such as waiters, and increased use of references to “we” and “us”, suggesting that negative reviews function as a means of coping with service–related trauma. Positive reviews also employed framings contextualized by expense: inexpensive restaurant reviews use the language of addiction to frame the reviewer as craving fatty or starchy foods. Positive reviews of expensive restaurants were long narratives using long words emphasizing the reviewer’s linguistic capital and also focusing on sensory pleasure. Our results demonstrate that portraying the self, whether as well–educated, as a victim, or even as addicted to chocolate, is a key function of reviews and suggests the important role of online reviews in exploring social psychological variables.

Exploring the generalizability of discriminant word items and latent topics in online tourist reviews

International Journal of Contemporary Hospitality Management ◽

10.1108/ijchm-10-2015-0597 ◽

2017 ◽

Vol 29 (2) ◽

pp. 803-816 ◽

Cited By ~ 15

Author(s):

Astrid Dickinger ◽

Lidija Lalicic ◽

Josef Mazanec

Keyword(s):

Latent Dirichlet Allocation ◽

Online Reviews ◽

Support Vector ◽

Content Type ◽

Tourism Management ◽

Hospitality And Tourism Management ◽

Vector Machines ◽

Latent Topics ◽

Review Reports ◽

Limited Generalizability

Purpose Online reviews have been gaining relevance in hospitality and tourism management and represent an important research avenue for academia. This study aims to illustrate the discrimination between positive and negative reviews based on single word items and the sector-specific relevance of hidden topics. Design/methodology/approach By probing two parallel approaches of entirely unrelated analytical methods (penalized support vector machines and Latent Dirichlet Allocation), the analysts explore differences in language between favorable and unfavorable reviews in three service settings (hotels, restaurants and attractions). Findings The percentage of correctly predicted positive and negative review reports by means of individual word items does not decrease if reports from the three tourism businesses are analyzed together. Originality/value However, there is limited generalizability of the discriminant words across the three businesses. Also, the latent topics relevant for generating customers’ review reports differ significantly between the three sectors of tourism businesses.

A Multi-Criteria Approach for Arabic Dialect Sentiment Analysis for Online Reviews: Exploiting Optimal Machine Learning Algorithm Selection

Sustainability ◽

10.3390/su131810018 ◽

2021 ◽

Vol 13 (18) ◽

pp. 10018 ◽

Cited By ~ 1

Author(s):

Mohamed Elhag Mohamed Abo ◽

Norisma Idris ◽

Rohana Mahmud ◽

Atika Qazi ◽

Ibrahim Abaker Targio Hashem ◽

...

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Online Reviews ◽

Machine Learning Algorithms ◽

Classification Error ◽

Support Vector ◽

Machine Learning Classifiers ◽

Learning Classifiers ◽

Arabic Sentiment Analysis ◽

F Measure

Sentiment analysis of Arabic texts is an important task in many commercial applications, such as Twitter. This study introduces a multi-criteria method to empirically assess and rank classifiers for Arabic sentiment analysis. Prominent machine learning algorithms were deployed to build classification models for Arabic sentiment analysis. Moreover, the performance measures of the top five machine learning classifiers were assessed in order to rank the classifiers. We integrated the top five ranking methods with evaluation metrics of machine learning classifiers such as accuracy, recall, precision, F-measure, CPU time, classification error, and area under the curve (AUC). The method was tested using Saudi Arabic product reviews to compare five popular classifiers. Our results suggest that deep learning and support vector machine (SVM) classifiers perform best, with accuracy 85.25%, 82.30%; precision 85.30%, 83.87%; recall 88.41%, 83.89%; F-measure 86.81%, 83.87%; classification error 14.75%, 17.70%; and AUC 0.93, 0.90, respectively. They outperform decision trees, K-nearest neighbours (K-NN), and Naïve Bayes classifiers.
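
Several of the evaluation metrics listed above follow directly from a confusion matrix. The sketch below uses the standard binary definitions; the abstract does not state how the authors averaged precision, recall, and F-measure across classes, so the reported figures need not be reproducible from these formulas alone. The function name and the example counts are hypothetical. One relation that can be checked against the abstract: the reported classification errors (14.75 and 17.70) equal 100 minus the reported accuracies (85.25 and 82.30).

```python
def binary_report(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "classification_error": 1 - accuracy,
    }


# Made-up counts for illustration; not data from the study.
report = binary_report(tp=40, fp=10, fn=5, tn=45)
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

A multi-criteria ranking such as the one the abstract describes would take a table of these values, one row per classifier, as its input.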

Machine learning in medicine: a practical introduction to natural language processing

BMC Medical Research Methodology ◽

10.1186/s12874-021-01347-1 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Conrad J. Harrison ◽

Chris J. Sidey-Gibbons

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Latent Dirichlet Allocation ◽

Mental Health Problems ◽

Characteristic Curve ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector

Abstract Background Unstructured text, including medical records, patient feedback, and social media comments, can be a rich source of data for clinical research. Natural language processing (NLP) describes a set of techniques used to convert passages of written text into interpretable datasets that can be analysed by statistical and machine learning (ML) models. The purpose of this paper is to provide a practical introduction to contemporary techniques for the analysis of text-data, using freely-available software. Methods We performed three NLP experiments using publicly-available data obtained from medicine review websites. First, we conducted lexicon-based sentiment analysis on open-text patient reviews of four drugs: Levothyroxine, Viagra, Oseltamivir and Apixaban. Next, we used unsupervised ML (latent Dirichlet allocation, LDA) to identify similar drugs in the dataset, based solely on their reviews. Finally, we developed three supervised ML algorithms to predict whether a drug review was associated with a positive or negative rating. These algorithms were: a regularised logistic regression, a support vector machine (SVM), and an artificial neural network (ANN). We compared the performance of these algorithms in terms of classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity and specificity. Results Levothyroxine and Viagra were reviewed with a higher proportion of positive sentiments than Oseltamivir and Apixaban. One of the three LDA clusters clearly represented drugs used to treat mental health problems. A common theme suggested by this cluster was drugs taking weeks or months to work. Another cluster clearly represented drugs used as contraceptives. Supervised machine learning algorithms predicted positive or negative drug ratings with classification accuracies ranging from 0.664, 95% CI [0.608, 0.716] for the regularised regression to 0.720, 95% CI [0.664,0.776] for the SVM. 
Conclusions In this paper, we present a conceptual overview of common techniques used to analyse large volumes of text, and provide reproducible code that can be readily applied to other research studies using open-source software.

Using Machine Learning Algorithms on Prediction of Stock Price

Journal of Modeling and Optimization ◽

10.32732/jmo.2020.12.2.84 ◽

2020 ◽

Vol 12 (2) ◽

pp. 84-99

Author(s):

Li-Pang Chen

Keyword(s):

Machine Learning ◽

Stock Price ◽

Short Term Memory ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Short Term ◽

Learning Techniques ◽

Historical Database ◽

Long Short Term Memory

In this paper, we investigate the analysis and prediction of time-dependent data. We focus our attention on four different stocks selected from the Yahoo Finance historical database. To build models and predict future stock prices, we consider three machine learning techniques: Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), and Support Vector Regression (SVR). By treating close price, open price, daily low, daily high, adjusted close price, and volume of trades as predictors in the machine learning methods, it can be shown that the prediction accuracy is improved.
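
Before any of the three models can be fitted, the daily price history has to be turned into predictor/target pairs. The sketch below shows one plausible way to do that with the six predictors named in the abstract. The one-day prediction horizon, the choice of close price as the target, the 80/20 chronological split, the field names, and the toy rows are all assumptions made for illustration; the abstract does not specify any of them.

```python
# The six predictors named in the abstract (field names are hypothetical).
PREDICTORS = ["close", "open", "low", "high", "adj_close", "volume"]


def make_supervised(rows, target="close", horizon=1):
    """Pair each day's predictor values with the target value
    observed `horizon` trading days later."""
    features = [[row[name] for name in PREDICTORS] for row in rows[:-horizon]]
    targets = [row[target] for row in rows[horizon:]]
    return features, targets


def chronological_split(features, targets, train_fraction=0.8):
    """Split without shuffling, so the test period follows the training period."""
    cut = int(len(features) * train_fraction)
    return (features[:cut], targets[:cut]), (features[cut:], targets[cut:])


# Made-up prices for six trading days, oldest first.
rows = [
    {"close": 10.0, "open": 9.8, "low": 9.7, "high": 10.1, "adj_close": 10.0, "volume": 1000},
    {"close": 10.2, "open": 10.0, "low": 9.9, "high": 10.3, "adj_close": 10.2, "volume": 1200},
    {"close": 10.1, "open": 10.2, "low": 10.0, "high": 10.4, "adj_close": 10.1, "volume": 900},
    {"close": 10.5, "open": 10.1, "low": 10.1, "high": 10.6, "adj_close": 10.5, "volume": 1500},
    {"close": 10.4, "open": 10.5, "low": 10.2, "high": 10.6, "adj_close": 10.4, "volume": 1100},
    {"close": 10.7, "open": 10.4, "low": 10.3, "high": 10.8, "adj_close": 10.7, "volume": 1300},
]

features, targets = make_supervised(rows)
train, test = chronological_split(features, targets)
print(len(train[0]), "training samples;", len(test[0]), "test samples")
```

Splitting chronologically rather than at random avoids evaluating a model on days that precede some of its training data.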

A Comparative Study of Different Machine Learning Algorithms for Disease Prediction

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse/v7i7/0177 ◽

2017 ◽

Vol 7 (7) ◽

pp. 172

Author(s):

Anantvir Singh Romana

Keyword(s):

Machine Learning ◽

Subsequent Treatment ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Disease Prediction ◽

Classification Problems ◽

Learning Techniques ◽

Neural Network Classifiers ◽

Diagnostic Detection

Accurate diagnostic detection of disease in a patient is critical: it may alter the subsequent treatment and increase the chances of survival. Machine learning techniques have been instrumental in disease detection and are currently being used in various classification problems due to their accurate prediction performance. Different techniques may provide different accuracies, and it is therefore imperative to use the most suitable method, the one that provides the best results. This research provides a comparative analysis of Support Vector Machine, Naïve Bayes, J48 Decision Tree, and neural network classifiers on breast cancer and diabetes datasets.
