Identifying Key Target Audiences for Public Health Campaigns: Leveraging Machine Learning in the Case of Hookah Tobacco Smoking (Preprint)

2018 ◽  
Author(s):  
Kar-Hai Chu ◽  
Jason Colditz ◽  
Momin Malik ◽  
Tabitha Yates ◽  
Brian Primack

BACKGROUND Hookah tobacco smoking (HTS) is a particularly important issue for public health professionals to address owing to its prevalence and deleterious health effects. Social media sites can be a valuable tool for public health officials to conduct informational health campaigns. Current social media platforms provide researchers with opportunities to better identify and target specific audiences and even individuals. However, we are not aware of systematic research attempting to identify audiences with mixed or ambivalent views toward HTS. OBJECTIVE The objective of this study was to (1) confirm previous research showing positively skewed HTS sentiment on Twitter using a larger dataset by leveraging machine learning techniques and (2) systematically identify individuals who exhibit mixed opinions about HTS via the Twitter platform and therefore represent key audiences for intervention. METHODS We prospectively collected tweets related to HTS from January to June 2016. We double-coded a subset of approximately 5000 randomly sampled tweets for sentiment toward HTS and used these data to train a machine learning classifier to assess the remaining approximately 556,000 HTS-related Twitter posts. Natural language processing software was used to extract linguistic features (ie, language-based covariates). The data were processed by machine learning tools and algorithms using R. Finally, we used the results to identify individuals who, because they had consistently posted both positive and negative content, might be ambivalent toward HTS and represent an ideal audience for intervention. RESULTS There were 561,960 HTS-related tweets: 373,911 were classified as positive and 183,139 were classified as negative. A set of 12,861 users met a priori criteria indicating that they posted both positive and negative tweets about HTS. 
CONCLUSIONS Sentiment analysis can allow researchers to identify audience segments on social media that demonstrate ambiguity toward key public health issues, such as HTS, and therefore represent ideal populations for intervention. Using large social media datasets can help public health officials to preemptively identify specific audience segments that would be most receptive to targeted campaigns.
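
The final step described above, finding users who posted both positive and negative content, can be sketched as follows. This is only an illustration in Python (the study itself used R), and the abstract does not state the a priori criteria, so the `min_each` threshold, the function name `find_mixed_users`, and the `(user, label)` input format are all assumptions made for this example.

```python
from collections import Counter, defaultdict

def find_mixed_users(classified_tweets, min_each=2):
    """Return users with at least `min_each` positive AND negative tweets.

    `classified_tweets` is an iterable of (user_id, label) pairs, where
    label is "positive" or "negative". The threshold is a hypothetical
    stand-in for the study's unstated a priori criteria.
    """
    counts = defaultdict(Counter)
    for user, label in classified_tweets:
        counts[user][label] += 1
    return sorted(
        user
        for user, c in counts.items()
        if c["positive"] >= min_each and c["negative"] >= min_each
    )

# Toy input, invented for illustration only.
tweets = [
    ("a", "positive"), ("a", "negative"), ("a", "positive"), ("a", "negative"),
    ("b", "positive"), ("b", "positive"),
    ("c", "negative"),
]
mixed = find_mixed_users(tweets)
```

Raising `min_each` narrows the audience to users whose mixed sentiment is more consistent, at the cost of a smaller target group.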

2021 ◽  
Vol 11 (19) ◽  
pp. 9292
Author(s):  
Noman Islam ◽  
Asadullah Shaikh ◽  
Asma Qaiser ◽  
Yousef Asiri ◽  
Sultan Almakdi ◽  
...  

In recent years, the consumption of social media content to keep up with global news and to verify its authenticity has become a considerable challenge. Social media enables us to easily access news anywhere, anytime, but it also gives rise to the spread of fake news, thereby delivering false information. This also has a negative impact on society. Therefore, it is necessary to determine whether or not news spreading over social media is real. This will allow for confusion among social media users to be avoided, and it is important in ensuring positive social development. This paper proposes a novel solution by detecting the authenticity of news through natural language processing techniques. Specifically, this paper proposes a novel scheme comprising three steps, namely, stance detection, author credibility verification, and machine learning-based classification, to verify the authenticity of news. In the last stage of the proposed pipeline, several machine learning techniques are applied, such as decision trees, random forest, logistic regression, and support vector machine (SVM) algorithms. For this study, the fake news dataset was taken from Kaggle. The experimental results show an accuracy of 93.15%, precision of 92.65%, recall of 95.71%, and F1-score of 94.15% for the support vector machine algorithm. The SVM is better than the second best classifier, i.e., logistic regression, by 6.82%.
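
As a quick consistency check on the reported SVM metrics, the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded precision and recall quoted above; the function name `f1_score` is chosen for this example, and small differences from the reported 94.15% are expected because the inputs are already rounded.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    return 2 * precision * recall / (precision + recall)

# Precision 92.65% and recall 95.71%, as reported for the SVM.
f1 = f1_score(0.9265, 0.9571)
print(f"F1 = {f1:.2%}")
```

The recomputed value agrees with the reported F1-score to within rounding.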


2020 ◽  
Author(s):  
Hanyin Wang ◽  
Yikuan Li ◽  
Meghan Hutch ◽  
Andrew Naidech ◽  
Yuan Luo

BACKGROUND The emergence of SARS-CoV-2 (ie, COVID-19) has given rise to a global pandemic affecting 215 countries and over 40 million people as of October 2020. Meanwhile, we are also experiencing an infodemic induced by the overabundance of information, some accurate and some inaccurate, spreading rapidly across social media platforms. Social media has arguably shifted the information acquisition and dissemination of a considerably large population of internet users toward higher interactivities. OBJECTIVE This study aimed to investigate COVID-19-related health beliefs on one of the mainstream social media platforms, Twitter, as well as potential impacting factors associated with fluctuations in health beliefs on social media. METHODS We used COVID-19-related posts from the mainstream social media platform Twitter to monitor health beliefs. A total of 92,687,660 tweets corresponding to 8,967,986 unique users from January 6 to June 21, 2020, were retrieved. To quantify health beliefs, we employed the health belief model (HBM) with four core constructs: perceived susceptibility, perceived severity, perceived benefits, and perceived barriers. We utilized natural language processing and machine learning techniques to automate the process of judging the conformity of each tweet with each of the four HBM constructs. A total of 5000 tweets were manually annotated for training the machine learning architectures. RESULTS The machine learning classifiers yielded areas under the receiver operating characteristic curves over 0.86 for the classification of all four HBM constructs. Our analyses revealed a basic reproduction number R0 of 7.62 for trends in the number of Twitter users posting health belief–related content over the study period. The fluctuations in the number of health belief–related tweets could reflect dynamics in case and death statistics, systematic interventions, and public events. 
Specifically, we observed that scientific events, such as scientific publications, and nonscientific events, such as politicians’ speeches, were comparable in their ability to influence health belief trends on social media through a Kruskal-Wallis test (P=.78 and P=.92 for perceived benefits and perceived barriers, respectively). CONCLUSIONS As an analogy of the classic epidemiology model where an infection is considered to be spreading in a population with an R0 greater than 1, we found that the number of users tweeting about COVID-19 health beliefs was amplifying in an epidemic manner and could partially intensify the infodemic. It is “unhealthy” that both scientific and nonscientific events constitute no disparity in impacting the health belief trends on Twitter, since nonscientific events, such as politicians’ speeches, might not be endorsed by substantial evidence and could sometimes be misleading.


Author(s):  
Erick Omuya ◽  
George Okeyo ◽  
Michael Kimwele

Social media has been embraced by different people as a convenient and official medium of communication. People write messages and attach images and videos on Twitter, Facebook, and other social media platforms, which they then share. Social media therefore generates a large volume of sentiment-rich data from these updates. Sentiment analysis has been used to determine the opinions of clients, for instance, about a particular product or company. Knowledge-based and machine learning approaches are among the strategies that have been used to analyze these sentiments. The performance of sentiment analysis is, however, degraded by noise, the curse of dimensionality, the data domain, and the size of the data used for training and testing. This research aims to develop a model for sentiment analysis in which dimensionality reduction and the use of different parts of speech improve sentiment analysis performance. It uses natural language processing for filtering, storing, and performing sentiment analysis on data from social media. The model is tested using Naïve Bayes, support vector machine, and k-nearest neighbor machine learning algorithms, and its performance is compared with that of two other sentiment analysis models. Experimental results show that the model improves sentiment analysis performance using machine learning techniques.


10.2196/12443 ◽  
2019 ◽  
Vol 21 (7) ◽  
pp. e12443 ◽  
Author(s):  
Kar-Hai Chu ◽  
Jason Colditz ◽  
Momin Malik ◽  
Tabitha Yates ◽  
Brian Primack

10.2196/26302 ◽  
2021 ◽  
Vol 23 (2) ◽  
pp. e26302 ◽  
Author(s):  
Hanyin Wang ◽  
Yikuan Li ◽  
Meghan Hutch ◽  
Andrew Naidech ◽  
Yuan Luo

Background The emergence of SARS-CoV-2 (ie, COVID-19) has given rise to a global pandemic affecting 215 countries and over 40 million people as of October 2020. Meanwhile, we are also experiencing an infodemic induced by the overabundance of information, some accurate and some inaccurate, spreading rapidly across social media platforms. Social media has arguably shifted the information acquisition and dissemination of a considerably large population of internet users toward higher interactivities. Objective This study aimed to investigate COVID-19-related health beliefs on one of the mainstream social media platforms, Twitter, as well as potential impacting factors associated with fluctuations in health beliefs on social media. Methods We used COVID-19-related posts from the mainstream social media platform Twitter to monitor health beliefs. A total of 92,687,660 tweets corresponding to 8,967,986 unique users from January 6 to June 21, 2020, were retrieved. To quantify health beliefs, we employed the health belief model (HBM) with four core constructs: perceived susceptibility, perceived severity, perceived benefits, and perceived barriers. We utilized natural language processing and machine learning techniques to automate the process of judging the conformity of each tweet with each of the four HBM constructs. A total of 5000 tweets were manually annotated for training the machine learning architectures. Results The machine learning classifiers yielded areas under the receiver operating characteristic curves over 0.86 for the classification of all four HBM constructs. Our analyses revealed a basic reproduction number R0 of 7.62 for trends in the number of Twitter users posting health belief–related content over the study period. The fluctuations in the number of health belief–related tweets could reflect dynamics in case and death statistics, systematic interventions, and public events. 
Specifically, we observed that scientific events, such as scientific publications, and nonscientific events, such as politicians’ speeches, were comparable in their ability to influence health belief trends on social media through a Kruskal-Wallis test (P=.78 and P=.92 for perceived benefits and perceived barriers, respectively). Conclusions As an analogy of the classic epidemiology model where an infection is considered to be spreading in a population with an R0 greater than 1, we found that the number of users tweeting about COVID-19 health beliefs was amplifying in an epidemic manner and could partially intensify the infodemic. It is “unhealthy” that both scientific and nonscientific events constitute no disparity in impacting the health belief trends on Twitter, since nonscientific events, such as politicians’ speeches, might not be endorsed by substantial evidence and could sometimes be misleading.
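
The Kruskal-Wallis comparison mentioned above can be illustrated with a small pure-Python sketch. It assumes exactly two groups (for example, effect sizes following scientific versus nonscientific events), omits the tie correction, and uses the fact that with one degree of freedom the chi-square tail probability equals `erfc(sqrt(H / 2))`. The function names and the toy data are invented for this example and are not taken from the study. A large P value, such as the .78 and .92 reported above, means the test found no detectable difference between the groups.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_two_groups(group_a, group_b):
    """H statistic (no tie correction) and P value for two groups (df = 1)."""
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)
    n = len(pooled)
    rank_sum_a = sum(ranks[:len(group_a)])
    rank_sum_b = sum(ranks[len(group_a):])
    h = 12 / (n * (n + 1)) * (
        rank_sum_a ** 2 / len(group_a) + rank_sum_b ** 2 / len(group_b)
    ) - 3 * (n + 1)
    # Chi-square survival function with 1 degree of freedom.
    p = math.erfc(math.sqrt(max(h, 0.0) / 2))
    return h, p

# Toy values, invented for illustration only.
h, p = kruskal_wallis_two_groups([1, 2, 3], [4, 5, 6])
```

For more than two groups or heavily tied data, a library routine with tie correction would be the safer choice.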


2020 ◽  
Vol 17 (12) ◽  
pp. 5477-5482
Author(s):  
Shaik Rahamat Basha ◽  
M. Surya Bhupal Rao ◽  
P. Kiran Kumar Reddy ◽  
G. Ravi Kumar

Online social media are a major channel of everyday communication, since most people today use these services to stay in touch with one another. Current research addresses emotion recognition from text messages, and most of it relies on machine learning. Natural language processing (NLP) techniques are used to extract information from text written by people. People may express emotion when reading or writing a message, and because human life is filled with a variety of emotions, people experience many different ones. This analysis helps clarify how social media researchers use text processing and text mining methods to classify key themes in data. Our experiments applied text mining to two of the world’s main social networks, Facebook and Twitter. In this study, we categorized human feelings such as joy, fear, love, anger, surprise, sadness, and thankfulness, and compared results across various machine learning methods.


2019 ◽  
Vol 28 (01) ◽  
pp. 208-217 ◽  
Author(s):  
Mike Conway ◽  
Mengke Hu ◽  
Wendy W. Chapman

Objective: We present a narrative review of recent work on the utilisation of Natural Language Processing (NLP) for the analysis of social media (including online health communities) specifically for public health applications. Methods: We conducted a literature review of NLP research that utilised social media or online consumer-generated text for public health applications, focussing on the years 2016 to 2018. Papers were identified in several ways, including PubMed searches and the inspection of recent conference proceedings from the Association for Computational Linguistics (ACL), the Conference on Human Factors in Computing Systems (CHI), and the International AAAI (Association for the Advancement of Artificial Intelligence) Conference on Web and Social Media (ICWSM). Popular data sources included Twitter, Reddit, various online health communities, and Facebook. Results: In the recent past, communicable diseases (e.g., influenza, dengue) have been the focus of much social media-based NLP health research. However, mental health and substance use and abuse (including the use of tobacco, alcohol, marijuana, and opioids) have been the subject of an increasing volume of research in the 2016-2018 period. Associated with this trend, the use of lexicon-based methods remains popular given the availability of psychologically validated lexical resources suitable for mental health and substance abuse research. Finally, we found that in the period under review “modern” machine learning methods (i.e., deep neural-network-based methods), while increasing in popularity, remain less widely used than “classical” machine learning methods.
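
For readers unfamiliar with the lexicon-based methods mentioned in this review, the idea can be sketched in a few lines: each token is looked up in a word list with attached scores, and the scores are summed. The `TOY_LEXICON` below is a made-up placeholder, not one of the psychologically validated resources the review refers to, and `lexicon_score` is a name chosen for this example.

```python
import re

# Hypothetical toy lexicon; real studies use validated lexical resources.
TOY_LEXICON = {
    "love": 1, "great": 1, "relaxing": 1,
    "hate": -1, "harmful": -1, "sick": -1,
}

def lexicon_score(text, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of all lowercase word tokens in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)

positive_example = lexicon_score("I love hookah, so relaxing")
negative_example = lexicon_score("Smoking is harmful and makes me sick")
```

A positive total suggests positive sentiment and a negative total the opposite; this simple sum ignores negation and context, which is one reason learned classifiers are often used alongside lexicons.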


Social media sites are used today by many different kinds of customers, who often share opinions, ideas, and feelings there in symbolic or textual form. This behavior has drawn attention to research that analyzes sentiment in online data about customer interests. Sentiment analysis was proposed for this purpose. It is among the most common applications of Natural Language Processing (NLP) and Machine Learning Analysis (MLA). The main task of sentiment analysis is to classify sentiments automatically into three categories: positive, negative, and neutral. Many classification studies have been conducted over the years to determine people’s feelings and circumstances accurately. Classification, fuzzy, and clustering methods are used. Fuzzy-based classification is found to be more accurate, and a classical text classification model is used for the comparative study. In the performance comparison, this study shows that the proposed method can provide more accurate results than conventional classifiers. This article discusses the work of different researchers on sentiment analysis and classification methods. It also shows the importance of extracting comments and analyzing sentiments.

