Social Media Rumor Refuter Feature Analysis and Crowd Identification Based on XGBoost and NLP

2020 ◽  
Vol 10 (14) ◽  
pp. 4711 ◽  
Author(s):  
Zongmin Li ◽  
Qi Zhang ◽  
Yuhong Wang ◽  
Shihang Wang

One prominent dark side of online information behavior is the spreading of rumors. The feature analysis and crowd identification of social media rumor refuters based on machine learning methods can shed light on the rumor refutation process. This paper analyzed the association between user features and rumor refuting behavior in five main rumor categories: economics, society, disaster, politics, and military. Natural language processing (NLP) techniques were applied to quantify each user’s sentiment tendency and recent interests. These results were then combined with other personalized features to train an XGBoost classification model that identifies potential refuters. Information from 58,807 Sina Weibo users (including their 646,877 microblogs) across the five anti-rumor microblog categories was collected for model training and feature analysis. The results revealed significant differences between rumor stiflers and refuters, as well as between refuters of different categories. Refuters tended to be more active on social media, and a large proportion of them were located in more developed regions. Tweeting history was a vital reference as well: refuters showed higher interest in topics related to the rumor-refuting message. Meanwhile, features such as gender, age, user labels, and sentiment tendency also varied between refuters across categories.
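
As a rough illustration of the pipeline’s final step, the sketch below trains an XGBoost classifier on tabular user features; the feature names, placeholder data, and hyperparameters are illustrative assumptions, not the authors’ actual configuration.

```python
# Minimal sketch: training an XGBoost classifier to separate refuters from
# stiflers, assuming user features (activity, region, sentiment, interests)
# have already been extracted into a tabular dataset. All values are synthetic.
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
n_users = 1000
X = np.column_stack([
    rng.poisson(50, n_users),        # e.g. number of recent microblogs
    rng.integers(0, 2, n_users),     # e.g. developed-region indicator
    rng.normal(0.0, 1.0, n_users),   # e.g. sentiment-tendency score
    rng.normal(0.0, 1.0, n_users),   # e.g. topic similarity to the anti-rumor post
])
y = rng.integers(0, 2, n_users)      # 1 = refuter, 0 = stifler (placeholder labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = xgb.XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```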

Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 556
Author(s):  
Thaer Thaher ◽  
Mahmoud Saheb ◽  
Hamza Turabieh ◽  
Hamouda Chantar

Fake or false information on social media platforms is a significant challenge: it deliberately misleads users through rumors, propaganda, or deceptive information about a person, organization, or service. Twitter is one of the most widely used social media platforms, especially in the Arab region, where the number of users is steadily increasing, accompanied by an increase in the rate of fake news. This has drawn researchers’ attention to providing a safe online environment free of misleading information. This paper proposes a smart classification model for the early detection of fake news in Arabic tweets, utilizing Natural Language Processing (NLP) techniques, Machine Learning (ML) models, and the Harris Hawks Optimizer (HHO) as a wrapper-based feature selection approach. An Arabic Twitter corpus composed of 1862 previously annotated tweets was used to assess the efficiency of the proposed model. The Bag of Words (BoW) model with different term-weighting schemes is used for feature extraction. Eight well-known learning algorithms are investigated with varying combinations of features, including user-profile, content-based, and word features. The reported results show that Logistic Regression (LR) with Term Frequency-Inverse Document Frequency (TF-IDF) features achieves the best performance. Moreover, feature selection based on the binary HHO algorithm plays a vital role in reducing dimensionality, thereby enhancing the learning model’s performance for fake news detection. Interestingly, the proposed BHHO-LR model yields an improvement of about 5% compared with previous works on the same dataset.
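
A minimal sketch of the TF-IDF plus Logistic Regression stage described above, assuming the tweets are already loaded as text/label pairs; the wrapper-based HHO feature selection is omitted, and the placeholder tweets are purely illustrative.

```python
# Minimal sketch: TF-IDF features + Logistic Regression for fake-news tweets.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder tweets and labels (1 = fake, 0 = genuine); a real run would load
# the annotated Arabic Twitter corpus.
train_texts = ["خبر عاجل عن إغلاق المدارس غدا", "تم نفي الشائعة من المصدر الرسمي",
               "عرض وهمي: اربح هاتفا مجانا الآن", "افتتاح المعرض الدولي للكتاب اليوم"]
train_labels = [1, 0, 1, 0]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),   # TF-IDF term weighting over word n-grams
    LogisticRegression(max_iter=1000),
)
clf.fit(train_texts, train_labels)
print(clf.predict(["تم تأكيد الخبر من وزارة الصحة"]))
```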


2021 ◽  
Vol 10 (7) ◽  
pp. 474
Author(s):  
Bingqing Wang ◽  
Bin Meng ◽  
Juan Wang ◽  
Siyu Chen ◽  
Jian Liu

Social media data contains real-time expressed information, including text and geographical location. As a new data source for crowd behavior research in the era of big data, it can reflect some aspects of residents’ behavior. In this study, a text classification model based on BERT and the Transformers framework was constructed and used to classify and extract more than 210,000 residents’ festival activities from the 1.13 million Sina Weibo (Chinese “Twitter”) posts collected from Beijing in 2019. On this basis, word frequency statistics, part-of-speech analysis, topic modelling, sentiment analysis, and other methods were used to perceive different types of festival activities and to quantitatively analyze the spatial differences between festival types. The results show that traditional culture significantly influences residents’ festivals, reflecting residents’ motivations for participating in festivals and how they participate and express their emotions. There are apparent spatial differences in how residents participate in festival activities: the main festival activities are distributed in the central area within the Fifth Ring Road in Beijing, whereas expressing feelings during the festival is mainly distributed outside the Fifth Ring Road. The research integrates natural language processing, topic model analysis, spatial statistical analysis, and other technologies. It also broadens the application of social media data, especially text data, providing a new research paradigm for studying residents’ festival activities and adding residents’ perceptions of the festivals. The research results provide a basis for the design and management of the Chinese festival system.
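
The sketch below shows how such a BERT-based classifier could be applied to Weibo posts with the Hugging Face Transformers library; the checkpoint name and the three example labels are assumptions rather than the authors’ actual model or label set.

```python
# Minimal sketch: BERT-based classification of Weibo posts into (assumed)
# festival-activity categories. The classification head here is untrained and
# would need fine-tuning on the annotated festival corpus before real use.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

labels = ["traditional_custom", "family_gathering", "emotional_expression"]
checkpoint = "bert-base-chinese"  # assumed pretrained checkpoint

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=len(labels)
)

posts = ["今天全家一起包饺子过年", "春节的烟花真漂亮"]
inputs = tokenizer(posts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
for post, idx in zip(posts, logits.argmax(dim=-1).tolist()):
    print(post, "->", labels[idx])
```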


2021 ◽  
Vol 11 (13) ◽  
pp. 5832
Author(s):  
Wei Gou ◽  
Zheng Chen

Chinese Spelling Error Correction is a hot subject in the field of natural language processing. Researchers have already produced many strong solutions, from early rule-based approaches to current deep learning methods. At present, SpellGCN, proposed by Alibaba’s team, achieves the best results, with a character-level precision of 98.4% on SIGHAN2013. However, when we apply this algorithm to practical error correction tasks, it produces many false corrections. We believe this is because the corpus used for model training contains significantly more errors than the text the model corrects in practice. In response to this problem, we propose a post-processing operation for error correction tasks. We treat the initial model’s output as a candidate character, obtain various features of the character itself and its context, and then use a classification model to filter out the initial model’s false corrections. The post-processing idea introduced in this paper can be applied to most Chinese Spelling Error Correction models to improve their performance on practical error correction tasks.
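
A minimal sketch of the post-processing idea: each correction proposed by the base speller becomes a candidate, simple features of the character and its context are extracted, and a binary classifier decides whether to accept the correction. The feature set and classifier choice here are illustrative assumptions.

```python
# Minimal sketch: filtering a base spelling-correction model's output with a
# lightweight binary classifier. Features and training data are placeholders.
from dataclasses import dataclass
from sklearn.linear_model import LogisticRegression

@dataclass
class Candidate:
    original_char: str
    proposed_char: str
    model_confidence: float   # confidence of the base correction model (assumed feature)
    char_frequency: float     # corpus frequency of the proposed char (assumed feature)
    context_lm_score: float   # language-model score of the corrected sentence (assumed feature)

def to_features(c: Candidate) -> list[float]:
    return [c.model_confidence, c.char_frequency, c.context_lm_score,
            float(c.original_char == c.proposed_char)]

# Placeholder training data: 1 = correction was genuine, 0 = false correction.
train = [Candidate("地", "的", 0.95, 0.8, -2.1), Candidate("在", "再", 0.55, 0.6, -6.3)]
labels = [1, 0]

filter_clf = LogisticRegression().fit([to_features(c) for c in train], labels)

def accept_correction(c: Candidate) -> bool:
    """Keep the base model's correction only if the filter predicts it is genuine."""
    return bool(filter_clf.predict([to_features(c)])[0])
```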


Author(s):  
Youjia Fang ◽  
Xin Chen ◽  
Zheng Song ◽  
Tianzi Wang ◽  
Yang Cao

Compartmental models have been used to model information diffusion on social media. However, there have been few studies on modelling positive and negative public opinion using compartmental models. This study aimed to use sentiment analysis and compartmental models to describe the propagation of positive and negative opinions on microblogging platforms. The authors studied the propagation of news on seven popular social topics on China's Sina Weibo microblogging platform. Natural language processing and sentiment analysis were used to identify public opinions from microblogging big data. Two existing models (SIZ and SEIZ) and a newly developed model (SE2IZ) were then implemented to model the news propagation and evaluate trends in public opinion on the selected social topics. A simulation study was used to check model-fitting performance. The results show that the new SE2IZ model fits better than the existing models. This study sheds new light on using social media for public opinion estimation and prediction.
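
For orientation, the sketch below integrates a standard SEIZ rumor-propagation model with SciPy; the SE2IZ extension that splits the opinion compartment by sentiment is not reproduced here, and all parameter values are assumed for illustration.

```python
# Minimal sketch: numerically integrating a generic SEIZ compartmental model
# (Susceptible, Exposed, Infected/spreading, skeptic Z). Parameters are illustrative.
import numpy as np
from scipy.integrate import solve_ivp

def seiz(t, y, beta, b, rho, eps, p, l, N):
    S, E, I, Z = y
    dS = -beta * S * I / N - b * S * Z / N
    dE = (1 - p) * beta * S * I / N + (1 - l) * b * S * Z / N \
         - rho * E * I / N - eps * E
    dI = p * beta * S * I / N + rho * E * I / N + eps * E
    dZ = l * b * S * Z / N
    return [dS, dE, dI, dZ]

N = 1e5                                        # assumed population size
y0 = [N - 10, 0.0, 10.0, 0.0]                  # start with a handful of spreaders
params = (0.8, 0.4, 0.3, 0.1, 0.2, 0.3, N)     # beta, b, rho, eps, p, l, N (assumed)
sol = solve_ivp(seiz, (0, 60), y0, args=params, dense_output=True)
print("spreaders after 30 time units:", sol.sol(30)[2])
```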


2021 ◽  
Vol 2 (4) ◽  
pp. 30-45
Author(s):  
Ying Hua

To provide interpersonal, rhetorical, and semiotic insights for studying corporate emotional branding discourse on social media, this study targets China’s state-owned enterprises, which represent the pillars of a national economy with Chinese characteristics, and sheds light on the discourse realizations of their emotional branding strategies from textual and interpersonal perspectives. Specifically, the study focuses on two kinds of textual and interpersonal representations on the China-based Sina Weibo platform: (1) the use of stylistic features and (2) the use of attitudinal appeals. A corpus of forty days of Sina Weibo updates from three giant Chinese state-owned enterprises is retrieved and analyzed quantitatively and qualitatively. The results suggest the prevalence of involving stylistic features, the proliferation of affect and judgment appeals, and the hybridization of appreciation with affect/judgment, which points to interdiscursivity and intertextuality in communicative functions. China’s state-owned enterprises communicate to forge emotional bonds with the public rather than to promote their products. This pragmatic shift towards solidarity facework is indicative of a transcultural phenomenon elicited by digital globalization and the neoliberalist trend in China’s national economy.


Author(s):  
Sonika Prakash

A large proportion of the online comments available on public platforms are constructive; however, a significant proportion are toxic and destructive. Several platforms and social media sites find it difficult to maintain fair conversation and are often forced to either limit user comments or shut them down completely. To counter this kind of identity hate spread through comments on social media, we propose a solution that detects different types of toxicity in comments using Deep Learning and Natural Language Processing. The dataset is obtained online and processed to remove noise. Raw comments are transformed with Natural Language Processing before being fed to the classification model. A Convolutional Neural Network model is then used to differentiate toxic comments from non-toxic ones.
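
A minimal sketch of such a 1D-CNN comment classifier in Keras, assuming comments are already cleaned and integer-encoded; the vocabulary size, sequence length, and layer sizes are illustrative assumptions rather than the paper’s configuration.

```python
# Minimal sketch: a 1D convolutional network over word embeddings that scores
# a comment as toxic (1) or non-toxic (0).
import tensorflow as tf
from tensorflow.keras import layers, models

vocab_size, max_len = 20000, 200   # assumed vocabulary size and sequence length

model = models.Sequential([
    layers.Input(shape=(max_len,)),
    layers.Embedding(vocab_size, 128),          # learned word embeddings
    layers.Conv1D(128, 5, activation="relu"),   # n-gram-like local features
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),      # toxic vs. non-toxic probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# Training would use integer-encoded comment sequences, e.g.:
# model.fit(X_train, y_train, validation_split=0.1, epochs=5, batch_size=64)
```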


2020 ◽  
Author(s):  
Ali Al-Garadi Mohammed ◽  
Yuan-Chi Yang ◽  
Haitao Cai ◽  
Yucheng Ruan ◽  
Karen O’Connor ◽  
...  

Prescription medication (PM) misuse/abuse has emerged as a national crisis in the United States, and social media has been suggested as a potential resource for performing active monitoring. However, automating a social media-based monitoring system is challenging, requiring advanced natural language processing (NLP) and machine learning methods. In this paper, we describe the development and evaluation of automatic text classification models for detecting self-reports of PM abuse from Twitter. We experimented with state-of-the-art bidirectional transformer-based language models that utilize tweet-level representations and enable transfer learning (e.g., BERT, RoBERTa, XLNet, ALBERT, and DistilBERT), proposed fusion-based approaches, and compared the developed models with several traditional machine learning and deep learning approaches. Using a public dataset, we evaluated the classifiers on their ability to identify the non-majority “abuse/misuse” class. Our proposed fusion-based model performs significantly better than the best traditional model (F1-score [95% CI]: 0.67 [0.64-0.69] vs. 0.45 [0.42-0.48]). We illustrate, via experiments with differing training set sizes, that the transformer-based models are more stable and require less annotated data than the other models. The significant improvements achieved by our best-performing classification model over past approaches make it suitable for automated continuous monitoring of nonmedical PM use from Twitter.
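
A minimal sketch of fine-tuning a single transformer encoder for the binary abuse/misuse classification task with the Hugging Face Trainer API; the checkpoint, placeholder tweets, and hyperparameters are assumptions, and the fusion of multiple transformer models is not shown.

```python
# Minimal sketch: fine-tuning a transformer encoder to flag self-reports of
# prescription-medication abuse in tweets. Data here is a tiny placeholder.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import Dataset

checkpoint = "roberta-base"                      # assumed base model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Placeholder tweets; a real run would load the annotated PM-abuse corpus.
raw = {"text": ["took extra pills to feel the buzz", "picked up my refill today"],
       "label": [1, 0]}
ds = Dataset.from_dict(raw).map(
    lambda ex: tokenizer(ex["text"], truncation=True, padding="max_length",
                         max_length=64))

args = TrainingArguments(output_dir="pm_abuse_model", num_train_epochs=3,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=ds).train()
```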


2021 ◽  
Vol 12 (3) ◽  
pp. 32-47
Author(s):  
Chaitanya Pandey

A natural language processing (NLP) method was used to uncover issues and sentiments surrounding COVID-19 on social media and to gain a deeper understanding of fluctuating public opinion in situations of wide-scale panic, with the aim of guiding improved decision making. A sentiment analyser was created for the automated extraction of COVID-19-related discussions based on topic modelling, and the BERT model was used for the sentiment classification of COVID-19 Reddit comments. These findings shed light on the importance of studying trends and using computational techniques to assess the human psyche in times of distress.
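
A minimal sketch pairing LDA topic modelling with a pretrained sentiment classifier for Reddit comments; the default sentiment checkpoint is used purely for illustration and is not the fine-tuned BERT classifier described above.

```python
# Minimal sketch: group COVID-19 comments into topics, then score sentiment.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from transformers import pipeline

comments = ["Lockdown again, I can't take this anymore",
            "Vaccines finally rolling out, feeling hopeful",
            "Testing sites are overwhelmed in my city"]

# Topic modelling over bag-of-words counts to surface discussion themes.
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(comments)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Sentiment classification of each comment with a pretrained transformer.
sentiment = pipeline("sentiment-analysis")      # downloads a default checkpoint
for comment, result in zip(comments, sentiment(comments)):
    print(result["label"], "-", comment)
```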



