A Novel Framework for Mining Social Media Data Based on Text Mining, Topic Modeling, Random Forest, and DANP Methods

Chi-Yo Huang; Chia-Lee Yang; Yi-Hao Hsiao

doi:10.3390/math9172041

Mathematics ◽

10.3390/math9172041 ◽

2021 ◽

Vol 9 (17) ◽

pp. 2041

Author(s):

Chi-Yo Huang ◽

Chia-Lee Yang ◽

Yi-Hao Hsiao

Keyword(s):

Decision Making ◽

Social Media ◽

Random Forest ◽

Social Theory ◽

Latent Dirichlet Allocation ◽

Theoretical Framework ◽

Causal Relationships ◽

Social Media Data ◽

Feature Importance ◽

Media Data

The huge volume of user-generated data on social media results from the aggregation of users’ personal backgrounds, past experiences, and daily activities. This so-called “big data” has been studied intensively during the past few years. Despite the impression one may get from the media, a great deal of what these data contain has not yet been uncovered by existing data engineering and processing techniques, and very few scholars have tried to uncover it, especially from the perspective of multiple-criteria decision-making (MCDM). MCDM methods can derive influence relationships and weights associated with aspects and criteria, which can hardly be achieved by traditional data analytics and statistical approaches. Therefore, in this paper, we propose an analytic framework to mine social networks, feed the meaningful information into MCDM methods based on a theoretical framework, derive causal relationships among the aspects of that framework, and finally compare the causal relationships with a social theory. Latent Dirichlet allocation (LDA) is adopted to derive topic models from the data retrieved from social media. By clustering the topics into aspects of the social theory, the probability associated with each aspect is normalized and then transformed to a 5-point Likert-type scale. Afterwards, for every topic, the feature importance of all other topics is derived using the random forest (RF) algorithm. The feature importance matrix is transformed into the initial influence matrix of the decision-making trial and evaluation laboratory (DEMATEL). The influence relationships among the aspects and criteria, together with the influence weight of each criterion, can then be derived using the DEMATEL-based analytic network process (DANP). To verify the feasibility of the proposed framework, Taiwanese users’ attitudes toward air pollution are analyzed based on the value–belief–norm (VBN) theory using social media data retrieved from Dcard (dcard.tw). According to the analytic results, the causal relationships are fully consistent with the VBN framework. Further, the mutual influences derived in this work that were seldom discussed in earlier works, i.e., those between altruistic concerns and egoistic concerns, as well as those between altruistic concerns and biosphere concerns, are worth further investigation.
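The DEMATEL step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard DEMATEL formulation (normalize the initial influence matrix, then compute the total-influence matrix T = D(I - D)^-1), which the abstract names but does not spell out. The convention that entry [i][j] is the influence of factor i on factor j, the function names, and the example values are all assumptions for illustration, and the later DANP weighting is not shown.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def mat_inv(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def dematel(initial):
    """Return (T, r, c): total-influence matrix, row sums, and column sums.

    initial[i][j] is assumed to hold the direct influence of factor i on
    factor j (for example, a feature-importance value), with a zero diagonal.
    """
    n = len(initial)
    # Scale by the larger of the maximum row sum and maximum column sum.
    scale = max(max(sum(row) for row in initial),
                max(sum(initial[i][j] for i in range(n)) for j in range(n)))
    d = [[v / scale for v in row] for row in initial]
    identity_minus_d = [[float(i == j) - d[i][j] for j in range(n)] for i in range(n)]
    t = mat_mul(d, mat_inv(identity_minus_d))
    r = [sum(row) for row in t]                              # influence given
    c = [sum(t[i][j] for i in range(n)) for j in range(n)]   # influence received
    return t, r, c


# Hypothetical 3-factor importance matrix (values invented for illustration).
importance = [[0.0, 0.6, 0.2],
              [0.3, 0.0, 0.5],
              [0.1, 0.4, 0.0]]
T, r, c = dematel(importance)
for name, ri, ci in zip(["A", "B", "C"], r, c):
    print(name, "prominence:", round(ri + ci, 3), "relation:", round(ri - ci, 3))
```

In this reading, a factor with positive r - c is a net cause and one with negative r - c is a net effect, which is how causal directions are usually read off a DEMATEL result.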

Promoting corporate sustainability through sustainable resource management: A hybrid decision-making approach incorporating social media data

Environmental Impact Assessment Review ◽

10.1016/j.eiar.2020.106459 ◽

2020 ◽

Vol 85 ◽

pp. 106459

Author(s):

Li Xia ◽

Jiuchang Wei ◽

Shuo Gao ◽

Ben Ma

Keyword(s):

Decision Making ◽

Social Media ◽

Resource Management ◽

Corporate Sustainability ◽

Social Media Data ◽

Sustainable Resource Management ◽

Media Data

The Role of Smart Technologies to Support Citizen Engagement and Decision Making

International Journal of Electronic Government Research ◽

10.4018/ijegr.2018100101 ◽

2018 ◽

Vol 14 (4) ◽

pp. 1-17 ◽

Cited By ~ 1

Author(s):

Gabriela Viale Pereira ◽

Gregor Eibl ◽

Constantinos Stylianou ◽

Gilberto Martínez ◽

Haris Neophytou ◽

...

Keyword(s):

Decision Making ◽

Social Media ◽

Real Life ◽

Digital Technologies ◽

Fuzzy Cognitive Maps ◽

Data Driven ◽

Social Media Data ◽

Life Problems ◽

Smart Technologies ◽

Media Data

Smart government relies both on the application of digital technologies to enable citizens' participation in order to achieve a high level of citizen centricity and on data-driven decision making in order to improve the quality of life of citizens. Data-driven decisions in turn depend on accessible and reliable datasets, which open government and social media data are likely to provide. The SmartGov project uses digital technologies by integrating open and social media data in Fuzzy Cognitive Maps to model real-life problems and simulate different scenarios, leading to better decision making. This research performed a multiple-case analysis in two pilot cities. Both municipalities use the technologies to find the best routes: Limassol to improve garbage collection and Quart de Poblet to improve the walking routes of chaperones guiding children to school. The article proposes a generic framework for Smart City Governance, built on the analysis of the SmartGov project, that focuses on the inputs and outcomes of this process in the use of technologies for policy making.
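As a rough illustration of how a Fuzzy Cognitive Map can simulate a scenario, the sketch below applies a common textbook update rule, in which each concept's next activation is a sigmoid of the weighted sum of the other concepts' activations. The abstract does not say which FCM variant SmartGov uses, so the update rule, the concept names, and the weights here are assumptions made purely for illustration.

```python
import math


def sigmoid(x, steepness=1.0):
    """Squash a weighted sum into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-steepness * x))


def fcm_step(state, weights):
    """One synchronous update; weights[j][i] is the influence of concept j on concept i."""
    n = len(state)
    return [sigmoid(sum(weights[j][i] * state[j] for j in range(n))) for i in range(n)]


def simulate(state, weights, max_steps=100, tol=1e-6):
    """Iterate until activations stop changing or max_steps is reached."""
    for _ in range(max_steps):
        nxt = fcm_step(state, weights)
        if max(abs(a - b) for a, b in zip(nxt, state)) < tol:
            return nxt
        state = nxt
    return state


# Hypothetical concepts and causal weights in [-1, 1], invented for this sketch.
concepts = ["citizen_complaints", "collection_frequency", "street_cleanliness"]
weights = [[0.0, 0.7, 0.0],    # complaints push collection frequency up
           [0.0, 0.0, 0.8],    # more frequent collection improves cleanliness
           [-0.6, 0.0, 0.0]]   # cleaner streets reduce complaints
final = simulate([0.9, 0.2, 0.3], weights)
for name, value in zip(concepts, final):
    print(name, round(value, 3))
```

Different scenarios would be explored by changing the initial activations or individual weights and comparing the states the map settles into.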

Causal Relationships Among Pollen Counts, Tweet Numbers, and Patient Numbers for Seasonal Allergic Rhinitis Surveillance: Retrospective Analysis (Preprint)

10.2196/preprints.10450 ◽

2018 ◽

Author(s):

Shoko Wakamiya ◽

Shoji Matsune ◽

Kimihiro Okubo ◽

Eiji Aramaki

Keyword(s):

Social Media ◽

Allergic Rhinitis ◽

Seasonal Allergic Rhinitis ◽

Japanese Cedar ◽

Pollen Count ◽

Causal Relationships ◽

Social Media Data ◽

Pollen Counts ◽

Number Of Patients ◽

Media Data

BACKGROUND: Health-related social media data are increasingly used in disease-surveillance studies, which have demonstrated moderately high correlations between the number of social media posts and the number of patients. However, there is a need to understand the causal relationship between the behavior of social media users and the actual number of patients in order to increase the credibility of disease surveillance based on social media data.

OBJECTIVE: This study aimed to clarify the causal relationships among pollen count, the posting behavior of social media users, and the number of patients with seasonal allergic rhinitis in the real world.

METHODS: This analysis was conducted using datasets of pollen counts, tweet numbers, and numbers of patients with seasonal allergic rhinitis from Kanagawa Prefecture, Japan. We examined daily pollen counts for Japanese cedar (the major cause of seasonal allergic rhinitis in Japan) and hinoki cypress (which commonly complicates seasonal allergic rhinitis) from February 1 to May 31, 2017. The daily numbers of tweets that included the keyword “kafunshō” (or seasonal allergic rhinitis) were calculated between January 1 and May 31, 2017. Daily numbers of patients with seasonal allergic rhinitis from January 1 to May 31, 2017, were obtained from three healthcare institutes that participated in the study. The Granger causality test was used to examine the causal relationships among pollen count, tweet numbers, and the number of patients with seasonal allergic rhinitis from February to May 2017. To determine if time-variant factors affect these causal relationships, we analyzed the main seasonal allergic rhinitis phase (February to April) when Japanese cedar trees actively produce and release pollen.

RESULTS: Increases in pollen count were found to increase the number of tweets during the overall study period (P=.04), but not the main seasonal allergic rhinitis phase (P=.05). In contrast, increases in pollen count were found to increase patient numbers in both the study period (P=.04) and the main seasonal allergic rhinitis phase (P=.01). Increases in the number of tweets increased the patient numbers during the main seasonal allergic rhinitis phase (P=.02), but not the overall study period (P=.89). Patient numbers did not affect the number of tweets in either the overall study period (P=.24) or the main seasonal allergic rhinitis phase (P=.47).

CONCLUSIONS: Understanding the causal relationships among pollen counts, tweet numbers, and numbers of patients with seasonal allergic rhinitis is an important step to increasing the credibility of surveillance systems that use social media data. Further in-depth studies are needed to identify the determinants of social media posts described in this exploratory analysis.
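For readers unfamiliar with the Granger causality test named in the methods, its usual textbook form is sketched below. The abstract does not report the lag order or the exact test statistic, so the lag order p, the symbols, and the F-statistic form are assumptions; here x_t stands for one daily series (for example, tweet numbers) and y_t for another (for example, patient numbers).

```latex
% Unrestricted model: past values of x are allowed to help predict y
y_t = \alpha_0 + \sum_{i=1}^{p} \alpha_i \, y_{t-i} + \sum_{j=1}^{p} \beta_j \, x_{t-j} + \varepsilon_t

% Restricted model: the same regression with all \beta_j fixed at zero
y_t = \alpha_0 + \sum_{i=1}^{p} \alpha_i \, y_{t-i} + \eta_t

% Null hypothesis (x does not Granger-cause y) and the F statistic,
% where RSS_r and RSS_u are the residual sums of squares of the restricted
% and unrestricted fits and T is the number of usable observations
H_0 : \beta_1 = \beta_2 = \dots = \beta_p = 0
\qquad
F = \frac{(\mathrm{RSS}_r - \mathrm{RSS}_u)/p}{\mathrm{RSS}_u/(T - 2p - 1)}
```

A small P value leads to rejecting H_0, meaning lagged values of x improve the prediction of y beyond what y's own history provides. This is predictive precedence rather than proof of a causal mechanism.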

The Role of Smart Technologies to Support Citizen Engagement and Decision Making

10.4018/978-1-6684-3706-3.ch036 ◽

2022 ◽

pp. 687-703

Author(s):

Gabriela Viale Pereira ◽

Gregor Eibl ◽

Constantinos Stylianou ◽

Gilberto Martínez ◽

Haris Neophytou ◽

...

Keyword(s):

Decision Making ◽

Social Media ◽

Real Life ◽

Digital Technologies ◽

Data Driven ◽

Social Media Data ◽

Life Problems ◽

Smart Technologies ◽

High Level ◽

Media Data

Tracking via Geotagged Social Media Data

Tracking Tourists ◽

10.23912/9781911635383-4575 ◽

2020 ◽

Author(s):

Anne Hardy

Keyword(s):

Decision Making ◽

Social Media ◽

Single Point ◽

Application Programming Interface ◽

Continuous Location ◽

Social Media Data ◽

Large Numbers ◽

Application Programming ◽

Media Data ◽

Programming Interface

Over the past twenty years, social media has changed the ways in which we plan, travel and reflect on our travels. Tourists use social media while travelling to stay in touch with friends and family, enhance their social status (Guo et al., 2015), and assist others with decision making (Xiang and Gretzel, 2010; Yoo and Gretzel, 2010). They also use it to report back to their friends and family where they are. This can be done using a geotag function that provides a location for where a post is made. While little is known about why tourists choose to geotag their social media posts, Chung and Lee (2016) suggest that geotags may be used in an altruistic manner by tourists, in order to provide information, and because they elicit a sense of anticipated reward. What is known, however, is that the function offers researchers the ability to understand where tourists travel. There are two types of geotagged social media data. The first of these is discussed in this chapter and may be defined as single point geo-referenced data – geotagged social media posts whose release is chosen by the user. This includes data gathered from social media apps such as Facebook, Instagram, Twitter and WeChat. The method of obtaining this data involves the collation of large numbers of discrete geotagged updates or photographs. Data can be collated via an application programming interface (API) provided by the app developer to researchers, by automated data scraping via computer programs, perhaps written in Python, or manually by researchers. The second type of data is continuous location-based data from applications that are designed to track movement constantly, such as Strava or MyFitnessPal. Tracking methods using this continuous location-based data are discussed in detail in the following chapter.
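To make the idea of collating discrete geotagged posts concrete, the following minimal sketch groups single-point records by user and orders them in time. It does not call any real platform API; the `GeoPost` record, its fields, and the sample coordinates are hypothetical stand-ins for whatever an API, scraper, or manual collection would actually return.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class GeoPost:
    """One single-point geo-referenced post (hypothetical record layout)."""
    user_id: str
    posted_at: datetime
    lat: float
    lon: float


def collate_by_user(posts):
    """Group discrete geotagged posts per user, ordered by posting time."""
    grouped = defaultdict(list)
    for post in posts:
        grouped[post.user_id].append(post)
    return {user: sorted(items, key=lambda p: p.posted_at)
            for user, items in grouped.items()}


# Invented sample records; the coordinates are illustrative only.
posts = [
    GeoPost("u1", datetime(2020, 1, 3, 15, 0), -42.12, 148.29),
    GeoPost("u2", datetime(2020, 1, 2, 9, 30), -41.44, 147.14),
    GeoPost("u1", datetime(2020, 1, 1, 10, 0), -42.88, 147.33),
]
for user, trail in collate_by_user(posts).items():
    print(user, [(p.lat, p.lon) for p in trail])
```

Because each record exists only where a user chose to post, the resulting per-user sequence is a sparse set of points rather than a continuous track.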

Topic modeling to mine illegal compensation for occupational injuries

European Journal of Public Health ◽

10.1093/eurpub/ckz186.317 ◽

2019 ◽

Vol 29 (Supplement_4) ◽

Author(s):

S H Song ◽

J Y Min ◽

H J Kim ◽

K B Min

Keyword(s):

Social Media ◽

Topic Modeling ◽

Social Insurance ◽

Latent Dirichlet Allocation ◽

Occupational Injuries ◽

Workplace Safety ◽

Body Parts ◽

Insurance Claims ◽

Social Media Data ◽

Media Data

Abstract. Background: Accurate reports of occupational injuries are important for monitoring workplace safety and health initiatives. In South Korea, media reports, experts, and workers have constantly raised the issue of underreporting, supposedly because employers have strong market “incentives” to underreport their employees’ injuries. A critical means of underreporting or covering up is illegal compensation (called “gong-sang” in Korean). Unfortunately, “gong-sang” cases are not counted in official occupational injury statistics. The aim of this study was to analyze social media data using topic modeling and to explore issues surrounding “gong-sang”. Methods: We used web scraping to collect 2,210 social media items from Web search engines. The unstructured textual documents were transformed into structured data using Python, and latent Dirichlet allocation (LDA) from the Python library Gensim was applied for topic modeling. Results: Applying LDA to the “gong-sang”-related documents identified 10 topics. Topic 1 was the greatest concern (60.5%), with keywords implying the choice between illegal compensation (“gong-sang”) and legal insurance claims. The next concern was Topic 2, which included keywords associated with claims for industrial accident insurance benefits. The remaining topics (Topics 3-10) covered monetary issues, precarious employment, and body parts vulnerable to “gong-sang”. Conclusions: We explored web-based data and identified the salient issues surrounding “gong-sang”. LDA topics may help ensure an efficient occupational health and safety scheme that protects vulnerable employees from “gong-sang” practices. Key messages: The topics formulated by LDA included queries about legal insurance claims. They covered legal insurance claims (private or social insurance), monetary compensation, injured body parts, and the types of jobs vulnerable to “gong-sang”.
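The step of turning unstructured text into structured data before topic modeling can be sketched as a bag-of-words conversion. This is a simplified standard-library illustration, not the authors' pipeline: the tokenizer only handles lowercase ASCII words (the study's documents would need a tokenizer suited to Korean), and the sample sentences are invented. The output format, a list of (token_id, count) pairs per document, is the same shape that Gensim's `Dictionary.doc2bow` produces as input for its LDA model.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (ASCII-only assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(token_lists):
    """Assign each distinct token an integer id in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def to_bow(tokens, vocab):
    """Convert one tokenized document into sorted (token_id, count) pairs."""
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    return sorted(counts.items())


# Invented example documents standing in for scraped posts.
documents = ["Injury claim denied", "Claim the insurance claim"]
token_lists = [tokenize(doc) for doc in documents]
vocab = build_vocabulary(token_lists)
corpus = [to_bow(tokens, vocab) for tokens in token_lists]
print(vocab)
print(corpus)
```

A real pipeline would typically also remove stop words and very rare tokens before fitting the topic model, since those choices strongly affect which topics emerge.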

Real-Time Recommendation Engine for Readers

Advances in Library and Information Science - Big Data Applications for Improving Library Services ◽

10.4018/978-1-7998-3049-8.ch011 ◽

2021 ◽

pp. 165-177

Author(s):

Sangeeta Namdev Dhamdhere ◽

Deepak Mane

Keyword(s):

Decision Making ◽

Social Media ◽

Real Time ◽

Recommendation System ◽

The Internet ◽

Problem Analysis ◽

Social Media Data ◽

Good Book ◽

Data Ingestion ◽

Media Data

In today's world, every reader or social media user has different preferences and hobbies when it comes to reading. For example, a social media user who searches for a book without any specific idea of what s/he wants wastes a lot of time browsing the internet and trawling through various sites in the hope of finding a good book. To avoid this, the authors are building a recommendation system that recommends a book to each reader/user based on his or her choices, hobbies, or previous reading, which is a massive help compared with wasting time on various sites. Data from social media is a powerful fuel that can help in decision making and in building a recommendation engine. Because social media data arrives in different formats, the biggest challenge for a business is to ingest the data at a reasonable speed and process it further; such data is also difficult to detect and capture. A real-time recommendation engine for users, including data ingestion methods, challenges, metadata problem, analysis, and consumption, is discussed here.

A Sentiment Analysis-based Expert Weight Determination Method for Large-scale Group Decision-making Driven by Social Media Data

Expert Systems with Applications ◽

10.1016/j.eswa.2021.115629 ◽

2021 ◽

pp. 115629

Author(s):

Qifeng Wan ◽

Xuanhua Xu ◽

Jun Zhuang ◽

Bin Pan

Keyword(s):

Decision Making ◽

Social Media ◽

Sentiment Analysis ◽

Group Decision Making ◽

Large Scale ◽

Group Decision ◽

Determination Method ◽

Social Media Data ◽

Media Data

A new emergency management dynamic value assessment model based on social media data: a multiphase decision-making perspective

Enterprise Information Systems ◽

10.1080/17517575.2020.1722251 ◽

2020 ◽

Vol 14 (5) ◽

pp. 680-709 ◽

Cited By ~ 7

Author(s):

Siqing Shan ◽

Xiaohui Liu ◽

Yigang Wei ◽

Lida Xu ◽

Baishang Zhang ◽

...

Keyword(s):

Decision Making ◽

Social Media ◽

Emergency Management ◽

Assessment Model ◽

Value Assessment ◽

Social Media Data ◽

Model Based ◽

Media Data

Social media data analytics for business decision making system to competitive analysis

Information Processing & Management ◽

10.1016/j.ipm.2021.102751 ◽

2022 ◽

Vol 59 (1) ◽

pp. 102751

Author(s):

Jie Yang ◽

Pishi Xiu ◽

Lipeng Sun ◽

Limeng Ying ◽

Blaanand Muthu

Keyword(s):

Decision Making ◽

Social Media ◽

Competitive Analysis ◽

Data Analytics ◽

Business Decision ◽

Social Media Data ◽

Decision Making System ◽

Media Data
