A Novel Framework for Mining Social Media Data Based on Text Mining, Topic Modeling, Random Forest, and DANP Methods

Mathematics ◽  
2021 ◽  
Vol 9 (17) ◽  
pp. 2041
Author(s):  
Chi-Yo Huang ◽  
Chia-Lee Yang ◽  
Yi-Hao Hsiao

The huge volume of user-generated data on social media results from the aggregation of users’ personal backgrounds, past experiences, and daily activities. These so-called “big data” have been studied intensively during the past few years. Despite the impression one may get from the media, a great deal remains that existing data engineering and processing techniques have not uncovered, and very few scholars have tried to uncover it, especially from the perspective of multiple-criteria decision-making (MCDM). MCDM methods can derive the influence relationships and weights associated with aspects and criteria, which can hardly be achieved by traditional data analytics and statistical approaches. Therefore, in this paper, we propose an analytic framework to mine social networks, feed the meaningful information into MCDM methods based on a theoretical framework, derive causal relationships among the aspects of the theoretical framework, and finally compare the causal relationships with a social theory. Latent Dirichlet allocation (LDA) will be adopted to derive topic models based on the data retrieved from social media. By clustering the topics into aspects of the social theory, the probability associated with each aspect will be normalized and then transformed to a Likert-type 5-point scale. Afterwards, for every topic, the feature importance of all other topics will be derived using the random forest (RF) algorithm. The feature importance matrix will be transformed to the initial influence matrix of the decision-making trial and evaluation laboratory (DEMATEL). The influence relationships among the aspects and criteria, along with the influence weight of each criterion, can then be derived by using the DEMATEL-based analytic network process (DANP).
To verify the feasibility of the proposed framework, Taiwanese users’ attitudes toward air pollution will be analyzed based on the value–belief–norm (VBN) theory by using social media data retrieved from Dcard (dcard.tw). Based on the analytic results, the causal relationships are fully consistent with the VBN framework. Further, the mutual influences derived in this work that were seldom discussed in earlier works, i.e., those between altruistic concerns and egoistic concerns and those between altruistic concerns and biosphere concerns, are worth further investigation in the future.
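The step from an initial influence matrix to DEMATEL influence relationships can be illustrated with a minimal sketch. It assumes the textbook DEMATEL formulation (normalize by the largest row or column sum, then compute the total influence matrix T = N(I - N)^-1); the abstract does not state which normalization the authors used. The 3x3 matrix below is a hypothetical stand-in for a random forest feature importance matrix, not data from the paper.

```python
def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * pv for v, pv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def dematel(influence):
    """Return the total influence matrix T plus its row sums r and column sums c."""
    n = len(influence)
    row_sums = [sum(row) for row in influence]
    col_sums = [sum(influence[i][j] for i in range(n)) for j in range(n)]
    scale = max(max(row_sums), max(col_sums))  # assumed normalization rule
    norm = [[v / scale for v in row] for row in influence]
    i_minus_n = [[(1.0 if i == j else 0.0) - norm[i][j] for j in range(n)]
                 for i in range(n)]
    inv = invert(i_minus_n)
    total = [[sum(norm[i][k] * inv[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    r = [sum(row) for row in total]
    c = [sum(total[i][j] for i in range(n)) for j in range(n)]
    return total, r, c


# Hypothetical initial influence matrix: entry [i][j] is the assumed
# importance of aspect i when predicting aspect j (zero diagonal).
influence = [
    [0.0, 0.6, 0.4],
    [0.3, 0.0, 0.7],
    [0.2, 0.8, 0.0],
]
total, r, c = dematel(influence)
for i in range(len(influence)):
    # r + c is usually read as prominence, r - c as net cause (+) or effect (-).
    print(f"aspect {i}: prominence={r[i] + c[i]:.3f}, relation={r[i] - c[i]:.3f}")
```

In DANP, the total influence matrix would then be normalized and used to build the weighted supermatrix from which the influence weights are read; that stage is omitted here.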

2018 ◽  
Vol 14 (4) ◽  
pp. 1-17 ◽  
Author(s):  
Gabriela Viale Pereira ◽  
Gregor Eibl ◽  
Constantinos Stylianou ◽  
Gilberto Martínez ◽  
Haris Neophytou ◽  
...  

Smart government relies both on the application of digital technologies to enable citizen participation in order to achieve a high level of citizen centricity and on data-driven decision making in order to improve the quality of life of citizens. Data-driven decisions in turn depend on accessible and reliable datasets, which open government and social media data are likely to provide. The SmartGov project uses digital technologies by integrating open and social media data in Fuzzy Cognitive Maps to model real-life problems and simulate different scenarios, leading to better decision making. This research performed a multiple-case analysis in two pilot cities. Both municipalities use the technologies to find the best routes: Limassol to improve garbage collection and Quart de Poblet to improve the walking routes of chaperones guiding children to school. Based on the analysis of the SmartGov project, the article proposes a generic framework for Smart City Governance that focuses on the inputs and outcomes of this process in the use of technologies for policy making.
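Scenario simulation with a Fuzzy Cognitive Map can be sketched as follows. This is only an illustration under stated assumptions: the update rule is the commonly used Kosko-style rule with a sigmoid squashing function, and the three concepts and their weights are hypothetical, since the abstract does not describe the SmartGov maps themselves.

```python
import math


def sigmoid(x):
    """Squash an activation into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fcm_step(state, weights):
    """One update: concept i receives the weighted sum of all concepts j."""
    n = len(state)
    return [sigmoid(sum(weights[j][i] * state[j] for j in range(n)))
            for i in range(n)]


def simulate(initial, weights, clamped, steps=100, tol=1e-6):
    """Iterate until the state stops changing; clamped concepts stay fixed."""
    state = list(initial)
    for idx, value in clamped.items():
        state[idx] = value
    for _ in range(steps):
        nxt = fcm_step(state, weights)
        for idx, value in clamped.items():
            nxt[idx] = value
        if max(abs(a - b) for a, b in zip(nxt, state)) < tol:
            return nxt
        state = nxt
    return state


# Hypothetical concepts: 0 = route quality, 1 = service cost,
# 2 = citizen satisfaction. weights[j][i] is the assumed influence of j on i.
weights = [
    [0.0, -0.5, 0.8],
    [0.0, 0.0, -0.6],
    [0.0, 0.0, 0.0],
]
start = [0.5, 0.5, 0.5]
low = simulate(start, weights, clamped={0: 0.1})   # scenario: poor routes
high = simulate(start, weights, clamped={0: 0.9})  # scenario: good routes
print(f"satisfaction with poor routes: {low[2]:.3f}")
print(f"satisfaction with good routes: {high[2]:.3f}")
```

Clamping the driver concept is one way to compare scenarios; without it, a map like this one settles to the same state regardless of the starting values.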


2018 ◽  
Author(s):  
Shoko Wakamiya ◽  
Shoji Matsune ◽  
Kimihiro Okubo ◽  
Eiji Aramaki

BACKGROUND Health-related social media data are increasingly used in disease-surveillance studies, which have demonstrated moderately high correlations between the number of social media posts and the number of patients. However, there is a need to understand the causal relationship between the behavior of social media users and the actual number of patients in order to increase the credibility of disease surveillance based on social media data.
OBJECTIVE This study aimed to clarify the causal relationships among pollen count, the posting behavior of social media users, and the number of patients with seasonal allergic rhinitis in the real world.
METHODS This analysis was conducted using datasets of pollen counts, tweet numbers, and numbers of patients with seasonal allergic rhinitis from Kanagawa Prefecture, Japan. We examined daily pollen counts for Japanese cedar (the major cause of seasonal allergic rhinitis in Japan) and hinoki cypress (which commonly complicates seasonal allergic rhinitis) from February 1 to May 31, 2017. The daily numbers of tweets that included the keyword “kafunshō” (seasonal allergic rhinitis) were calculated between January 1 and May 31, 2017. Daily numbers of patients with seasonal allergic rhinitis from January 1 to May 31, 2017, were obtained from three healthcare institutes that participated in the study. The Granger causality test was used to examine the causal relationships among pollen count, tweet numbers, and the number of patients with seasonal allergic rhinitis from February to May 2017. To determine whether time-variant factors affect these causal relationships, we also analyzed the main seasonal allergic rhinitis phase (February to April), when Japanese cedar trees actively produce and release pollen.
RESULTS Increases in pollen count were found to increase the number of tweets during the overall study period (P=.04), but not the main seasonal allergic rhinitis phase (P=.05). In contrast, increases in pollen count were found to increase patient numbers in both the overall study period (P=.04) and the main seasonal allergic rhinitis phase (P=.01). Increases in the number of tweets increased the patient numbers during the main seasonal allergic rhinitis phase (P=.02), but not the overall study period (P=.89). Patient numbers did not affect the number of tweets in either the overall study period (P=.24) or the main seasonal allergic rhinitis phase (P=.47).
CONCLUSIONS Understanding the causal relationships among pollen counts, tweet numbers, and numbers of patients with seasonal allergic rhinitis is an important step toward increasing the credibility of surveillance systems that use social media data. Further in-depth studies are needed to identify the determinants of social media posts described in this exploratory analysis.
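The idea behind the Granger causality test can be sketched in a few lines: series X "Granger-causes" series Y if past values of X improve the prediction of Y beyond what past values of Y already provide. The sketch below assumes a single lag and reports only the F statistic; the abstract does not give the lag order or software used, and the two series are synthetic, generated here purely for illustration rather than taken from the study. In practice a library routine such as `grangercausalitytests` in statsmodels would also supply P values.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [v - f * pv for v, pv in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def rss(rows, target):
    """Residual sum of squares of an ordinary least squares fit."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, target))


def granger_f(cause, effect):
    """Lag-1 F statistic: does cause[t-1] help predict effect[t]?"""
    target = effect[1:]
    restricted = [[1.0, effect[t - 1]] for t in range(1, len(effect))]
    unrestricted = [[1.0, effect[t - 1], cause[t - 1]]
                    for t in range(1, len(effect))]
    rss_r = rss(restricted, target)
    rss_u = rss(unrestricted, target)
    dof = len(target) - 3  # observations minus unrestricted parameters
    return (rss_r - rss_u) / (rss_u / dof)


# Synthetic series (an assumption, not study data): "tweets" follow
# "pollen" with a one-day delay plus a little noise.
random.seed(0)
pollen = [random.gauss(0, 1) for _ in range(200)]
tweets = [0.0] + [0.8 * pollen[t - 1] + 0.1 * random.gauss(0, 1)
                  for t in range(1, 200)]

f_forward = granger_f(pollen, tweets)  # pollen -> tweets
f_reverse = granger_f(tweets, pollen)  # tweets -> pollen
print(f"F(pollen -> tweets) = {f_forward:.1f}")
print(f"F(tweets -> pollen) = {f_reverse:.1f}")
```

Running the test in both directions, as the study does for each pair of variables, is what distinguishes a one-way influence from a mutual one.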


2022 ◽  
pp. 687-703
Author(s):  
Gabriela Viale Pereira ◽  
Gregor Eibl ◽  
Constantinos Stylianou ◽  
Gilberto Martínez ◽  
Haris Neophytou ◽  
...  

Smart government relies both on the application of digital technologies to enable citizen participation in order to achieve a high level of citizen centricity and on data-driven decision making in order to improve the quality of life of citizens. Data-driven decisions in turn depend on accessible and reliable datasets, which open government and social media data are likely to provide. The SmartGov project uses digital technologies by integrating open and social media data in Fuzzy Cognitive Maps to model real-life problems and simulate different scenarios, leading to better decision making. This research performed a multiple-case analysis in two pilot cities. Both municipalities use the technologies to find the best routes: Limassol to improve garbage collection and Quart de Poblet to improve the walking routes of chaperones guiding children to school. Based on the analysis of the SmartGov project, the article proposes a generic framework for Smart City Governance that focuses on the inputs and outcomes of this process in the use of technologies for policy making.


Author(s):  
Anne Hardy

Over the past twenty years, social media has changed the ways in which we plan, travel and reflect on our travels. Tourists use social media while travelling to stay in touch with friends and family, enhance their social status (Guo et al., 2015), and assist others with decision making (Xiang and Gretzel, 2010; Yoo and Gretzel, 2010). They also use it to tell their friends and family where they are. This can be done using a geotag function that records the location from which a post is made. While little is known about why tourists choose to geotag their social media posts, Chung and Lee (2016) suggest that geotags may be used in an altruistic manner by tourists, in order to provide information, and because they elicit a sense of anticipated reward. What is known, however, is that the function offers researchers the ability to understand where tourists travel. There are two types of geotagged social media data. The first of these is discussed in this chapter and may be defined as single point geo-referenced data – geotagged social media posts whose release is chosen by the user. This includes data gathered from social media apps such as Facebook, Instagram, Twitter and WeChat. The method of obtaining this data involves the collation of large numbers of discrete geotagged updates or photographs. Data can be collated via an application programming interface (API) provided by the app developer to researchers, by automated data scraping via computer programs, perhaps written in Python, or manually by researchers. The second type of data is continuous location-based data from applications that are designed to track movement constantly, such as Strava or MyFitnessPal. Tracking methods using this continuous location-based data are discussed in detail in the following chapter.


2019 ◽  
Vol 29 (Supplement_4) ◽  
Author(s):  
S H Song ◽  
J Y Min ◽  
H J Kim ◽  
K B Min

Background Accurate reports of occupational injuries are important for monitoring workplace safety and health initiatives. In South Korea, media reports, experts, and workers have constantly raised the issue of underreporting, supposedly because employers have strong market “incentives” to underreport their employees’ injuries. A critical way to underreport or cover up injuries is illegal compensation (called “gong-sang” in Korean). Unfortunately, “gong-sang” is not counted in official occupational injury statistics. The aim of this study was to analyze social media data using topic modeling and to explore issues surrounding “gong-sang”.
Methods We used web scraping technology to collect 2,210 social media documents from Web search engines. We used Python to transform the unstructured textual documents into structured data and applied latent Dirichlet allocation (LDA), as implemented in the Python library Gensim, for topic modeling.
Results Based on the LDA method applied to the “gong-sang”-related documents, 10 topics were identified. Topic 1 was the greatest concern (60.5%), with keywords implying the choice between illegal compensation (“gong-sang”) and legal insurance claims. The next concern was Topic 2, which included keywords associated with claims for industrial accident insurance benefits. The remaining topics (Topics 3-10) covered monetary issues, precarious employment, and body parts vulnerable to “gong-sang”.
Conclusions We explored web-based data and identified the salient issues surrounding “gong-sang”. LDA topics may help to ensure an efficient occupational health and safety scheme that protects vulnerable employees from “gong-sang” practices.
Key messages The topics formulated by LDA included queries about legal insurance claims. The topics covered legal insurance claims, including private or social insurance, as well as monetary compensation, injured body parts, and the types of jobs vulnerable to “gong-sang”.
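The preprocessing step described in the Methods, turning unstructured text into structured data before topic modeling, can be sketched with the standard library alone. This is an illustration under stated assumptions: the example sentences are invented English placeholders (the study's documents are Korean and are not reproduced here), the tokenizer and stop-word list are deliberately simplistic, and the output mimics the list of (token id, count) pairs that a Gensim dictionary produces for each document rather than calling Gensim itself.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"i", "or", "from", "my", "the", "of", "an", "a", "should",
              "instead", "is", "to", "for", "was"}


def tokenize(text):
    """Lowercase, keep alphabetic runs, and drop stop words (English only)."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOP_WORDS]


def build_vocabulary(token_lists):
    """Assign each distinct token an integer id in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def to_bow(tokens, vocab):
    """Convert a token list into sorted (token_id, count) pairs."""
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    return sorted(counts.items())


# Invented placeholder documents, not data from the study.
docs = [
    "Should I claim insurance or accept compensation from my employer?",
    "The employer offered compensation instead of an insurance claim.",
    "Temporary workers fear losing the job after a hand injury.",
]
token_lists = [tokenize(d) for d in docs]
vocab = build_vocabulary(token_lists)
corpus = [to_bow(tokens, vocab) for tokens in token_lists]

for doc_id, bow in enumerate(corpus):
    print(doc_id, bow)
```

A bag-of-words corpus of this shape, together with the id-to-token mapping, is the input a topic model such as LDA is fitted on.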


Author(s):  
Sangeeta Namdev Dhamdhere ◽  
Deepak Mane

In today's world, every reader or social media user has different preferences and hobbies when it comes to reading. For example, a social media user who searches for a book without any specific idea of what s/he wants wastes a lot of time browsing the internet and trawling through various sites in the hope of finding a good book. To address this, the authors are building a recommendation system that recommends a book to each reader based on his or her choices, hobbies, or reading history, which will be a massive help for users who would otherwise waste time on various sites. Data from social media is a powerful fuel that can be used to support decision making and to build a recommendation engine. Because social media data arrive in different formats, the biggest challenge for a business is to ingest the data at a reasonable speed and process it further. Social media data are also difficult to detect and capture. A real-time recommendation engine for users is discussed here, including data ingestion methods, challenges, the metadata problem, analysis, and consumption.

