Towards Detecting Social Events by Mining Geographical Patterns with VGI Data

2018 ◽  
Vol 7 (12) ◽  
pp. 481
Author(s):  
Zhewei Liu ◽  
Xiaolin Zhou ◽  
Wenzhong Shi ◽  
Anshu Zhang

Detecting events using social media data is important for timely emergency response and urban monitoring. Current studies primarily use semantic-based methods, in which “bursts” of certain semantic signals are detected to identify emerging events. We argue, however, that a social event will not only affect semantic signals but also cause irregular human mobility patterns. By introducing depictive features, such irregular patterns can be used for event detection. Consequently, in this paper, we develop a novel, comprehensive workflow for event detection by mining the geographical patterns of volunteered geographic information (VGI). This workflow first applies geographical topic modeling to the VGI semantic data to detect hashtag communities. Both global and local indicators are then constructed by introducing spatial autocorrelation measurements. We then adopt an outlier test and generate indicator maps to spatiotemporally identify potential social events. This workflow was implemented using a real-world dataset (104,000 geo-tagged photos), and the evaluation was conducted both qualitatively and quantitatively. A set of experiments showed that the discovered semantic communities were internally consistent and externally differentiable, and the plausibility of the detected events was demonstrated by referring to the available ground truth. This study examined the feasibility of detecting events by investigating the geographical patterns of social media data and can be applied to urban knowledge retrieval.
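The abstract mentions global indicators built from spatial autocorrelation measurements without naming the statistic. As a minimal sketch, assuming global Moran's I is an acceptable stand-in (the paper may use a different measure), the function below computes it for a list of per-cell values and a binary neighbour matrix. The names `global_morans_i`, `values`, and `weights` are hypothetical and chosen only for illustration.

```python
def global_morans_i(values, weights):
    """Global Moran's I for per-cell values and a spatial weight matrix.

    Assumption: `weights[i][j]` is 1 when cells i and j are neighbours and 0
    otherwise. This is an illustrative stand-in, not the paper's indicator.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    # Sum of all off-diagonal weights.
    w_sum = sum(weights[i][j] for i in range(n) for j in range(n) if i != j)

    # Weighted cross-products of deviations between neighbouring cells.
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
        if i != j
    )
    variance_term = sum(d * d for d in dev)
    return (n / w_sum) * (cross / variance_term)


# Four cells in a row; each cell neighbours the cells next to it.
line_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = global_morans_i([1, 1, 5, 5], line_weights)    # similar values adjacent
alternating = global_morans_i([1, 5, 1, 5], line_weights)  # dissimilar values adjacent
```

Positive values indicate that similar counts sit next to each other, and negative values indicate a checkerboard-like pattern. An outlier test of the kind the abstract describes would then compare an observed indicator against its usual range.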

2020 ◽  
Vol 15 (1) ◽  
Author(s):  
Donal Bisanzio ◽  
Moritz U.G. Kraemer ◽  
Isaac I. Bogoch ◽  
Thomas Brewer ◽  
John S Brownstein ◽  
...  

As of February 27, 2020, 82,294 confirmed cases of coronavirus disease (COVID-19) have been reported since December 2019, including 2,804 deaths, with cases reported throughout China, as well as in 45 international locations outside of mainland China. We predict the spatiotemporal spread of reported COVID-19 cases at the global level during the first few weeks of the current outbreak by analyzing openly available geolocated Twitter social media data. Human mobility patterns were estimated by analyzing geolocated 2013–2015 Twitter data from users who had: i) tweeted at least twice on consecutive days from Wuhan, China, between November 1, 2013, and January 28, 2014, and November 1, 2014, and January 28, 2015; and ii) left Wuhan following their second tweet during the time period under investigation. Publicly available COVID-19 case data were used to investigate the correlation among cases reported during the current outbreak, locations visited by the study cohort of Twitter users, and airports with scheduled flights from Wuhan. Infectious Disease Vulnerability Index (IDVI) data were obtained to identify the capacity of countries receiving travellers from Wuhan to respond to COVID-19. Our study cohort comprised 161 users. Of these users, 133 (82.6%) posted tweets from 157 Chinese cities (1,344 tweets) during the 30 days after leaving Wuhan following their second tweet, with a median of 2 (IQR = 1–3) locations visited and a mean distance of 601 km (IQR = 295.2–834.7 km) traveled. Of our user cohort, 60 (37.2%) traveled abroad to 119 locations in 28 countries. Of the 82 COVID-19 cases reported outside China as of January 30, 2020, 54 cases had known geolocation coordinates and 74.1% (40 cases) were reported less than 15 km (median = 7.4 km, IQR = 2.9–285.5 km) from a location visited by at least one of our study cohort’s users. Countries visited by the cohort’s users and which have cases reported by January 30, 2020, had a median IDVI equal to 0.74.
We show that social media data can be used to predict the spatiotemporal spread of infectious diseases such as COVID-19. Based on our analyses, we anticipate cases to be reported in Saudi Arabia and Indonesia; additionally, countries with a moderate to low IDVI (i.e. ≤0.7) such as Indonesia, Pakistan, and Turkey should be on high alert and develop COVID-19 response plans as soon as permitting.
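The abstract reports how many cases fell within 15 km of a location visited by the cohort but does not say how distances were computed. As a rough sketch, assuming great-circle (haversine) distance on a spherical Earth, the check could look like the following. The function names and the 6371 km radius are assumptions for illustration, not details taken from the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(case, visited, radius_km=15.0):
    """True if the case location lies closer than radius_km to any visited location."""
    return any(
        haversine_km(case[0], case[1], place[0], place[1]) < radius_km
        for place in visited
    )
```

Calling `within_radius` once per geolocated case and counting the `True` results gives a proportion of the kind quoted in the abstract.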


2020 ◽  
Vol 9 (2) ◽  
pp. 125 ◽  
Author(s):  
Zeinab Ebrahimpour ◽  
Wanggen Wan ◽  
José Luis Velázquez García ◽  
Ofelia Cervantes ◽  
Li Hou

Social media data analytics is the art of extracting valuable hidden insights from vast amounts of semi-structured and unstructured social media data to enable informed and insightful decision-making. Analysis of social media data has been applied to discover patterns that may support urban planning decisions in smart cities. In this paper, Weibo social media data are used to analyze social-geographic human mobility in the CBD area of Shanghai to track citizens’ behavior. Our main motivation is to test the validity of geo-located Weibo data as a source for discovering human mobility and activity patterns. In addition, our goal is to identify important locations in people’s lives with the support of location-based services. The algorithms used are described, and the results are presented using visualization techniques that illustrate the human mobility patterns detected from large-scale social media data in order to support smart city planning decisions. The outcome of this research is helpful not only for city planners, but also for business developers who hope to extend their services to citizens.


2018 ◽  
Vol 64 (2) ◽  
pp. 221-238 ◽  
Author(s):  
Chao Yang ◽  
Meng Xiao ◽  
Xuan Ding ◽  
Wenwen Tian ◽  
Yong Zhai ◽  
...  

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Yasmeen George ◽  
Shanika Karunasekera ◽  
Aaron Harwood ◽  
Kwan Hui Lim

A key challenge in mining social media data streams is to identify events that are actively discussed by a group of people in a specific local or global area. Such events are useful for early warning of accidents, protests, elections, or breaking news. However, neither the list of events nor the resolution of event time and space is fixed or known beforehand. In this work, we propose an online spatio-temporal event detection system using social media that is able to detect events at different time and space resolutions. First, to address the challenge of the unknown spatial resolution of events, a quad-tree method is used to split the geographical space into multiscale regions based on the density of social media data. Then, a statistical unsupervised approach is applied that involves a Poisson distribution and a smoothing method to highlight regions with an unexpected density of social posts. Further, event duration is precisely estimated by merging events happening in the same region at consecutive time intervals. A post-processing stage is introduced to filter out events that are spam, fake, or wrong. Finally, we incorporate simple semantics by using social media entities to assess the integrity and accuracy of detected events. The proposed method is evaluated using different social media datasets, Twitter and Flickr, for different cities: Melbourne, London, Paris, and New York. To verify the effectiveness of the proposed method, we compare our results with two baseline algorithms based on a fixed split of geographical space and a clustering method. For performance evaluation, we manually compute recall and precision. We also propose a new quality measure named strength index, which automatically measures how accurate the reported event is.
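The first two steps of this abstract, density-based quad-tree splitting and a Poisson test for unexpected post density, can be sketched as follows. This is only an illustration under stated assumptions: the splitting rule (`max_points`, `max_depth`), the half-open cell boundaries, and the function names are hypothetical, and the paper's smoothing step is omitted.

```python
import math


def quadtree_split(points, bounds, max_points=4, max_depth=6, depth=0):
    """Split bounds=(xmin, ymin, xmax, ymax) into leaves holding at most max_points posts.

    Cells are half-open, so a point on the outer xmax or ymax edge is not counted.
    Returns a list of (bounds, count) leaves.
    """
    xmin, ymin, xmax, ymax = bounds
    inside = [(x, y) for x, y in points if xmin <= x < xmax and ymin <= y < ymax]
    if len(inside) <= max_points or depth >= max_depth:
        return [(bounds, len(inside))]

    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    children = (
        (xmin, ymin, xmid, ymid),
        (xmid, ymin, xmax, ymid),
        (xmin, ymid, xmid, ymax),
        (xmid, ymid, xmax, ymax),
    )
    leaves = []
    for child in children:
        leaves.extend(quadtree_split(inside, child, max_points, max_depth, depth + 1))
    return leaves


def poisson_tail(count, expected):
    """P(X >= count) for X ~ Poisson(expected); small values flag unusually dense cells."""
    cdf = sum(
        math.exp(-expected) * expected ** k / math.factorial(k)
        for k in range(count)
    )
    return 1.0 - cdf
```

Dense areas end up covered by many small leaves and sparse areas by a few large ones. A leaf whose current count gives a very small `poisson_tail` value relative to its historical mean would be a candidate event region.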


2021 ◽  
Author(s):  
Hansi Hettiarachchi ◽  
Mariam Adedoyin-Olowe ◽  
Jagdev Bhogal ◽  
Mohamed Medhat Gaber

Social media is becoming a primary medium for discussing what is happening around the world. Therefore, the data generated by social media platforms contain rich information that describes ongoing events. Further, the timeliness of these data can facilitate immediate insights. However, given the dynamic nature and high volume of data production in social media data streams, it is impractical to filter events manually, and automated event detection mechanisms are therefore invaluable to the community. Apart from a few notable exceptions, most previous research on automated event detection has focused only on statistical and syntactical features in data and has not involved the underlying semantics, which are important for effective information retrieval from text because they represent the connections between words and their meanings. In this paper, we propose a novel method termed Embed2Detect for event detection in social media that combines the characteristics of word embeddings and hierarchical agglomerative clustering. The adoption of word embeddings gives Embed2Detect the capability to incorporate powerful semantic features into event detection and overcome a major limitation inherent in previous approaches. We evaluated our method on two recent real social media data sets representing the sports and political domains and compared the results to several state-of-the-art methods. The results show that Embed2Detect is capable of effective and efficient event detection and that it outperforms recent event detection methods. For the sports data set, Embed2Detect achieved a 27% higher F-measure than the best-performing baseline, and for the political data set, the increase was 29%.
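To make the combination of word embeddings and hierarchical agglomerative clustering concrete, here is a small self-contained sketch. It is not the Embed2Detect algorithm itself: the two-dimensional vectors are made-up toy values, and average linkage over cosine distance with a fixed stopping `threshold` is an assumption chosen for illustration.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def agglomerative_clusters(vectors, threshold):
    """Average-linkage agglomerative clustering of words by embedding distance.

    `vectors` maps word -> embedding. Merging stops once the closest pair of
    clusters is farther apart than `threshold`.
    """
    clusters = [[word] for word in vectors]

    def linkage(c1, c2):
        total = sum(cosine_distance(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        dist, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if dist > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy embeddings (hypothetical values, not learned from any data set).
toy_vectors = {
    "goal": (1.0, 0.1),
    "penalty": (0.9, 0.2),
    "vote": (0.1, 1.0),
    "ballot": (0.2, 0.9),
}
word_groups = agglomerative_clusters(toy_vectors, threshold=0.3)
```

With these toy vectors the sports-like words and the politics-like words end up in separate groups. In a streaming setting, one could compare such groupings across consecutive time windows to spot changes.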


2021 ◽  
Author(s):  
Shishuo Xu

Small-scale events involve interactive human movement in limited space and time. Social media platforms can generate large amounts of geospatially referenced information related to small-scale events. Individuals, management departments, and urban systems benefit if small-scale events can be detected in a timely manner from social media platforms; the two keys are measuring abnormal patterns of human movement to discover events and analyzing the associated texts to interpret the reasons behind the abnormal movement. By investigating how people move as different events occur and measuring the patterns on social media platforms, small-scale events can be broadly classified into two types: type I events with abrupt patterns and type II events with random occurrence of key factors, of which social events and traffic events are respectively representative.
Although many studies have been conducted to detect social events and traffic events using geosocial media data, some unanswered questions still require further research. Most existing studies did not identify occurring events from a full coverage of spatial, temporal, and semantic perspectives. Studies of social event detection lack efficient semantic analysis that summarizes event content to infer the reasons driving the abnormal movement. The typical classification-based method for traffic event detection does not investigate how the spatiotemporal distribution of traffic-relevant posts is associated with the occurring traffic events, and simply assigns the detected events to predefined categories, missing events that indicate traffic anomalies but fall outside the predetermined categories.
In this thesis, spatial-temporal-semantic approaches are proposed to measure the spatiotemporal patterns of posts and users of social media platforms to capture abnormal human movement, and to analyze the content of associated posts to mine the reasons driving the movement. A variety of techniques including machine learning, natural language processing, and spatiotemporal analysis are adopted to realize effective detection. Based on one year of Twitter data collected in Toronto, the 2014 Toronto International Film Festival and traffic anomaly detection are selected as two case studies to evaluate the performance of the proposed approaches. Comparison with ground truth data reveals that more than 80% of the detected events do refer to real-world events, which illustrates the feasibility and efficiency of the proposed approaches.
Keywords: Small-scale event, Event detection, Geosocial media data, Traffic event, Social event, Twitter, Spatiotemporal clustering


10.2196/14986 ◽  
2020 ◽  
Vol 6 (2) ◽  
pp. e14986 ◽  
Author(s):  
Ashlynn R Daughton ◽  
Rumi Chunara ◽  
Michael J Paul

Background: Internet data can be used to improve infectious disease models. However, the representativeness and individual-level validity of internet-derived measures are largely unexplored as this requires ground truth data for study.
Objective: This study sought to identify relationships between Web-based behaviors and/or conversation topics and health status using a ground truth, survey-based dataset.
Methods: This study leveraged a unique dataset of self-reported surveys, microbiological laboratory tests, and social media data from the same individuals toward understanding the validity of individual-level constructs pertaining to influenza-like illness in social media data. Logistic regression models were used to identify illness in Twitter posts using user posting behaviors and topic model features extracted from users’ tweets.
Results: Of 396 original study participants, only 81 met the inclusion criteria for this study. Of these participants’ tweets, we identified only two instances that were related to health and occurred within 2 weeks (before or after) of a survey indicating symptoms. It was not possible to predict when participants reported symptoms using features derived from topic models (area under the curve [AUC]=0.51; P=.38), though it was possible using behavior features, albeit with a very small effect size (AUC=0.53; P≤.001). Individual symptoms were also generally not predictable. The study sample and a random sample from Twitter are predictably different on held-out data (AUC=0.67; P≤.001), meaning that the content posted by people who participated in this study was predictably different from that posted by random Twitter users. Individuals in the random sample and the GoViral sample used Twitter with similar frequencies (similar @ mentions, number of tweets, and number of retweets; AUC=0.50; P=.19).
Conclusions: To our knowledge, this is the first instance of an attempt to use a ground truth dataset to validate infectious disease observations in social media data. The lack of signal, the lack of predictability among behaviors or topics, and the demonstrated volunteer bias in the study population are important findings for the large and growing body of disease surveillance using internet-sourced data.

