Few Tweets After Flu Peak: Twitter-based Influenza Detection by Referring Indirect Information (Preprint)
Background & Objective: The recent rise in the popularity and scale of social networking services (SNSs) has increased the need for SNS-based information extraction systems. A popular application of SNS data is health surveillance, which aims to predict epidemic outbreaks by detecting mentions of disease in messages posted on SNS platforms. Such applications share the following logic: they treat SNS users as social sensors. These social-sensor-based approaches also share a common problem: SNS-based surveillance is much more reliable when a sufficient number of users are active, and small or inactive populations produce inconsistent results. This paper proposes a novel approach that overcomes this problem by using indirect information within the posts, covering both urban and rural areas. Methods: To estimate the trend in the number of patients for each area and each season, we present the TRAP model, which embeds both direct and indirect information. A collection of tweets spanning three years (seven million influenza-related tweets in Japanese) is used to evaluate the model. Both direct information and indirect information, which mentions other places, were used. Because indirect information is less reliable (too noisy or too old) than direct information, it was not used directly as a signal but was instead treated as inhibiting direct information. For example, frequent indirect information was taken to signify that the disease was already known to everyone, which leads to a small amount of direct information. Results & Conclusions: Evaluated by correlation coefficient, the baseline model (BASELINE+NLP) scored 0.36, whereas the proposed model (TRAP+NLP) scored 0.70, an improvement of 0.34 points.
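As an illustration of the correlation-based evaluation described above, the following is a minimal sketch that computes a correlation coefficient between a tweet-based estimate and reported patient counts. It assumes the Pearson coefficient, which the abstract does not specify; the function name `pearson` and the weekly values are hypothetical and are not data or code from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly values for one area (illustrative only, not from the paper).
reported_patients = [12, 30, 75, 140, 90, 40]
tweet_estimate = [10, 28, 80, 120, 100, 35]

r = pearson(tweet_estimate, reported_patients)
print(f"r = {r:.2f}")
```

A value close to 1 would indicate that the estimate tracks the reported trend closely; under this reading, the reported scores of 0.36 and 0.70 would be coefficients of this kind for the two models.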