Quantifying the relationship between diseases and symptoms using big data (Preprint)
BACKGROUND Crises caused by endemic, transmissible diseases affect people worldwide, and the symptoms these diseases cause may provide firsthand information about them.
OBJECTIVE We suggest that massive new data sources resulting from human interaction with the Internet may offer a unique perspective on the relationship between illnesses and symptoms.
METHODS By analyzing changes in Google query volumes for disease-related search terms, we looked for a pattern that may define the relationship between symptoms and diseases. We first retrieved trend data from Google Trends, using the common cold as the primary disease and sore throat, stuffy nose, sneeze, fever, cough, and headache as symptoms. Pearson's correlation coefficient was then calculated in SPSS to determine the relationship between each symptom and the disease.
RESULTS Weekly data from January 13, 2013 onward were retrieved from Google Trends. Across a total of 261 sets of data, the correlation coefficient between the common cold and the stuffy nose symptom was high, at 0.925. The cough symptom had the second highest correlation coefficient, at 0.925; sore throat had a correlation coefficient of 0.853, and fever had a correlation coefficient of 0.626, which was significant at the 0.01 level in a two-tailed test.
CONCLUSIONS Data on the relationship between diseases and symptoms usually come from government agencies, hospitals, and clinics, where they are collected through the documentation of physicians and nurses. A conventional study can therefore be limited by its region, its number of patients, and the interpretation of the specialist. With access to Google Trends' big data, however, millions or even billions of data points are accumulated directly from patients. Another contribution of this study is that the quantified relationship between symptoms and diseases can be used to educate future physicians or even artificial intelligence.
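The correlation step described in the methods can be illustrated with a minimal Python sketch. This is not the study's workflow, which used SPSS; the function name `pearson_r` and every number below are hypothetical placeholders standing in for weekly Google Trends interest values (which are reported on a relative 0 to 100 scale), not the study's data.

```python
import math


def pearson_r(x, y):
    """Return Pearson's correlation coefficient for two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly search-interest values (illustration only).
common_cold = [40, 55, 70, 90, 100, 80, 60, 45]
symptoms = {
    "stuffy nose": [38, 50, 72, 88, 97, 79, 58, 47],
    "fever": [60, 52, 65, 80, 70, 75, 50, 66],
}

# Correlate each symptom series with the disease series.
correlations = {
    name: pearson_r(common_cold, series) for name, series in symptoms.items()
}

# Report symptoms from most to least correlated with the disease.
for name, r in sorted(correlations.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: r = {r:.3f}")
```

In practice each list would hold one value per week for the same date range, so that the disease and symptom series are aligned week by week before the coefficient is computed.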