Prediction of norovirus using Internet search data in the US (Preprint)

Mapping Intimacies ◽

10.2196/preprints.24554 ◽

2020 ◽

Author(s):

Kai Yuan ◽

Guangrui Huang ◽

Haixu Jiang ◽

Wenbin Liu ◽

Ting Wang ◽

...

Keyword(s):

New York ◽

Disease Outbreaks ◽

Temporal Correlation ◽

Google Trends ◽

Watery Diarrhea ◽

Norwalk Virus ◽

Internet Search ◽

Search Terms ◽

Cross Correlation Analysis ◽

Google Search

BACKGROUND Norovirus infection is a contagious disease that causes vomiting and diarrhea. Norovirus spreads quickly and easily through multiple routes. Because no effective methods to prevent or treat norovirus infection have been established, it is important to recognize and report outbreaks rapidly in their early phase. Internet search gives people immediate access to information, and because search trends are recorded precisely, search data have become a useful tool for detecting infectious disease outbreaks. OBJECTIVE In this study, we sought to identify correlations between Internet search terms and norovirus infection. METHODS Internet search trend data for norovirus were obtained from Google Trends. We used cross-correlation analysis to assess the temporal correlation between searches for norovirus and searches for other terms. We also used stepwise multiple linear regression to identify the most important search-term predictors of norovirus search trends. In addition, we evaluated the temporal correlation between actual norovirus cases and Internet search terms in New York, California, and the USA as a whole. RESULTS Some Google search terms, such as gastroenteritis, vomiting, and watery diarrhea, coincided with norovirus Google Trends. Others, such as contagious, Norwalk virus, and travel, appeared earlier than norovirus Google Trends, and still others, such as dehydration, bar, and restaurant, appeared several months later. The symptoms of gastroenteritis, including vomiting and watery diarrhea, were important factors that correlated significantly with norovirus Google Trends. Relative to actual norovirus cases in New York, California, and the USA, some Google search terms coincided with, preceded, or followed the case counts. CONCLUSIONS Our study provides novel evidence, based on Internet search strategies, regarding the epidemiology of norovirus.
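The cross-correlation analysis mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names (`pearson`, `lagged_correlation`, `best_lag`) and the toy series are hypothetical, and the sketch assumes two equally spaced series of the same length, for example monthly search volumes and monthly case counts. A positive lag means the search series leads the case series.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def lagged_correlation(search, cases, lag):
    """Correlate search[t] with cases[t + lag]; positive lag = search leads cases."""
    if lag >= 0:
        x, y = search[:len(search) - lag], cases[lag:]
    else:
        x, y = search[-lag:], cases[:len(cases) + lag]
    return pearson(x, y)


def best_lag(search, cases, max_lag):
    """Return the lag in [-max_lag, max_lag] with the highest correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda k: lagged_correlation(search, cases, k))


# Toy data (made up): the case series repeats the search series two steps later.
toy_search = [0, 1, 4, 2, 7, 3, 9, 5, 8, 6, 2, 1]
toy_cases = [0, 0] + toy_search[:-2]
toy_best = best_lag(toy_search, toy_cases, max_lag=3)
```

In this framing, a term whose best lag is positive "appears earlier" than the reference series, a best lag of zero means the two coincide, and a negative best lag means the term appears later.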


The correlation between Google trends and salmonellosis

BMC Public Health ◽

10.1186/s12889-021-11615-w ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Ming-Yang Wang ◽

Nai-jun Tang

Keyword(s):

Infectious Disease ◽

Cross Correlation ◽

Hypovolemic Shock ◽

Google Trends ◽

Search Terms ◽

Cross Correlation Analysis ◽

Common Infectious Disease ◽

The USA ◽

Google Search ◽

The Relationship

Abstract Background Salmonella infection (salmonellosis) is a common infectious disease leading to gastroenteritis, dehydration, uveitis, etc. Internet search is a new method for monitoring outbreaks of infectious disease. A surveillance system based on internet data is logistically advantageous and economical for tracking diseases through related search terms. In this study, we tried to determine the relationship between salmonellosis and Google Trends in the USA from January 2004 to December 2017. Methods We downloaded reported salmonellosis cases in the USA from the National Outbreak Reporting System (NORS) from January 2004 to December 2017. Additionally, we downloaded the Google search terms related to salmonellosis from Google Trends for the same period. Cross-correlation analysis and multiple regression analysis were conducted. Results The results showed that 6 Google Trends search terms appeared earlier than reported salmonellosis, 26 Google Trends search terms coincided with salmonellosis, and 16 Google Trends search terms appeared after salmonellosis was reported. When the search terms preceded outbreaks, “foods” (t = 2.927, P = 0.004) was a predictor of salmonellosis. When the search terms coincided with outbreaks, “hotel” (t = 1.854, P = 0.066), “poor sanitation” (t = 2.895, P = 0.004), “blueberries” (t = 2.441, P = 0.016), and “hypovolemic shock” (t = 2.001, P = 0.047) were predictors of salmonellosis. When the search terms appeared after outbreaks, “ice cream” (t = 3.077, P = 0.002) was the predictor of salmonellosis. Finally, we identified the most important indicators among Google Trends search terms, including “hotel” (t = 1.854, P = 0.066), “poor sanitation” (t = 2.895, P = 0.004), “blueberries” (t = 2.441, P = 0.016), and “hypovolemic shock” (t = 2.001, P = 0.047). In the future, increased search activity for these terms might indicate salmonellosis outbreaks.
Conclusion We evaluated the related Google Trends search terms with salmonellosis and identified the most important predictors of salmonellosis outbreak.


Infodemiology of systemic lupus erythematous using Google Trends

Lupus ◽

10.1177/0961203317691372 ◽

2017 ◽

Vol 26 (8) ◽

pp. 886-889 ◽

Cited By ~ 18

Author(s):

M Radin ◽

S Sciascia

Keyword(s):

Big Data ◽

Northern Hemisphere ◽

Lupus Erythematosus ◽

Kendall Test ◽

Google Trends ◽

Data Monitoring ◽

Search Term ◽

Systemic Lupus ◽

Search Terms ◽

Google Search

Objective People affected by chronic rheumatic conditions, such as systemic lupus erythematosus (SLE), frequently rely on the Internet and search engines to look for terms related to their disease and its possible causes, symptoms and treatments. ‘Infodemiology’ and ‘infoveillance’ are two recent terms created to describe a new developing approach for public health, based on Big Data monitoring and data mining. In this study, we aim to investigate trends of Internet research linked to SLE and symptoms associated with the disease, applying a Big Data monitoring approach. Methods We analysed the large amount of data generated by Google Trends, considering ‘lupus’, ‘relapse’ and ‘fatigue’ in a 10-year web-based research. Google Trends automatically normalized data for the overall number of searches, and presented them as relative search volumes, in order to compare variations of different search terms across regions and periods. The Mann–Kendall test was used to evaluate the overall seasonal trend of each search term and possible correlation between search terms. Results We observed a seasonality for Google search volumes for lupus-related terms. In the Northern hemisphere, relative search volumes for ‘lupus’ were correlated with ‘relapse’ (τ = 0.85; p = 0.019) and with ‘fatigue’ (τ = 0.82; p = 0.003), whereas in the Southern hemisphere we observed a significant correlation between ‘fatigue’ and ‘relapse’ (τ = 0.85; p = 0.018). Similarly, a significant correlation between ‘fatigue’ and ‘relapse’ (τ = 0.70; p < 0.001) was seen also in the Northern hemisphere. Conclusion Despite the intrinsic limitations of this approach, Internet-acquired data might represent a real-time surveillance tool and an alert for healthcare systems in order to plan the most appropriate resources in specific moments with higher disease burden.
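As a rough illustration of the Mann–Kendall trend test named in the Methods, the sketch below computes the S statistic, a normal-approximation z score, and a two-sided p value. It is an assumption-laden sketch, not the authors' procedure: the function name `mann_kendall` is hypothetical, the variance formula omits the correction for tied values, and the example series is made up.

```python
import math


def mann_kendall(series):
    """Return (S, z, two-sided p) for a monotonic-trend test on one series.

    Assumption: no correction for ties is applied to the variance of S.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return s, z, p


# Made-up, strictly increasing relative search volumes.
rising = [10, 12, 15, 18, 21, 25, 28, 33, 37, 40]
s_up, z_up, p_up = mann_kendall(rising)
```

A positive S with a small p value indicates an increasing trend; a negative S indicates a decreasing one. Real relative search volumes often contain ties, so a tie-corrected variance would be the safer choice in practice.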


Nowcasting Sexually Transmitted Infections in Chicago: Predictive Modeling and Evaluation Study Using Google Trends

JMIR Public Health and Surveillance ◽

10.2196/20588 ◽

2020 ◽

Vol 6 (4) ◽

pp. e20588

Author(s):

Amy Kristen Johnson ◽

Runa Bhaumik ◽

Irina Tabidze ◽

Supriya D Mehta

Keyword(s):

Public Health ◽

Correlation Analysis ◽

Prediction Accuracy ◽

Cross Correlation ◽

Secondary Syphilis ◽

Google Trends ◽

Sexually Transmitted ◽

Search Terms ◽

Cross Correlation Analysis ◽

Case Data

Background Sexually transmitted infections (STIs) pose a significant public health challenge in the United States. Traditional surveillance systems are adversely affected by data quality issues, underreporting of cases, and reporting delays, resulting in missed prevention opportunities to respond to trends in disease prevalence. Search engine data can potentially facilitate an efficient and economical enhancement to surveillance reporting systems established for STIs. Objective We aimed to develop and train a predictive model using reported STI case data from Chicago, Illinois, and to investigate the model’s predictive capacity, timeliness, and ability to target interventions to subpopulations using Google Trends data. Methods Deidentified STI case data for chlamydia, gonorrhea, and primary and secondary syphilis from 2011-2017 were obtained from the Chicago Department of Public Health. The data set included race/ethnicity, age, and birth sex. Google Correlate was used to identify the top 100 correlated search terms with “STD symptoms,” and an autocrawler was established using Google Health Application Programming Interface to collect the search volume for each term. Elastic net regression was used to evaluate prediction accuracy, and cross-correlation analysis was used to identify timeliness of prediction. Subgroup elastic net regression analysis was performed for race, sex, and age. Results For gonorrhea and chlamydia, actual and predicted STI values correlated moderately in 2011 (chlamydia: r=0.65; gonorrhea: r=0.72) but correlated highly (chlamydia: r=0.90; gonorrhea: r=0.94) from 2012 to 2017. However, for primary and secondary syphilis, the high correlation was observed only for 2012 (r=0.79), 2013 (r=0.77), 2016 (r=0.80), and 2017 (r=0.84), with 2011, 2014, and 2015 showing moderate correlations (r=0.55-0.70). Model performance was the most accurate (highest correlation and lowest mean absolute error) for gonorrhea. Subgroup analyses improved model fit across disease and year. Regression models using search terms selected from the cross-correlation analysis improved the prediction accuracy and timeliness across diseases and years. Conclusions Integrating nowcasting with Google Trends in surveillance activities can potentially enhance the prediction and timeliness of outbreak detection and response as well as target interventions to subpopulations. Future studies should prospectively examine the utility of Google Trends applied to STI surveillance and response.


Estimating the Outpatient Burden of Venous Thromboembolism in the United States: An Analysis of Google Trends Data from 2004 to 2015

Blood ◽

10.1182/blood.v126.23.4453.4453 ◽

2015 ◽

Vol 126 (23) ◽

pp. 4453-4453 ◽

Cited By ~ 1

Author(s):

Adeel M Khan ◽

Alok A. Khorana

Keyword(s):

United States ◽

Venous Thromboembolism ◽

Varicose Veins ◽

Blood Clot ◽

Seasonal Pattern ◽

The United States ◽

Google Trends ◽

Internet Search ◽

Advisory Committees ◽

Search Terms

Abstract Background: Analysis of internet search traffic has provided a new proxy measure of the trends and patterns of patients' diseases and their information-seeking behaviors. In recent years, Google Trends has become a data resource of interest given its status as the largest internet search provider in the world with publicly-viewable, passively-collected, and expansive data on any searchable term or combination of terms. For instance, search terms related to influenza (e.g. fever) predicted influenza spread faster than standard surveillance as shown by Ginsberg et al in Nature 2009. The true outpatient burden and incidence of venous thromboembolism (VTE) has long been debated. Extant VTE data are limited to cases that present to medical attention, thus missing any cases that do not come to an emergency department or clinic. We hypothesized that Google Trends analysis offers potential insight into the general population's blood clot burden and awareness. This study aimed to explore general trends of VTE-related terms and seasonal variation in searches. Methods: Google Trends was utilized to obtain relative search engine traffic values (defined as search volume indices, SVIs) for terms related to DVT in the United States from summer 2004 - winter 2015. Terms related to LEG SWELLING, CALF PAIN, VARICOSE VEINS, and LEG CLOT were used, with Boolean operators combining similar terms. A separate search was run for BLOOD CLOT and related terms to investigate awareness of VTEs. Analysis was performed in R (V3.1.1) in accordance with previously published Google Trends investigations. Results: The average relative volume of searches was highest for VARICOSE VEINS and lowest for LEG SWELLING by approximately 3.2 fold. A seasonal pattern was seen with summer months (May-Aug) having higher SVIs than winter months (Nov-Feb) for all terms in the 11 year study period except for BLOOD CLOT. Using a Wilcoxon signed rank test, mean SVI difference comparing summer to winter for LEG SWELLING showed W = 66 (p = 0.004), for CALF PAIN W = 66 (p = 0.003), for VARICOSE VEINS W = 67 (p = 0.004), and for LEG CLOT W = 65 (p = 0.005). For BLOOD CLOT, a gradual increase in SVIs was seen and characterized by a Mann-Kendall test as having a significant positive trend, S = 898, p = 0.024. Conclusions: Search terms related to VTE in the United States show a strong seasonal pattern with greater search activity in summer months compared to winter months. These data suggest a higher incidence and burden of VTE in the summer. This challenges previous notions of a weakly higher incidence of VTE in winter months, calculated as a relative risk of 1.143 by Dentali et al in 2011. The gradual increase in relative search traffic for BLOOD CLOT terms reflects a likely rising awareness and/or true rise in the incidence of VTEs in the United States from 2004-2015. Further studies should investigate whether internet search traffic correlates directly with total yearly DVT incidence rates. Keywords: Population, venous thromboembolism, incidence. Disclosures Khorana: Leo Pharma: Consultancy, Honoraria, Membership on an entity's Board of Directors or advisory committees; Janssen: Consultancy, Honoraria, Membership on an entity's Board of Directors or advisory committees; Daiichi Sankyo: Consultancy, Honoraria; sanofi: Consultancy, Honoraria; Pfizer: Consultancy, Honoraria; Boehringer-Ingelheim: Consultancy, Honoraria.


How popular is Islamic finance in the USA? Findings from Google Trends

International Journal of Finance & Banking Studies (2147-4486) ◽

10.20525/ijfbs.v8i3.490 ◽

2019 ◽

Vol 8 (3) ◽

pp. 58-65

Author(s):

Wesal M. Aldarabseh

Keyword(s):

New York ◽

New Jersey ◽

Longitudinal Survey ◽

Global Distribution ◽

Google Trends ◽

Islamic Finance ◽

US States ◽

Islamic Bank ◽

Search Terms ◽

The USA

Islamic finance is a growing industry with global distribution across all continents, including Europe and America. The aim of the current study was to examine how popular Islamic finance was in the USA during the period 2014-2019 using Google Trends. In addition, the interest in Islamic finance across different US states was also investigated. Using “Islamic finance” and “Islamic bank” as search terms in Google Trends, the trend curve showed decreases in search volumes, suggesting a decline in the popularity of Islamic finance in the USA over the years. Search volumes were detected in seven out of 50 states, suggesting low interest in Islamic finance in the majority of US states. The order of the popularity in the seven states was: Virginia > New York > New Jersey > Illinois > Texas > California > Pennsylvania > Georgia > Florida > Massachusetts. Longitudinal survey studies are needed to confirm the present findings.


Modeling Zika Virus Spread in Colombia Using Google Search Queries and Logistic Power Models

10.1101/365155 ◽

2018 ◽

Author(s):

Mekenna Brown ◽

Christopher Cain ◽

James Whitfield ◽

Edwin Ding ◽

Sara Y Del Valle ◽

...

Keyword(s):

Disease Outbreaks ◽

Google Trends ◽

Logistic Growth ◽

Data Sets ◽

Surveillance Systems ◽

Public Health Agencies ◽

Power Models ◽

Clinical Surveillance ◽

Search Data ◽

Google Search

Abstract Public health agencies generally have a small window to respond to burgeoning disease outbreaks in order to mitigate the potential impact. There has been significant interest in developing forecasting models that can predict how and where a disease will spread. However, since clinical surveillance systems typically publish data with a lag of two or more weeks, there is a need for complementary data streams that can close this gap. We examined the usefulness of Google Trends search data for analyzing the 2016 Zika epidemic in Colombia and evaluated their ability to predict its spread. We calculated the correlation and the time delay between the reported case data and the Google Trends data using variations of the logistic growth model, and showed that the data sets were systematically offset from each other, implying a lead time in the Google Trends data. Our study showed how Internet data can potentially complement clinical surveillance data and may be used as an effective early detection tool for disease outbreaks.


Gauging public awareness and interest related to pancreatic cancer: An analysis of Google search volumes.

Journal of Clinical Oncology ◽

10.1200/jco.2018.36.4_suppl.250 ◽

2018 ◽

Vol 36 (4_suppl) ◽

pp. 250-250

Author(s):

Dhruvika Mukhija ◽

Alok A. Khorana ◽

Davendra Sohal

Keyword(s):

Pancreatic Cancer ◽

Colon Cancer ◽

Public Awareness ◽

P Value ◽

Internet Search ◽

Awareness Campaigns ◽

Search Terms ◽

Search Volume ◽

News Events ◽

Google Search

Background: Over the last 2 decades, the internet has become a major source of medical information. Infoveillance, i.e., public health surveillance using online content analysis, has become a powerful tool, and internet search activity has been used as a surrogate to gauge public awareness and interest for particular diseases. We aimed to evaluate the search volume for pancreatic cancer (PC), using colon cancer (CC) as a comparator, with data from a popular search engine. Methods: Using Google Trends, a public web facility of Google Inc., based on Google Search, we compared the relative frequency of search terms ‘pancreatic cancer’ and ‘colon cancer’ between 1st January 2004 and 31st August 2017 (n = 164 months). The program assigns a reference value of 100 for the point of maximum popularity from among all the search terms during the search period and provides comparative monthly scores, which we termed relative interest scores (RIS). The RIS for each cancer was then adjusted for incidence (i.e., 53,070 for PC and 95,270 for CC, based on 2016 data), calculated per 10,000 patients and termed ‘i-RIS’. A p-value of < 0.05 was considered significant. Results: For the entire duration, the maximum popularity (RIS = 100) corresponded to a point in March 2008 for PC, likely related to the diagnosis of a famous celebrity during that month. Similar but smaller surges in RIS were observed for other significant news events related to PC during other months (January 2009, October 2009 and October 2011). Overall, the mean (±SD) RIS for PC and CC were 32.52±8.98 and 50.18±6.44, respectively (p < 0.001). However, the i-RIS was somewhat higher for PC (6.12±1.69) as compared with CC (5.26±0.67) (p < 0.001). Conclusions: Internet search data can provide estimates of public awareness and interest related to cancer. For PC, incidence-adjusted search volumes show spikes in search volumes related to major news events, providing internal validation of these results. Generating news items and promotion by celebrities may play a significant role in the success of cancer awareness campaigns.
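The incidence adjustment described in the Methods can be checked with a few lines of arithmetic. The formula below (RIS divided by incidence, times 10,000) is an inference from the wording "adjusted for incidence ... calculated per 10,000 patients", not a formula quoted from the abstract, and the function name `incidence_adjusted_ris` is hypothetical. Applied to the reported mean RIS values it gives roughly 6.13 for PC and 5.27 for CC, close to the reported i-RIS of 6.12 and 5.26.

```python
def incidence_adjusted_ris(ris, incidence, per=10_000):
    """Scale a relative interest score to a rate per `per` incident cases.

    Assumed formula: i-RIS = RIS / incidence * per.
    """
    return ris / incidence * per


# Mean RIS and 2016 incidence figures as reported in the abstract.
PC_MEAN_RIS, PC_INCIDENCE = 32.52, 53_070
CC_MEAN_RIS, CC_INCIDENCE = 50.18, 95_270

pc_i_ris = incidence_adjusted_ris(PC_MEAN_RIS, PC_INCIDENCE)
cc_i_ris = incidence_adjusted_ris(CC_MEAN_RIS, CC_INCIDENCE)
print(f"PC i-RIS: {pc_i_ris:.2f}, CC i-RIS: {cc_i_ris:.2f}")
```

The same scaling applied to the reported standard deviations (8.98 and 6.44) gives about 1.69 and 0.68, which is also in line with the abstract, so the assumed formula appears consistent with the published numbers.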


Nowcasting Sexually Transmitted Infections in Chicago: Predictive Modeling and Evaluation Study Using Google Trends (Preprint)

10.2196/preprints.20588 ◽

2020 ◽

Author(s):

Amy Kristen Johnson ◽

Runa Bhaumik ◽

Irina Tabidze ◽

Supriya D Mehta

Keyword(s):

Public Health ◽

Correlation Analysis ◽

Prediction Accuracy ◽

Cross Correlation ◽

Secondary Syphilis ◽

Google Trends ◽

Sexually Transmitted ◽

Search Terms ◽

Cross Correlation Analysis ◽

Case Data

BACKGROUND Sexually transmitted infections (STIs) pose a significant public health challenge in the United States. Traditional surveillance systems are adversely affected by data quality issues, underreporting of cases, and reporting delays, resulting in missed prevention opportunities to respond to trends in disease prevalence. Search engine data can potentially facilitate an efficient and economical enhancement to surveillance reporting systems established for STIs. OBJECTIVE We aimed to develop and train a predictive model using reported STI case data from Chicago, Illinois, and to investigate the model’s predictive capacity, timeliness, and ability to target interventions to subpopulations using Google Trends data. METHODS Deidentified STI case data for chlamydia, gonorrhea, and primary and secondary syphilis from 2011-2017 were obtained from the Chicago Department of Public Health. The data set included race/ethnicity, age, and birth sex. Google Correlate was used to identify the top 100 correlated search terms with “STD symptoms,” and an autocrawler was established using Google Health Application Programming Interface to collect the search volume for each term. Elastic net regression was used to evaluate prediction accuracy, and cross-correlation analysis was used to identify timeliness of prediction. Subgroup elastic net regression analysis was performed for race, sex, and age. RESULTS For gonorrhea and chlamydia, actual and predicted STI values correlated moderately in 2011 (chlamydia: r=0.65; gonorrhea: r=0.72) but correlated highly (chlamydia: r=0.90; gonorrhea: r=0.94) from 2012 to 2017. However, for primary and secondary syphilis, the high correlation was observed only for 2012 (r=0.79), 2013 (r=0.77), 2016 (r=0.80), and 2017 (r=0.84), with 2011, 2014, and 2015 showing moderate correlations (r=0.55-0.70). Model performance was the most accurate (highest correlation and lowest mean absolute error) for gonorrhea. Subgroup analyses improved model fit across disease and year. Regression models using search terms selected from the cross-correlation analysis improved the prediction accuracy and timeliness across diseases and years. CONCLUSIONS Integrating nowcasting with Google Trends in surveillance activities can potentially enhance the prediction and timeliness of outbreak detection and response as well as target interventions to subpopulations. Future studies should prospectively examine the utility of Google Trends applied to STI surveillance and response.


The rise and fall of rationality in language

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.2107848118 ◽

2021 ◽

Vol 118 (51) ◽

pp. e2107848118

Author(s):

Marten Scheffer ◽

Ingrid van de Leemput ◽

Els Weinans ◽

Johan Bollen

Keyword(s):

New York ◽

Public Interest ◽

New York Times ◽

Human Experience ◽

Historical Period ◽

The Past ◽

Search Terms ◽

Political Argumentation ◽

The Individual ◽

Google Search

The surge of post-truth political argumentation suggests that we are living in a special historical period when it comes to the balance between emotion and reasoning. To explore if this is indeed the case, we analyze language in millions of books covering the period from 1850 to 2019 represented in Google nGram data. We show that the use of words associated with rationality, such as “determine” and “conclusion,” rose systematically after 1850, while words related to human experience such as “feel” and “believe” declined. This pattern reversed over the past decades, paralleled by a shift from a collectivistic to an individualistic focus as reflected, among other things, by the ratio of singular to plural pronouns such as “I”/“we” and “he”/“they.” Interpreting this synchronous sea change in book language remains challenging. However, as we show, the nature of this reversal occurs in fiction as well as nonfiction. Moreover, the pattern of change in the ratio between sentiment and rationality flag words since 1850 also occurs in New York Times articles, suggesting that it is not an artifact of the book corpora we analyzed. Finally, we show that word trends in books parallel trends in corresponding Google search terms, supporting the idea that changes in book language do in part reflect changes in interest. All in all, our results suggest that over the past decades, there has been a marked shift in public interest from the collective to the individual, and from rationality toward emotion.


Trends of popularity of cardiac biomarkers: Insights from Google Trends

Emergency Care Journal ◽

10.4081/ecj.2018.7769 ◽

2018 ◽

Vol 14 (3) ◽

Author(s):

Giuseppe Lippi ◽

Camilla Mattiuzzi ◽

Gianfranco Cervellin

Keyword(s):

Graphical Analysis ◽

Google Trends ◽

Cardiac Biomarkers ◽

Pronounced Increase ◽

Cardiac Biomarker ◽

The Past ◽

Search Terms ◽

Creatine Kinase MB ◽

The One ◽

Google Search

This study was aimed at assessing the trend of worldwide popularity, thus likely reflecting usage, of conventional cardiac biomarkers, including cardiac troponins, myoglobin and creatine kinase MB (CK-MB). Google Trends was interrogated using a combination of the three search terms “troponin” AND “myoglobin” AND “CK-MB”, with a time limit set between January 1, 2004 (i.e., the oldest searchable year) and present time (i.e., August 13, 2018). The raw data were entered into an Excel worksheet and reported as cumulative Google Trends scores per week for each cardiac biomarker. The popularity scores of myoglobin and CK-MB have displayed a significantly decreasing trend since 2004, whilst that of troponin has exhibited an apparently paradoxical U-shaped behavior, with a more pronounced increase during the past 10 years. The correlation between time and cumulative Google searches was significant for all biomarkers, being r = 0.40 (P<0.001) for troponin, r = -0.45 (P<0.001) for myoglobin and r = -0.79 (P<0.001) for CK-MB. The score of overall Google searches for troponin was approximately 2.5-fold and 8.5-fold higher than for myoglobin and CK-MB, respectively. When the analysis was limited to the past ten years, the correlation between time and cumulative Google searches became even stronger for troponin (r = 0.85; P<0.001), remained virtually identical for CK-MB (r = -0.80; P<0.001), whilst it was no longer significant for myoglobin (r = -0.13; P=0.150). The graphical analysis of Google search frequency also showed that CK-MB appears to be popular in Mexico, Brazil, Chile, Japan, Poland and Romania, myoglobin seems popular in Uruguay, Bolivia, Denmark, Kazakhstan and in some African nations, whilst troponin is predominant in the remaining parts of the world. The results of this study suggest that, although all available guidelines share the principle that cardiac troponin should be considered the one and only reference biomarker for diagnosing myocardial ischemia, CK-MB and especially myoglobin are still popular worldwide, especially in certain geographic areas.
