Examining the Public’s Most Frequently Asked Questions Regarding COVID-19 Vaccines Using Search Engine Analytics in the United States: Observational Study

Nicholas B Sajjadi; Samuel Shepard; Ryan Ottwell; Kelly Murray; Justin Chronister; Micah Hartwell; Matt Vassar

doi:10.2196/28740

JMIR Infodemiology ◽

10.2196/28740 ◽

2021 ◽

Vol 1 (1) ◽

pp. e28740

Author(s):

Nicholas B Sajjadi ◽

Samuel Shepard ◽

Ryan Ottwell ◽

Kelly Murray ◽

Justin Chronister ◽

...

Keyword(s):

Machine Learning ◽

Analysis Of Variance ◽

The United States ◽

Machine Learning Algorithms ◽

Medical Decision ◽

Common Source ◽

Safety And Efficacy ◽

Information Transparency ◽

Source Type ◽

Significant Difference

Background The emergency authorization of COVID-19 vaccines has offered the first means of long-term protection against COVID-19–related illness since the pandemic began. It is important for health care professionals to understand commonly held COVID-19 vaccine concerns and to be equipped with quality information that can be used to assist in medical decision-making. Objective Using Google’s RankBrain machine learning algorithm, we sought to characterize the content of the most frequently asked questions (FAQs) about COVID-19 vaccines evidenced by internet searches. Secondarily, we sought to examine the information transparency and quality of sources used by Google to answer FAQs on COVID-19 vaccines. Methods We searched COVID-19 vaccine terms on Google and used the “People also ask” box to obtain FAQs generated by Google’s machine learning algorithms. FAQs are assigned an “answer” source by Google. We extracted FAQs and answer sources related to COVID-19 vaccines. We used the Rothwell Classification of Questions to categorize questions on the basis of content. We classified answer sources as either academic, commercial, government, media outlet, or medical practice. We used the Journal of the American Medical Association’s (JAMA’s) benchmark criteria to assess information transparency and Brief DISCERN to assess information quality for answer sources. FAQ and answer source type frequencies were calculated. Chi-square tests were used to determine associations between information transparency by source type. One-way analysis of variance was used to assess differences in mean Brief DISCERN scores by source type. Results Our search yielded 28 unique FAQs about COVID-19 vaccines. Most COVID-19 vaccine–related FAQs were seeking factual information (22/28, 78.6%), specifically about safety and efficacy (9/22, 40.9%). The most common source type was media outlets (12/28, 42.9%), followed by government sources (11/28, 39.3%). 
Nineteen sources met 3 or more JAMA benchmark criteria with government sources as the majority (10/19, 52.6%). JAMA benchmark criteria performance did not significantly differ among source types (χ²(4)=7.40; P=.12). One-way analysis of variance revealed a significant difference in mean Brief DISCERN scores by source type (F(4,23)=10.27; P<.001). Conclusions The most frequently asked COVID-19 vaccine–related questions pertained to vaccine safety and efficacy. We found that government sources provided the most transparent and highest-quality web-based COVID-19 vaccine–related information. Recognizing common questions and concerns about COVID-19 vaccines may assist in improving vaccination efforts.
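
The percentages quoted in this abstract can be rechecked directly from the counts it reports. The short sketch below does only that arithmetic; the dictionary labels and the `percent` helper are illustrative names, not something taken from the paper.

```python
# Counts as reported in the abstract: (numerator, denominator, reported %).
# The labels are illustrative shorthand, not the authors' wording.
reported = {
    "factual FAQs": (22, 28, 78.6),
    "safety/efficacy among factual FAQs": (9, 22, 40.9),
    "media outlet sources": (12, 28, 42.9),
    "government sources": (11, 28, 39.3),
    "government share of sources meeting 3+ JAMA criteria": (10, 19, 52.6),
}


def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


for label, (num, den, stated) in reported.items():
    print(f"{label}: {num}/{den} = {percent(num, den)}% (abstract: {stated}%)")
```

Each recomputed value agrees with the figure stated in the abstract to one decimal place.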

Examining the Public’s Most Frequently Asked Questions Regarding COVID-19 Vaccines Using Search Engine Analytics in the United States: Observational Study (Preprint)

10.2196/preprints.28740 ◽

2021 ◽

Author(s):

Nicholas B Sajjadi ◽

Samuel Shepard ◽

Ryan Ottwell ◽

Kelly Murray ◽

Justin Chronister ◽

...

Keyword(s):

Machine Learning ◽

Analysis Of Variance ◽

The United States ◽

Machine Learning Algorithms ◽

Medical Decision ◽

Common Source ◽

Safety And Efficacy ◽

Information Transparency ◽

Source Type ◽

Significant Difference

BACKGROUND The emergency authorization of COVID-19 vaccines has offered the first means of long-term protection against COVID-19–related illness since the pandemic began. It is important for health care professionals to understand commonly held COVID-19 vaccine concerns and to be equipped with quality information that can be used to assist in medical decision-making. OBJECTIVE Using Google’s RankBrain machine learning algorithm, we sought to characterize the content of the most frequently asked questions (FAQs) about COVID-19 vaccines evidenced by internet searches. Secondarily, we sought to examine the information transparency and quality of sources used by Google to answer FAQs on COVID-19 vaccines. METHODS We searched COVID-19 vaccine terms on Google and used the “People also ask” box to obtain FAQs generated by Google’s machine learning algorithms. FAQs are assigned an “answer” source by Google. We extracted FAQs and answer sources related to COVID-19 vaccines. We used the Rothwell Classification of Questions to categorize questions on the basis of content. We classified answer sources as either academic, commercial, government, media outlet, or medical practice. We used the Journal of the American Medical Association’s (JAMA’s) benchmark criteria to assess information transparency and Brief DISCERN to assess information quality for answer sources. FAQ and answer source type frequencies were calculated. Chi-square tests were used to determine associations between information transparency by source type. One-way analysis of variance was used to assess differences in mean Brief DISCERN scores by source type. RESULTS Our search yielded 28 unique FAQs about COVID-19 vaccines. Most COVID-19 vaccine–related FAQs were seeking factual information (22/28, 78.6%), specifically about safety and efficacy (9/22, 40.9%). The most common source type was media outlets (12/28, 42.9%), followed by government sources (11/28, 39.3%). 
Nineteen sources met 3 or more JAMA benchmark criteria with government sources as the majority (10/19, 52.6%). JAMA benchmark criteria performance did not significantly differ among source types (χ²(4)=7.40; P=.12). One-way analysis of variance revealed a significant difference in mean Brief DISCERN scores by source type (F(4,23)=10.27; P<.001). CONCLUSIONS The most frequently asked COVID-19 vaccine–related questions pertained to vaccine safety and efficacy. We found that government sources provided the most transparent and highest-quality web-based COVID-19 vaccine–related information. Recognizing common questions and concerns about COVID-19 vaccines may assist in improving vaccination efforts.
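
The one-way analysis of variance reported here has 4 and 23 degrees of freedom, which is what five source types and 28 answer sources would give (5 - 1 = 4 and 28 - 5 = 23). The sketch below shows how such an F statistic is computed. All scores are invented for illustration, and the split of the five non-media, non-government sources across the remaining three types is an assumption; only the group sizes 12 (media outlet) and 11 (government) and the total of 28 come from the abstract.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# HYPOTHETICAL scores, not the study's data.
scores_by_source = {
    "academic": [22, 25],
    "commercial": [11, 14],
    "government": [24, 26, 23, 27, 25, 22, 26, 24, 28, 25, 23],
    "media outlet": [15, 18, 14, 17, 16, 19, 13, 16, 18, 15, 17, 14],
    "medical practice": [20],
}

f_stat, df_b, df_w = one_way_anova(list(scores_by_source.values()))
print(f"F({df_b},{df_w}) = {f_stat:.2f}")
```

The F value printed depends entirely on the made-up scores; only the degrees of freedom are expected to match the abstract.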

Analyzing the occurrence of environmental indicator minerals using clustering techniques and mineral networks

10.5194/egusphere-egu21-14074 ◽

2021 ◽

Author(s):

Jason Williams ◽

Sally Potter-McIntyre ◽

Justin Filiberto ◽

Shaunna Morrison ◽

Daniel Hummer

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

The United States ◽

Hydrothermal Systems ◽

Machine Learning Algorithms ◽

Environmental Indicator ◽

Clustering Techniques ◽

Geological Processes ◽

Indicator Mineral ◽

Indicator Minerals

Indicator minerals have special physical and chemical properties that can be analyzed to glean information concerning the composition of host rocks and formational (or altering) fluids. Clay, zeolite, and tourmaline mineral groups are all ubiquitous at the Earth’s surface and shallow crust and distributed through a wide variety of sedimentary, igneous, metamorphic, and hydrothermal systems. Traditional studies of indicator mineral-bearing deposits have provided a wealth of data that could be integral to discovering new insights into the formation and evolution of naturally occurring systems. This study evaluates the relationships that exist between different environmental indicator mineral groups through the implementation of machine learning algorithms and network diagrams. Mineral occurrence data for thousands of localities hosting clay, zeolite, and tourmaline minerals were retrieved from mineral databases. Clustering techniques (e.g., agglomerative hierarchical clustering and density-based spatial clustering of applications with noise) combined with network analyses were used to analyze the compiled dataset in an effort to characterize and identify geological processes operating at different localities across the United States. Ultimately, this study evaluates the ability of machine learning algorithms to act as supplementary diagnostic and interpretive tools in geoscientific studies.
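
The abstract does not say how its mineral networks were constructed. One common convention, assumed here, is to treat minerals as nodes and to weight an edge by the number of localities at which two minerals co-occur. The sketch below builds such edge weights; the locality names are hypothetical and the occurrence lists are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(localities):
    """Count, for every pair of minerals, the localities where both occur."""
    edges = Counter()
    for minerals in localities.values():
        # sorted(set(...)) removes duplicates and gives each pair one canonical order.
        for a, b in combinations(sorted(set(minerals)), 2):
            edges[(a, b)] += 1
    return edges


# HYPOTHETICAL occurrence data: a clay (kaolinite), a zeolite (analcime),
# and a tourmaline (schorl) at three made-up localities.
localities = {
    "locality_A": ["kaolinite", "analcime"],
    "locality_B": ["kaolinite", "analcime", "schorl"],
    "locality_C": ["schorl", "kaolinite"],
}

edges = cooccurrence_edges(localities)
for (a, b), weight in sorted(edges.items()):
    print(f"{a} -- {b}: {weight}")
```

Edge weights of this kind could then feed a clustering step, although how the study combined the two is not described in the abstract.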

Machine Learning Prediction of Parkinson's Disease Onset and Subtype Using Germline Variants

10.1101/2021.06.14.21258631 ◽

2021 ◽

Author(s):

Saya R Dennis ◽

Tanya Simuni ◽

Yuan Luo

Keyword(s):

Machine Learning ◽

Parkinson's Disease ◽

Neurodegenerative Disorder ◽

Disease Onset ◽

The United States ◽

Machine Learning Algorithms ◽

Progression Rate ◽

High Importance ◽

Germline Variants

Parkinson's Disease is the second most common neurodegenerative disorder in the United States, and is characterized by a largely irreversible worsening of motor and non-motor symptoms as the disease progresses. A prominent characteristic of the disease is its high heterogeneity in manifestation as well as the progression rate. For sporadic Parkinson's Disease, which comprises ~90% of all diagnoses, the relationship between the patient genome and disease onset or progression subtype remains largely elusive. Machine learning algorithms are increasingly adopted to study the genomics of diseases due to their ability to capture patterns within the vast feature space of the human genome that might be contributing to the phenotype of interest. In our study, we develop two machine learning models that predict the onset as well as the progression subtype of Parkinson's Disease based on subjects' germline mutations. Our best models achieved an ROC of 0.77 and 0.61 for disease onset and subtype prediction, respectively. To the best of our knowledge, our models present state-of-the-art prediction performances of PD onset and subtype solely based on the subjects' germline variants. The genes with high importance in our best-performing models were enriched for several canonical pathways related to signaling, immune system, and protein modifications, all of which have been previously associated with PD symptoms or pathogenesis. These high-importance gene sets provide us with promising candidate genes for future biomedical and clinical research.

Characteristics of Twitter Use by State Medicaid Programs in the United States: Machine Learning Approach

Journal of Medical Internet Research ◽

10.2196/18401 ◽

2020 ◽

Vol 22 (8) ◽

pp. e18401

Author(s):

Jane M Zhu ◽

Abeed Sarker ◽

Sarah Gollust ◽

Raina Merchant ◽

David Grande

Keyword(s):

Public Health ◽

United States ◽

Machine Learning ◽

Public Health Education ◽

The United States ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Care Organization ◽

The Public ◽

The Mean

Background Twitter is a potentially valuable tool for public health officials and state Medicaid programs in the United States, which provide public health insurance to 72 million Americans. Objective We aim to characterize how Medicaid agencies and managed care organization (MCO) health plans are using Twitter to communicate with the public. Methods Using Twitter’s public application programming interface, we collected 158,714 public posts (“tweets”) from active Twitter profiles of state Medicaid agencies and MCOs, spanning March 2014 through June 2019. Manual content analyses identified 5 broad categories of content, and these coded tweets were used to train supervised machine learning algorithms to classify all collected posts. Results We identified 15 state Medicaid agencies and 81 Medicaid MCOs on Twitter. The mean number of followers was 1784, the mean number of those followed was 542, and the mean number of posts was 2476. Approximately 39% of tweets came from just 10 accounts. Of all posts, 39.8% (63,168/158,714) were classified as general public health education and outreach; 23.5% (n=37,298) were about specific Medicaid policies, programs, services, or events; 18.4% (n=29,203) were organizational promotion of staff and activities; and 11.6% (n=18,411) contained general news and news links. Only 4.5% (n=7142) of posts were responses to specific questions, concerns, or complaints from the public. Conclusions Twitter has the potential to enhance community building, beneficiary engagement, and public health outreach, but appears to be underutilized by the Medicaid program.

Machine learning improves the prediction of febrile neutropenia in Korean inpatients undergoing chemotherapy for breast cancer

Scientific Reports ◽

10.1038/s41598-020-71927-6 ◽

2020 ◽

Vol 10 (1) ◽

Cited By ~ 1

Author(s):

Bum-Joo Cho ◽

Kyoung Min Kim ◽

Sanchir-Erdene Bilegsaikhan ◽

Yong Joon Suh

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Risk Factors ◽

Febrile Neutropenia ◽

Prediction Models ◽

Learning Algorithms ◽

Area Under The Curve ◽

Primary Prophylaxis ◽

Machine Learning Algorithms ◽

Significant Difference

Febrile neutropenia (FN) is one of the most concerning complications of chemotherapy, and its prediction remains difficult. This study aimed to identify the risk factors for FN and to build prediction models of FN using machine learning algorithms. Medical records of hospitalized patients who underwent chemotherapy after surgery for breast cancer between May 2002 and September 2018 were selectively reviewed for development of models. Demographic, clinical, pathological, and therapeutic data were analyzed to identify risk factors for FN. Using machine learning algorithms, prediction models were developed and evaluated for performance. Of 933 selected inpatients with a mean age of 51.8 ± 10.7 years, FN developed in 409 (43.8%) patients. There was a significant difference in FN incidence according to age, staging, taxane-based regimen, and blood count 5 days after chemotherapy. A logistic regression model built on these findings achieved an area under the curve (AUC) of 0.870, and machine learning improved the AUC to 0.908. Machine learning improves the prediction of FN in patients undergoing chemotherapy for breast cancer compared to the conventional statistical model. In these high-risk patients, primary prophylaxis with granulocyte colony-stimulating factor could be considered.
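
The AUC values compared in this abstract can be read as the probability that a randomly chosen patient who developed FN receives a higher predicted risk than a randomly chosen patient who did not. A minimal sketch of that pairwise definition follows; the labels and scores are invented, and the function is a generic illustration, not the authors' evaluation code.

```python
def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# HYPOTHETICAL data: 1 = developed FN, 0 = did not; scores are predicted risks.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.7, 0.2, 0.8, 0.4]
print(round(auc(labels, scores), 3))
```

With these made-up values, 8 of the 9 positive-negative pairs are ordered correctly. The pairwise loop is quadratic in the number of cases, which is fine for a sketch but not for large datasets.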

Evaluation and calibration of a low-cost particle sensor in ambient conditions using machine-learning methods

Atmospheric Measurement Techniques ◽

10.5194/amt-13-1693-2020 ◽

2020 ◽

Vol 13 (4) ◽

pp. 1693-1707 ◽

Cited By ~ 8

Author(s):

Minxing Si ◽

Ying Xiong ◽

Shan Du ◽

Ke Du

Keyword(s):

Machine Learning ◽

Linear Regression ◽

Random Search ◽

Low Cost ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Ambient Conditions ◽

Test Dataset ◽

Compact Size ◽

Significant Difference

Abstract. Particle sensing technology has shown great potential for monitoring particulate matter (PM) with very few temporal and spatial restrictions because of its low cost, compact size, and easy operation. However, the performance of low-cost sensors for PM monitoring in ambient conditions has not been thoroughly evaluated. Monitoring results by low-cost sensors are often questionable. In this study, a low-cost fine particle monitor (Plantower PMS 5003) was colocated with a reference instrument, the Synchronized Hybrid Ambient Real-time Particulate (SHARP) monitor, at the Calgary Varsity air monitoring station from December 2018 to April 2019. The study evaluated the performance of this low-cost PM sensor in ambient conditions and calibrated its readings using simple linear regression (SLR), multiple linear regression (MLR), and two more powerful machine-learning algorithms using random search techniques for the best model architectures. The two machine-learning algorithms are XGBoost and a feedforward neural network (NN). Field evaluation showed that the Pearson correlation (r) between the low-cost sensor and the SHARP instrument was 0.78. The Fligner and Killeen (F–K) test indicated a statistically significant difference between the variances of the PM2.5 values by the low-cost sensor and the SHARP instrument. Large overestimations by the low-cost sensor before calibration were observed in the field and were believed to be caused by the variation of ambient relative humidity. The root mean square error (RMSE) was 9.93 when comparing the low-cost sensor with the SHARP instrument. The calibration by the feedforward NN had the smallest RMSE of 3.91 in the test dataset compared to the calibrations by SLR (4.91), MLR (4.65), and XGBoost (4.19). After calibrations, the F–K test using the test dataset showed that the variances of the PM2.5 values by the NN, XGBoost, and the reference method were not statistically significantly different. 
From this study, we conclude that a feedforward NN is a promising method to address the poor performance of low-cost sensors for PM2.5 monitoring. In addition, the random search method for hyperparameters was demonstrated to be an efficient approach for selecting the best model structure.
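
The simplest calibration named in this abstract, simple linear regression (SLR), fits a straight line that maps low-cost sensor readings onto the reference instrument and then compares the root mean square error (RMSE) before and after. The sketch below illustrates that idea with invented readings; it fits and evaluates on the same four points for brevity, whereas the study reports RMSE on a separate test dataset.

```python
from math import sqrt
from statistics import mean


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    return sqrt(mean((p - o) ** 2 for p, o in zip(predicted, observed)))


def fit_slr(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


# HYPOTHETICAL PM2.5 readings: the low-cost sensor overestimates the reference.
sensor = [10, 20, 30, 40]
reference = [6, 12, 15, 21]

slope, intercept = fit_slr(sensor, reference)
calibrated = [slope * s + intercept for s in sensor]

print(f"RMSE before calibration: {rmse(sensor, reference):.2f}")
print(f"RMSE after calibration:  {rmse(calibrated, reference):.2f}")
```

Multiple linear regression, XGBoost, and the feedforward neural network mentioned in the abstract replace the straight line with more flexible models but are scored with the same RMSE comparison.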

Prediction Models of Early Childhood Caries Based on Machine Learning Algorithms

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18168613 ◽

2021 ◽

Vol 18 (16) ◽

pp. 8613

Author(s):

You-Hyun Park ◽

Sung-Hwa Kim ◽

Yoon-Young Choi

Keyword(s):

Machine Learning ◽

Early Childhood ◽

Logistic Regression ◽

Random Forest ◽

Early Childhood Caries ◽

Prediction Models ◽

Risk Groups ◽

Machine Learning Algorithms ◽

Significant Difference ◽

Childhood Caries

In this study, we developed machine learning-based prediction models for early childhood caries and compared their performances with the traditional regression model. We analyzed the data of 4195 children aged 1–5 years from the Korea National Health and Nutrition Examination Survey data (2007–2018). Moreover, we developed prediction models using the XGBoost (version 1.3.1), random forest, and LightGBM (version 3.1.1) algorithms in addition to logistic regression. Two different methods were applied for variable selection, including a regression-based backward elimination and a random forest-based permutation importance classifier. We compared the area under the receiver operating characteristic (AUROC) values and misclassification rates of the different models and observed that all four prediction models had AUROC values ranging between 0.774 and 0.785. Furthermore, no significant difference was observed between the AUROC values of the four models. Based on the results, we can confirm that both traditional logistic regression and ML-based models can show favorable performance and can be used to predict early childhood caries, identify ECC high-risk groups, and implement active preventive treatments. However, further research is essential to improving the performance of the prediction model using recent methods, such as deep learning.

ESTIMATING CORN YIELD IN THE UNITED STATES WITH MODIS EVI AND MACHINE LEARNING METHODS

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprsannals-iii-8-131-2016 ◽

2016 ◽

Vol III-8 ◽

pp. 131-136 ◽

Cited By ~ 8

Author(s):

K. Kuwata ◽

R. Shibasaki

Keyword(s):

Neural Network ◽

United States ◽

Machine Learning ◽

Crop Yield ◽

The United States ◽

Machine Learning Algorithms ◽

Corn Yield ◽

County Level ◽

Entire Area ◽

Modis Evi

Satellite remote sensing is commonly used to monitor crop yield in wide areas. Because many parameters are necessary for crop yield estimation, modelling the relationships between parameters and crop yield is generally complicated. Several methodologies using machine learning have been proposed to solve this issue, but the accuracy of county-level estimation remains to be improved. In addition, estimating county-level crop yield across an entire country has not yet been achieved. In this study, we applied a deep neural network (DNN) to estimate corn yield. We evaluated the estimation accuracy of the DNN model by comparing it with other models trained by different machine learning algorithms. We also prepared two time-series datasets differing in duration and confirmed the feature extraction performance of models by inputting each dataset. As a result, the DNN estimated county-level corn yield for the entire area of the United States with a determination coefficient (R²) of 0.780 and a root mean square error (RMSE) of 18.2 bushels/acre. In addition, our results showed that estimation models that were trained by a neural network extracted features from the input data better than an existing machine learning algorithm.
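
The abstract does not state which definition of the determination coefficient it uses. The sketch below assumes the common form, one minus the ratio of residual to total sum of squares; the squared Pearson correlation is another convention and can give a different value. The yields are invented for illustration.

```python
def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# HYPOTHETICAL county-level corn yields in bushels/acre.
observed = [150, 170, 160, 180]
predicted = [155, 165, 160, 175]
print(r_squared(observed, predicted))
```

With these made-up values the residual sum of squares is 75 and the total sum of squares is 500, so the result is 0.85.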

A Machine Learning Ensemble Based on Radiomics to Predict BI-RADS Category and Reduce the Biopsy Rate of Ultrasound-Detected Suspicious Breast Masses

10.1101/2021.12.16.21267907 ◽

2021 ◽

Author(s):

Matteo Interlenghi ◽

Christian Salvatore ◽

Veronica Magni ◽

Gabriele Caldara ◽

Elia Schiavon ◽

...

Keyword(s):

Machine Learning ◽

Majority Vote ◽

Short Interval ◽

Machine Learning Algorithms ◽

Medical Decision ◽

Ultra Sound ◽

Support Vector ◽

Histopathological Diagnosis ◽

Breast Masses ◽

Biopsy Rate

We developed a machine learning model based on radiomics to predict the BI-RADS category of ultrasound-detected suspicious breast lesions and support medical decision making towards short-interval follow-up versus tissue sampling. From a retrospective 2015-2019 series of ultrasound-guided core needle biopsies performed by four board-certified breast radiologists using six ultrasound systems from three vendors, we collected 821 images of 834 suspicious breast masses from 819 patients, 404 malignant and 430 benign according to histopathology. A balanced image set of biopsy-proven benign (n = 299) and malignant (n = 299) lesions was used for training and cross-validation of ensembles of machine learning algorithms supervised during learning by histopathological diagnosis as a reference standard. Based on a majority vote (over 80% of the votes to have a valid prediction of benign lesion), an ensemble of support vector machines showed an ability to reduce the biopsy rate of benign lesions by 15% to 18%, always keeping a sensitivity over 94%, when externally tested on 236 images from two image sets: 1) 123 lesions (51 malignant and 72 benign) obtained from the same four ultrasound systems used for training, resulting in a positive predictive value (PPV) of 45.9% (95% confidence interval 36.3-55.7%) versus a radiologists' PPV of 41.5% (p < 0.005), combined with a 98.0% sensitivity (89.6-99.9%); 2) 113 lesions (54 malignant and 59 benign) obtained from two ultrasound systems from vendors different from those used for training, resulting in a 50.5% PPV (40.4-60.6%) versus a radiologists' PPV of 47.8% (p < 0.005), combined with a 94.4% sensitivity (84.6-98.8%). Errors in BI-RADS 3 category (i.e., assigned by the model as BI-RADS 4) were 0.8% and 2.7% in the Testing set I and II, respectively. The board-certified breast radiologist accepted the BI-RADS classes assigned by the model in 114 masses (92.7%) and modified the BI-RADS classes of 9 breast masses (7.3%). 
In 6 of 9 cases the model performed better than the radiologist, since it assigned a BI-RADS 3 classification to histopathology-confirmed benign masses that were classified as BI-RADS 4 by the radiologist.
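
The voting rule described in this abstract requires more than 80% of ensemble votes before a lesion is predicted benign. The sketch below shows one way such a rule could be written. It assumes that any lesion falling short of the threshold is simply not called benign; the abstract does not spell out that fallback, and the label strings and function name are illustrative.

```python
def ensemble_decision(votes, benign_threshold=0.80):
    """Call a lesion benign only if the benign vote share exceeds the threshold.

    votes: list of "benign" / "malignant" strings, one per ensemble member.
    ASSUMPTION: anything at or below the threshold is returned as "not benign".
    """
    if not votes:
        raise ValueError("at least one vote is required")
    benign_share = votes.count("benign") / len(votes)
    return "benign" if benign_share > benign_threshold else "not benign"


# HYPOTHETICAL votes from a ten-member ensemble.
print(ensemble_decision(["benign"] * 9 + ["malignant"]))      # 90% benign votes
print(ensemble_decision(["benign"] * 8 + ["malignant"] * 2))  # exactly 80%
```

Setting the bar above a bare majority trades some avoided biopsies for sensitivity, which fits the abstract's emphasis on keeping sensitivity over 94%.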

Predicting 180-day mortality for women with ovarian cancer using machine learning and patient-reported outcome data.

Journal of Clinical Oncology ◽

10.1200/jco.2021.39.15_suppl.e13555 ◽

2021 ◽

Vol 39 (15_suppl) ◽

pp. e13555-e13555

Author(s):

Chris Sidey-Gibbons ◽

Charlotte C. Sun ◽

Cai Xu ◽

Amy Schneider ◽

Sheng-Chieh Lu ◽

...

Keyword(s):

Machine Learning ◽

Ovarian Cancer ◽

End Of Life ◽

Learning Algorithms ◽

The United States ◽

Patient Reported Outcome ◽

Machine Learning Algorithms ◽

Electronic Health Record Data ◽

Testing Dataset ◽

Patient Reported

e13555 Background: Contrary to national guidelines, women with ovarian cancer often receive aggressive treatment until the end-of-life. We trained machine learning algorithms to predict mortality within 180 days for women with ovarian cancer. Methods: Data were collected from a single academic cancer institution in the United States. Women with recurrent ovarian cancer completed biopsychosocial patient-reported outcome measures (PROMs) every 90 days. We randomly partitioned our dataset into training and testing samples with a 2:1 ratio. We used synthetic minority oversampling to reduce class imbalance in the training dataset. We fitted training data to six machine learning algorithms and combined their classifications on the testing dataset into a voting ensemble. We assessed the accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC) for each algorithm. Results: We recruited 245 patients who completed 1319 assessments. The final voting ensemble performed well across all performance metrics (Accuracy = .79, Sensitivity = .71, Specificity = .80, AUROC = .76). The algorithm correctly identified 25 of the 35 women in the testing dataset who died within 180 days of assessment. Conclusions: Machine learning algorithms trained using PROM data offer state-of-the-art performance in predicting whether a woman with ovarian cancer will reach the end-of-life within 180 days. We highlight the importance of PROM data in ML models of mortality. Our model exhibits substantial improvements in prediction sensitivity compared to other similar models trained using electronic health record data alone. This model could inform clinical decision making and improve the uptake of appropriate end-of-life care. Further research is warranted to expand on these findings in a larger, more diverse sample.
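
The reported sensitivity of .71 matches the statement that 25 of the 35 women who died within 180 days were correctly identified (25/35 ≈ 0.714). The sketch below computes sensitivity, specificity, and accuracy from confusion-matrix counts. The true-positive and false-negative counts come from the abstract; the true-negative and false-positive counts are hypothetical, since the abstract does not report them.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


metrics = classification_metrics(
    tp=25,  # from the abstract: deaths within 180 days correctly identified
    fn=10,  # from the abstract: 35 deaths in the testing dataset minus 25
    tn=80,  # HYPOTHETICAL: not reported in the abstract
    fp=20,  # HYPOTHETICAL: not reported in the abstract
)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Only the sensitivity line reflects the study's numbers; the other two values depend on the invented negative-class counts.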
