Using Twitter to better understand online patient experience sentiments in the US (Preprint)
BACKGROUND Traditional large-scale assessments of patient care in the US struggle to represent aspects of health beyond hospital care. There are documented differences in access to healthcare across the US. Understanding these disparities is important so that policy makers and healthcare administrations can improve the quality of care provided. Previous research indicates that Twitter contains data about patients' experiences and opinions of their healthcare. Sentiment analysis of Twitter data can therefore supplement traditional feedback surveys as a way of understanding patient views.
OBJECTIVE We aim to characterize patient experience sentiments expressed on Twitter across the US over a four-year period.
METHODS We developed a set of software components to automatically label and examine a Twitter dataset of patient experiences. The set includes: (I) a classifier to identify patient experience tweets, (II) a geolocation inference engine for social data, (III) a modified version of a sentiment classifier from the literature, and (IV) an engine to determine whether a tweet originates from a metro or non-metro area.
RESULTS Of the 27.3 million tweets collected between February 2013 and February 2017 using a set of patient experience related keywords, the classifier labeled 2,779,555 as patient experience tweets. After running these tweets through the geolocation classifier, we assigned an approximate location to 876,384 tweets for use in spatial analyses. At the national level, 27.7% of patient experience tweets were positive, 36.3% were neutral, and 36% were negative. Overall, the average sentiment polarity became less negative every year across all regions of the country. The patient experience tweet rate also decreased across all states over the four-year study period. We also observed a lower negative fraction among tweets posted during daytime hours, whereas tweets posted between 8pm and 10am tended to have a higher negative fraction. Additionally, tweet sentiment varied by region and between metro and non-metro areas.
CONCLUSIONS This study presents methodologies for a deeper understanding of online discussion related to patient experience across space and time, and demonstrates how Twitter can provide a unique and unsolicited perspective from users that traditional survey methods for understanding patient views may not capture.
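The sentiment fractions and the daytime versus overnight comparison reported in the results can be illustrated with a minimal sketch. This is not the study's implementation: the function names, the `(hour, label)` input format, and the assumption that each tweet carries a local posting hour and a label in `{"positive", "neutral", "negative"}` are all hypothetical, and the 10am to 8pm daytime window is inferred only from the 8pm to 10am interval named above.

```python
from collections import Counter

LABELS = ("positive", "neutral", "negative")  # assumed label set


def sentiment_fractions(labels):
    """Return the share of each sentiment label, or {} for empty input."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: counts[label] / total for label in LABELS}


def negative_fraction_by_window(tweets, day_start=10, day_end=20):
    """Compare the negative fraction of daytime and overnight tweets.

    `tweets` is an iterable of (hour, label) pairs, where `hour` is the
    posting hour (0-23, assumed local time). Hours in [day_start, day_end)
    count as daytime; all other hours count as overnight.
    """
    buckets = {"daytime": [], "overnight": []}
    for hour, label in tweets:
        window = "daytime" if day_start <= hour < day_end else "overnight"
        buckets[window].append(label)
    # A window with no tweets yields None rather than a misleading 0.0.
    return {
        window: sentiment_fractions(labels).get("negative")
        for window, labels in buckets.items()
    }
```

Given labeled tweets, `sentiment_fractions` yields national-level shares of the kind reported above, and `negative_fraction_by_window` yields one negative fraction per time window; grouping by region or metro status would follow the same pattern with a different bucketing key.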