Identifying health-related discussions of cannabis use on Twitter: a proof-of-concept study (Preprint)
BACKGROUND The cannabis product and regulatory landscape is changing in the United States. Against the backdrop of these changes, there have been increasing reports of health-related motives for cannabis use and of adverse events from its use. Social media data on cannabis-related health conversations may be useful to state- and federal-level regulatory agencies as they grapple with identifying cannabis safety signals in a comprehensive and scalable fashion.
OBJECTIVE This study assessed the extent to which a medical dictionary, the Unified Medical Language System (UMLS) Consumer Health Vocabulary (CHV), could identify health-related motivations for cannabis use and health consequences of its use as discussed on Twitter in 2020.
METHODS Twitter posts containing cannabis-related terms were obtained from January 1 to August 31, 2020. Each post in the sample (n = 353,353) was classified into at least one of 17 a priori categories of common health-related topics, using a rule-based classifier in which each category was defined by terms from the medical dictionary. A subsample of posts (n = 1,094) was then manually annotated to help validate the rule-based classifier and to determine whether each post pertained to a health-related motivation for cannabis use, a perceived adverse health effect from its use, or neither.
RESULTS The validation process suggested that the medical dictionary could identify health-related conversations in 31.2% of posts. Specifically, 20.4% of posts were accurately identified as relating to a health-related motivation for cannabis use, while 10.8% of posts were accurately identified as relating to a health-related consequence of cannabis use. Potential health-related conversations around cannabis use ranged from issues with the respiratory system and stress to the immune system and gastrointestinal problems, among other health topics.
CONCLUSIONS The mining of social media data may prove helpful in improving surveillance of cannabis products and their adverse health effects. However, future research needs to develop and validate a dictionary and codebook that captures cannabis use-specific health conversations on Twitter.
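The rule-based classification described in METHODS can be illustrated with a minimal sketch. It is not the study's implementation: the category names and term lists below are hypothetical placeholders standing in for the 17 CHV-defined categories, and the function names `compile_patterns` and `classify_post` are assumptions introduced here for illustration. The sketch assigns a post to every category whose terms appear in it as whole words, and returns an empty list when no term matches.

```python
import re
from typing import Dict, List, Pattern

# Hypothetical category-to-term mapping. The study defined 17 categories
# using CHV terms; these three short lists are illustrative stand-ins only.
CATEGORY_TERMS: Dict[str, List[str]] = {
    "respiratory": ["cough", "asthma", "lungs"],
    "stress": ["stress", "anxiety"],
    "gastrointestinal": ["nausea", "vomiting"],
}


def compile_patterns(category_terms: Dict[str, List[str]]) -> Dict[str, Pattern[str]]:
    """Build one case-insensitive, whole-word regex per category."""
    patterns = {}
    for category, terms in category_terms.items():
        alternation = "|".join(re.escape(term) for term in terms)
        patterns[category] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return patterns


PATTERNS = compile_patterns(CATEGORY_TERMS)


def classify_post(text: str) -> List[str]:
    """Return every category whose terms occur in the post (possibly none)."""
    return [category for category, pattern in PATTERNS.items() if pattern.search(text)]


if __name__ == "__main__":
    print(classify_post("Weed helps my anxiety but makes me cough"))
```

A dictionary match of this kind only flags that a health term co-occurs with a cannabis mention. It cannot tell whether the post describes a motivation for use, an adverse effect, or neither, which is the distinction the manual annotation step was used to assess.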