Using Natural Language Processing to Examine the Uptake, Content, and Readability of Media Coverage of a Pan-Canadian Drug Safety Research Project: Cross-Sectional Observational Study (Preprint)

2019 ◽  
Author(s):  
Hossein Mohammadhassanzadeh ◽  
Ingrid Sketris ◽  
Robyn Traynor ◽  
Susan Alexander ◽  
Brandace Winquist ◽  
...  

BACKGROUND Isotretinoin, for treating cystic acne, increases the risk of miscarriage and fetal abnormalities when taken during pregnancy. The Health Canada–approved product monograph for isotretinoin includes pregnancy prevention guidelines. A recent study by the Canadian Network for Observational Drug Effect Studies (CNODES) on the occurrence of pregnancy and pregnancy outcomes during isotretinoin therapy estimated poor adherence to these guidelines. Media uptake of this study was unknown; awareness of this uptake could help improve drug safety communication. OBJECTIVE The aim of this study was to understand how the media present pharmacoepidemiological research using the CNODES isotretinoin study as a case study. METHODS Google News was searched (April 25-May 6, 2016), using a predefined set of terms, for mention of the CNODES study. In total, 26 articles and 3 CNODES publications (original article, press release, and podcast) were identified. The article texts were cleaned (eg, advertisements and links removed), and the podcast was transcribed. A dictionary of 1295 unique words was created using natural language processing (NLP) techniques (term frequency-inverse document frequency, Porter stemming, and stop-word filtering) to identify common words and phrases. Similarity between the articles and reference publications was calculated using Euclidean distance; articles were grouped using hierarchical agglomerative clustering. Nine readability scales were applied to measure text readability based on factors such as number of words, difficult words, syllables, sentence counts, and other textual metrics. RESULTS The top 5 dictionary words were pregnancy (250 appearances), isotretinoin (220), study (209), drug (201), and women (185).
Three distinct clusters were identified: Clusters 2 (5 articles) and 3 (4 articles) were from health-related websites and media, respectively; Cluster 1 (18 articles) contained largely media sources; 2 articles fell outside these clusters. Use of the term isotretinoin versus Accutane (a brand name of isotretinoin), discussion of pregnancy complications, and assignment of responsibility for guideline adherence varied between clusters. For example, the stem pregnanc appeared most often in Clusters 1 (14.6 average times per article) and 2 (11.4) and relatively infrequently in Cluster 3 (1.8). Average readability grade levels for all articles were high (eg, Flesch-Kincaid, 13; Gunning Fog, 15; SMOG Index, 10; Coleman Liau Index, 15; Linsear Write Index, 13; and Text Standard, 13). Readability improved from Cluster 2 (Gunning Fog of 16.9) to Cluster 3 (12.2). Reading level varied between clusters (average 13th-15th grade) but overall exceeded the recommended health information reading level (6th to 8th grade). CONCLUSIONS Media interpretation of the CNODES study varied, with differences in synonym usage and areas of focus. All articles were written above the recommended health information reading level. Analyzing media using NLP techniques can help determine drug safety communication effectiveness. This project is important for understanding how drug safety studies are taken up and redistributed in the media.
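
The Methods pipeline described above (stop-word filtering, term frequency-inverse document frequency weighting, Euclidean distance to a reference text, and hierarchical agglomerative clustering) can be sketched roughly as follows. This is a minimal standard-library illustration, not the authors' code: the stop-word list, the toy headlines, the smoothed IDF variant, and the choice of average linkage are all assumptions, and Porter stemming is omitted for brevity.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the study's list).
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is", "for", "on", "with"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words (no stemming)."""
    words = "".join(c if c.isalpha() else " " for c in text.lower()).split()
    return [w for w in words if w not in STOP_WORDS]

def tfidf_vectors(docs):
    """Return (vocabulary, one TF-IDF vector per document)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(docs)
    df = Counter(w for toks in tokenized for w in set(toks))
    # Smoothed inverse document frequency (one common variant).
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        vectors.append([counts[w] / total * idf[w] for w in vocab])
    return vocab, vectors

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(euclidean(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Merge the two closest clusters until n_clusters remain.
        a, b = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical texts: index 0 plays the role of a reference publication.
docs = [
    "Isotretinoin and pregnancy: study finds poor adherence to guidelines",
    "Study finds poor adherence to isotretinoin pregnancy guidelines",
    "Acne drug Accutane linked to birth defects, doctors warn women",
    "Doctors warn women about acne drug Accutane and birth defects",
]
vocab, vectors = tfidf_vectors(docs)
distances_to_reference = [euclidean(vectors[0], v) for v in vectors[1:]]
clusters = agglomerative(vectors, n_clusters=2)
print(distances_to_reference)
print(clusters)
```

Because the representation is a bag of words, the first two toy texts map to the same vector even though their word order differs, which is why synonym choice (isotretinoin versus Accutane) rather than phrasing drives the grouping in this sketch.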

10.2196/13296 ◽  
2020 ◽  
Vol 4 (1) ◽  
pp. e13296
Author(s):  
Hossein Mohammadhassanzadeh ◽  
Ingrid Sketris ◽  
Robyn Traynor ◽  
Susan Alexander ◽  
Brandace Winquist ◽  
...  

Background Isotretinoin, for treating cystic acne, increases the risk of miscarriage and fetal abnormalities when taken during pregnancy. The Health Canada–approved product monograph for isotretinoin includes pregnancy prevention guidelines. A recent study by the Canadian Network for Observational Drug Effect Studies (CNODES) on the occurrence of pregnancy and pregnancy outcomes during isotretinoin therapy estimated poor adherence to these guidelines. Media uptake of this study was unknown; awareness of this uptake could help improve drug safety communication. Objective The aim of this study was to understand how the media present pharmacoepidemiological research using the CNODES isotretinoin study as a case study. Methods Google News was searched (April 25-May 6, 2016), using a predefined set of terms, for mention of the CNODES study. In total, 26 articles and 3 CNODES publications (original article, press release, and podcast) were identified. The article texts were cleaned (eg, advertisements and links removed), and the podcast was transcribed. A dictionary of 1295 unique words was created using natural language processing (NLP) techniques (term frequency-inverse document frequency, Porter stemming, and stop-word filtering) to identify common words and phrases. Similarity between the articles and reference publications was calculated using Euclidean distance; articles were grouped using hierarchical agglomerative clustering. Nine readability scales were applied to measure text readability based on factors such as number of words, difficult words, syllables, sentence counts, and other textual metrics. Results The top 5 dictionary words were pregnancy (250 appearances), isotretinoin (220), study (209), drug (201), and women (185).
Three distinct clusters were identified: Clusters 2 (5 articles) and 3 (4 articles) were from health-related websites and media, respectively; Cluster 1 (18 articles) contained largely media sources; 2 articles fell outside these clusters. Use of the term isotretinoin versus Accutane (a brand name of isotretinoin), discussion of pregnancy complications, and assignment of responsibility for guideline adherence varied between clusters. For example, the stem pregnanc appeared most often in Clusters 1 (14.6 average times per article) and 2 (11.4) and relatively infrequently in Cluster 3 (1.8). Average readability grade levels for all articles were high (eg, Flesch-Kincaid, 13; Gunning Fog, 15; SMOG Index, 10; Coleman Liau Index, 15; Linsear Write Index, 13; and Text Standard, 13). Readability improved from Cluster 2 (Gunning Fog of 16.9) to Cluster 3 (12.2). Reading level varied between clusters (average 13th-15th grade) but overall exceeded the recommended health information reading level (6th to 8th grade). Conclusions Media interpretation of the CNODES study varied, with differences in synonym usage and areas of focus. All articles were written above the recommended health information reading level. Analyzing media using NLP techniques can help determine drug safety communication effectiveness. This project is important for understanding how drug safety studies are taken up and redistributed in the media.
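
Two of the readability scales named in the Results, Flesch-Kincaid grade level and Gunning Fog, have simple closed forms, so a rough sketch of how such grade-level scores arise may help. The formulas are the standard published ones; the syllable counter is a crude vowel-group heuristic and the two sample texts are hypothetical, so the numbers it produces will not match a dedicated readability tool or the study's reported values.

```python
import re

def count_syllables(word):
    """Crude heuristic (assumption): one syllable per group of vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def split_text(text):
    """Return (sentences, words) for non-empty English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    return sentences, words

def flesch_kincaid_grade(text):
    """0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59"""
    sentences, words = split_text(text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

def gunning_fog(text):
    """0.4 * ((words / sentences) + 100 * (complex words / words))"""
    sentences, words = split_text(text)
    # "Complex" is approximated here as three or more heuristic syllables.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))

# Hypothetical sample texts, chosen only to contrast easy and hard prose.
simple = "The cat sat on the mat. The dog ran to the park."
dense = ("Pharmacoepidemiological research estimated inadequate adherence "
         "to contraindication recommendations.")

for label, text in (("simple", simple), ("dense", dense)):
    print(label, round(flesch_kincaid_grade(text), 2), round(gunning_fog(text), 2))
```

Both scores are expressed as school grade levels, so a higher number means harder text; this is why a drop in Gunning Fog from 16.9 to 12.2 corresponds to text that is easier to read.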


2016 ◽  
Vol 22 (1) ◽  
pp. 23-42 ◽  
Author(s):  
Michael J. Jensen

This paper develops a method for analyzing the structure of campaign communications within Twitter. The structure of communication affordances creates opportunities for a horizontal organization of power within Twitter interactions. However, one cannot infer the structure of interactions as they materialize from the formal properties of the technical environment in which the communications occur. Consequently, the paper identifies three categories of empowering communication operations that can occur on Twitter: campaigns can respond to others, campaigns can retweet others, and campaigns can call for others to become involved in the campaign on their own terms. The paper operationalizes these categories in the context of the 2015 U.K. general election. To determine whether Twitter is used to empower laypersons, the profiles of each account retweeted and replied to were retrieved and analyzed using natural language processing to identify whether an account belongs to a political figure, a member of the media, or some other public figure. In addition, tweets and retweets are compared with respect to how key election issues are discussed. The findings indicate that empowering uses of Twitter are fairly marginal, and retweets use almost the same policy language as the original campaign tweets.


Author(s):  
Amol Agade ◽  
Samta Balpande

The ongoing COVID-19 pandemic has caused massive damage to various sectors of the global economy, disrupting human livelihoods. Natural language processing has been used extensively in different organizations to categorize sentiments, make recommendations, summarize information, and model topics. This research aims to understand the non-medical impact of COVID-19 on the global economy by leveraging natural language processing. The methodology comprises text classification, including topic modelling, on an unstructured dataset of COVID-19 media articles provided by Anacode. Two natural language processing algorithms, latent Dirichlet allocation (LDA) and non-negative matrix factorization (NMF), are proposed to classify the media articles in order to analyze the impact of the COVID-19 pandemic on different sectors of the global economy. Model quality was assessed using coherence and perplexity scores, which were 0.51 and -10.90, respectively, for the LDA algorithm. Both the LDA and NMF algorithms identified similar prevalent topics affected by the COVID-19 pandemic across multiple sectors of the economy. The intertopic distance map produced by the LDA algorithm suggests that general industries, which include children's schooling, parental care, and family gatherings, were affected most, followed by the business sector and the financial industry.
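
To make the NMF side of this topic-modelling approach concrete, the following is a minimal sketch of non-negative matrix factorization using the classic multiplicative update rules, written with the standard library only. It is an illustration under stated assumptions, not the authors' implementation: the document-term counts, the vocabulary labels, the number of topics, the iteration count, and the random seed are all hypothetical, and a real analysis would normally rely on an established library.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Frobenius norm of V - WH (the reconstruction error)."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5

def nmf(v, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factor V (docs x terms) into W (docs x topics) and H (topics x terms)."""
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_terms)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_terms)]
             for k in range(n_topics)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_topics)]
             for i in range(n_docs)]
    return w, h

# Hypothetical vocabulary and document-term counts (4 articles x 6 terms).
terms = ["school", "children", "family", "market", "bank", "business"]
v = [
    [3, 2, 2, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [0, 0, 0, 2, 3, 2],
    [0, 0, 1, 3, 2, 2],
]

w0, h0 = nmf(v, n_topics=2, n_iter=0)   # random initialisation only
w, h = nmf(v, n_topics=2, n_iter=200)   # after multiplicative updates
initial_error = frobenius_error(v, w0, h0)
final_error = frobenius_error(v, w, h)
print(initial_error, final_error)

# Highest-weighted terms per topic row of H.
for k, row in enumerate(h):
    top = sorted(range(len(terms)), key=lambda j: row[j], reverse=True)[:3]
    print("topic", k, [terms[j] for j in top])
```

Each row of H can be read as a topic (a weighting over terms) and each row of W as an article's mixture of topics; LDA yields a comparable pair of distributions through a probabilistic model rather than a matrix factorization.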


2020 ◽  
Vol 4 (s1) ◽  
pp. 115-115
Author(s):  
Matthieu Kirkland ◽  
Christian Reyes ◽  
Nancy Pire-Smerkanich ◽  
Eunjoo Pacifici

OBJECTIVES/GOALS: Clinical research is the backbone of the medical community. However, there are few regulations to ensure that clinical trial participants can understand their results, leaving volunteers feeling unvalued and unlikely to enroll in trials [1]. This study examines the need for lay summaries. METHODS/STUDY POPULATION: To understand the current landscape of clinical trial summaries, literature searches were conducted using the University of Southern California Library database with keywords Title contains “lay language” OR “lay summary” AND any field contains “Trial” OR “clinical”, and Title contains “natural language processing” AND “clinical trial” OR “Summary”. Studies were deemed relevant if they discussed lay language summaries in health care or the use of natural language processing (NLP) to increase comprehension. Papers published by the Center for Information and Study on Clinical Research Participation (CISCRP) were reviewed, and its Associate Director was interviewed. RESULTS/ANTICIPATED RESULTS: Of 67 total results, 14 were determined to be relevant. Ten of the relevant results examined lay language summaries and their regulation, and 4 were NLP studies. The European Medicines Agency set regulations mandating clinical trial summaries. However, researchers have difficulty validating summaries at an appropriate reading level [2]. Difficulty and potential bias halted a U.S. mandate of lay summaries [3]. The nonprofit CISCRP has partnered with industry to develop unbiased clinical trial summaries, resulting in all volunteers feeling appreciated and 91% understanding clinical trial results after reading the summary [1]. Similarly, NLP software for annotating electronic health records increased comprehension for 77% of patients [4]. DISCUSSION/SIGNIFICANCE OF IMPACT: In the U.S., a lack of regulations mandating lay summaries may be related to concerns by regulatory agencies that summaries in plain language may introduce bias [3]. Future work on integrating NLP systems into clinical trials may produce unbiased summaries and allow for FDA regulation.


Author(s):  
Chieh-Li Chin ◽  
Wen-Yuh Su ◽  
Jessie Chin

While the virality of misinformation has been recognized as a significant global issue in modern societies, few studies have examined computational approaches to representing and identifying false information in health domains. The current study aimed to use both psycholinguistic and natural language processing models to represent verified true and false texts about human papillomavirus (HPV) vaccines. Compared with conventional word-embedding models that represent texts at the level of words, sentences, or documents, results showed that introducing embeddings at the level of propositions best differentiated the semantic representations of true and false texts. The study advances our understanding of how to represent health texts and has implications for detecting false health information.


2020 ◽  
Author(s):  
Shogo Ujiie ◽  
Shuntaro Yada ◽  
Shoko Wakamiya ◽  
Eiji Aramaki

BACKGROUND Medical articles covering adverse drug events (ADEs) are systematically reported by pharmaceutical companies for drug safety information purposes. Although policies governing reporting to regulatory bodies vary among countries and regions, all medical article reporting may be categorized as precision or recall based. Recall-based reporting, which is implemented in Japan, requires the reporting of any possible ADE. Therefore, recall-based reporting can introduce numerous false positives or substantial amounts of noise, a problem that is difficult to address using limited manual labor. OBJECTIVE Our aim was to develop an automated system that could identify ADE-related medical articles, support recall-based reporting, and alleviate manual labor in Japanese pharmaceutical companies. METHODS Using medical articles as input, our system based on natural language processing applies document-level classification to extract articles containing ADEs (replacing manual labor in the first screening) and sentence-level classification to extract sentences within those articles that imply ADEs (thus supporting experts in the second screening). We used 509 Japanese medical articles annotated by a medical engineer to evaluate the performance of the proposed system. RESULTS Document-level classification yielded an F1 of 0.903. Sentence-level classification yielded an F1 of 0.413. Both scores are averages over fivefold cross-validation. CONCLUSIONS A simple automated system may alleviate the manual labor involved in screening drug safety–related medical articles in pharmaceutical companies. After improving the accuracy of the sentence-level classification by considering a wider context, we intend to apply this system toward real-world postmarketing surveillance.
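
The F1 scores reported above combine precision and recall, and averaging them over folds is straightforward to sketch. The snippet below is a generic illustration rather than the authors' evaluation code: the label lists are hypothetical, 1 stands for "contains an ADE", and only two folds are shown where the study averaged five.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = ADE, 0 = no ADE)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

def mean_f1_over_folds(folds):
    """Average F1 across (y_true, y_pred) pairs, one pair per held-out fold."""
    scores = [precision_recall_f1(t, p)[2] for t, p in folds]
    return sum(scores) / len(scores)

# Hypothetical fold A: a classifier that makes one miss and one false alarm.
fold_a = ([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
# Hypothetical fold B: a screener that flags every article as a possible ADE.
fold_b = ([1, 0, 0, 0], [1, 1, 1, 1])

print(precision_recall_f1(*fold_a))
print(precision_recall_f1(*fold_b))
print(mean_f1_over_folds([fold_a, fold_b]))
```

Fold B shows the trade-off in a reporting policy that flags everything: recall reaches 1.0, but precision falls to 0.25, so a second, finer-grained screening step still carries most of the manual workload.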


2021 ◽  
Author(s):  
Joo Yun Lee

This study used ontology-based natural language processing to analyze social media data collected in South Korea that contained keywords related to “pregnancy”. Of the 504,725 documents, those containing concepts related to “maternal emotion” were the most frequent, followed by “family support”. Social media were used as a means of exchanging information and expressing emotions.


2019 ◽  
Vol 20 (8) ◽  
pp. 1274-1283 ◽  
Author(s):  
Megan A. Moreno ◽  
Aubrey D. Gower ◽  
Heather Brittain ◽  
Tracy Vaillancourt
