Correction: Using Natural Language Processing to Examine the Uptake, Content, and Readability of Media Coverage of a Pan-Canadian Drug Safety Research Project: Cross-Sectional Observational Study (Preprint)

Correction: Using Natural Language Processing to Examine the Uptake, Content, and Readability of Media Coverage of a Pan-Canadian Drug Safety Research Project: Cross-Sectional Observational Study

JMIR Formative Research ◽

10.2196/20211 ◽

2020 ◽

Vol 4 (6) ◽

pp. e20211

Author(s):

Hossein Mohammadhassanzadeh ◽

Ingrid Sketris ◽

Robyn Traynor ◽

Susan Alexander ◽

Brandace Winquist ◽

...

Keyword(s):

Natural Language Processing ◽

Observational Study ◽

Natural Language ◽

Language Processing ◽

Drug Safety ◽

Media Coverage ◽

Research Project ◽

Cross Sectional ◽

Safety Research

Using Natural Language Processing to Examine the Uptake, Content, and Readability of Media Coverage of a Pan-Canadian Drug Safety Research Project: Cross-Sectional Observational Study (Preprint)

10.2196/preprints.13296 ◽

2019 ◽

Author(s):

Hossein Mohammadhassanzadeh ◽

Ingrid Sketris ◽

Robyn Traynor ◽

Susan Alexander ◽

Brandace Winquist ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Health Information ◽

Language Processing ◽

Drug Safety ◽

Media Coverage ◽

Reading Level ◽

Safety Communication ◽

Safety Research ◽

The Media

BACKGROUND Isotretinoin, for treating cystic acne, increases the risk of miscarriage and fetal abnormalities when taken during pregnancy. The Health Canada–approved product monograph for isotretinoin includes pregnancy prevention guidelines. A recent study by the Canadian Network for Observational Drug Effect Studies (CNODES) on the occurrence of pregnancy and pregnancy outcomes during isotretinoin therapy estimated poor adherence to these guidelines. Media uptake of this study was unknown; awareness of this uptake could help improve drug safety communication. OBJECTIVE The aim of this study was to understand how the media present pharmacoepidemiological research using the CNODES isotretinoin study as a case study. METHODS Google News was searched (April 25-May 6, 2016), using a predefined set of terms, for mention of the CNODES study. In total, 26 articles and 3 CNODES publications (original article, press release, and podcast) were identified. The article texts were cleaned (eg, advertisements and links removed), and the podcast was transcribed. A dictionary of 1295 unique words was created using natural language processing (NLP) techniques (term frequency-inverse document frequency, Porter stemming, and stop-word filtering) to identify common words and phrases. Similarity between the articles and reference publications was calculated using Euclidean distance; articles were grouped using hierarchical agglomerative clustering. Nine readability scales were applied to measure text readability based on factors such as number of words, difficult words, syllables, sentence counts, and other textual metrics. RESULTS The top 5 dictionary words were pregnancy (250 appearances), isotretinoin (220), study (209), drug (201), and women (185). Three distinct clusters were identified: Clusters 2 (5 articles) and 3 (4 articles) were from health-related websites and media, respectively; Cluster 1 (18 articles) contained largely media sources; 2 articles fell outside these clusters. Use of the term isotretinoin versus Accutane (a brand name of isotretinoin), discussion of pregnancy complications, and assignment of responsibility for guideline adherence varied between clusters. For example, the term pregnanc appeared most often in Clusters 1 (14.6 average times per article) and 2 (11.4) and relatively infrequently in Cluster 3 (1.8). Average readability for all articles was high (eg, Flesch-Kincaid, 13; Gunning Fog, 15; SMOG Index, 10; Coleman-Liau Index, 15; Linsear Write Index, 13; and Text Standard, 13). Readability increased from Cluster 2 (Gunning Fog of 16.9) to 3 (12.2). It varied between clusters (average 13th-15th grade) but exceeded the recommended health information reading level (6th to 8th grade) overall. CONCLUSIONS Media interpretation of the CNODES study varied, with differences in synonym usage and areas of focus. All articles were written above the recommended health information reading level. Analyzing media using NLP techniques can help determine drug safety communication effectiveness. This project is important for understanding how drug safety studies are taken up and redistributed in the media.
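
The term-weighting and distance steps named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code: the stop-word list, the three sample sentences, and the function names `tokenize`, `tfidf_vectors`, and `euclidean` are assumptions made for illustration, and the Porter stemming and hierarchical clustering steps are left out to keep the example short.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the study's list).
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "for", "is"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) using tf = count/length, idf = ln(N/df)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(w for doc in tokenized for w in set(doc))
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1
        vectors.append(
            [(counts[w] / total) * math.log(n_docs / df[w]) for w in vocab]
        )
    return vocab, vectors


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical texts: one reference document and two news snippets.
reference = "Isotretinoin use during pregnancy and adherence to guidelines"
article_a = "Study finds poor adherence to isotretinoin pregnancy guidelines"
article_b = "Accutane acne drug linked to birth defects"

vocab, vectors = tfidf_vectors([reference, article_a, article_b])
print("reference vs article_a:", euclidean(vectors[0], vectors[1]))
print("reference vs article_b:", euclidean(vectors[0], vectors[2]))
```

With these sample sentences, the snippet that reuses the reference wording lies closer to the reference than the one that uses the brand name. Distances like these are what a clustering step would then group.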

Using Natural Language Processing to Examine the Uptake, Content, and Readability of Media Coverage of a Pan-Canadian Drug Safety Research Project: Cross-Sectional Observational Study

JMIR Formative Research ◽

10.2196/13296 ◽

2020 ◽

Vol 4 (1) ◽

pp. e13296

Author(s):

Hossein Mohammadhassanzadeh ◽

Ingrid Sketris ◽

Robyn Traynor ◽

Susan Alexander ◽

Brandace Winquist ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Health Information ◽

Language Processing ◽

Drug Safety ◽

Media Coverage ◽

Reading Level ◽

Safety Communication ◽

Safety Research ◽

The Media

Background Isotretinoin, for treating cystic acne, increases the risk of miscarriage and fetal abnormalities when taken during pregnancy. The Health Canada–approved product monograph for isotretinoin includes pregnancy prevention guidelines. A recent study by the Canadian Network for Observational Drug Effect Studies (CNODES) on the occurrence of pregnancy and pregnancy outcomes during isotretinoin therapy estimated poor adherence to these guidelines. Media uptake of this study was unknown; awareness of this uptake could help improve drug safety communication. Objective The aim of this study was to understand how the media present pharmacoepidemiological research using the CNODES isotretinoin study as a case study. Methods Google News was searched (April 25-May 6, 2016), using a predefined set of terms, for mention of the CNODES study. In total, 26 articles and 3 CNODES publications (original article, press release, and podcast) were identified. The article texts were cleaned (eg, advertisements and links removed), and the podcast was transcribed. A dictionary of 1295 unique words was created using natural language processing (NLP) techniques (term frequency-inverse document frequency, Porter stemming, and stop-word filtering) to identify common words and phrases. Similarity between the articles and reference publications was calculated using Euclidean distance; articles were grouped using hierarchical agglomerative clustering. Nine readability scales were applied to measure text readability based on factors such as number of words, difficult words, syllables, sentence counts, and other textual metrics. Results The top 5 dictionary words were pregnancy (250 appearances), isotretinoin (220), study (209), drug (201), and women (185). Three distinct clusters were identified: Clusters 2 (5 articles) and 3 (4 articles) were from health-related websites and media, respectively; Cluster 1 (18 articles) contained largely media sources; 2 articles fell outside these clusters. Use of the term isotretinoin versus Accutane (a brand name of isotretinoin), discussion of pregnancy complications, and assignment of responsibility for guideline adherence varied between clusters. For example, the term pregnanc appeared most often in Clusters 1 (14.6 average times per article) and 2 (11.4) and relatively infrequently in Cluster 3 (1.8). Average readability for all articles was high (eg, Flesch-Kincaid, 13; Gunning Fog, 15; SMOG Index, 10; Coleman-Liau Index, 15; Linsear Write Index, 13; and Text Standard, 13). Readability increased from Cluster 2 (Gunning Fog of 16.9) to 3 (12.2). It varied between clusters (average 13th-15th grade) but exceeded the recommended health information reading level (6th to 8th grade) overall. Conclusions Media interpretation of the CNODES study varied, with differences in synonym usage and areas of focus. All articles were written above the recommended health information reading level. Analyzing media using NLP techniques can help determine drug safety communication effectiveness. This project is important for understanding how drug safety studies are taken up and redistributed in the media.
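
Two of the readability scales named in the Results, Flesch-Kincaid grade level and Gunning Fog, can be computed from simple text counts. The Python sketch below uses the standard published formulas. The syllable counter is a crude vowel-group heuristic and the function names are assumptions; the abstract does not say which tool the study used.

```python
import re


def count_syllables(word):
    """Rough heuristic: one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def text_counts(text):
    """Return (sentences, words, syllables, complex_words) for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    # "Complex" words are approximated here as words with 3+ syllables.
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    return len(sentences), len(words), syllables, complex_words


def flesch_kincaid_grade(sentences, words, syllables):
    """Flesch-Kincaid grade level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(sentences, words, complex_words):
    """Gunning Fog index from raw counts."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


# Hypothetical sample text, not taken from the study corpus.
sample = "Isotretinoin increases the risk of fetal abnormalities. Use contraception."
n_sent, n_words, n_syll, n_complex = text_counts(sample)
print("Flesch-Kincaid grade:", flesch_kincaid_grade(n_sent, n_words, n_syll))
print("Gunning Fog:", gunning_fog(n_sent, n_words, n_complex))
```

Both scores approximate a U.S. school grade level, so a higher value means harder text. That is why a Gunning Fog of 12.2 describes more readable articles than a value of 16.9.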

Correction to: Qualitative Assessment of Adult Patients’ Perception of Atopic Dermatitis Using Natural Language Processing Analysis in a Cross-Sectional Study

Dermatology and Therapy ◽

10.1007/s13555-020-00362-2 ◽

2020 ◽

Vol 10 (2) ◽

pp. 307-310

Author(s):

Bruno Falissard ◽

Eric L. Simpson ◽

Emma Guttman-Yassky ◽

Kim A. Papp ◽

Sebastien Barbarot ◽

...

Keyword(s):

Atopic Dermatitis ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Cross Sectional Study ◽

Adult Patients ◽

Qualitative Assessment ◽

Sectional Study ◽

Cross Sectional

Extracting Clinical Features From Dictated Ambulatory Consult Notes Using a Commercially Available Natural Language Processing Tool: Pilot, Retrospective, Cross-Sectional Validation Study (Preprint)

10.2196/preprints.12575 ◽

2018 ◽

Author(s):

Jeremy Petch ◽

Jane Batt ◽

Joshua Murray ◽

Muhammad Mamdani

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Clinical Features ◽

Language Processing ◽

A Priori ◽

Cross Sectional Study ◽

Free Text ◽

Cross Sectional ◽

Text Data ◽

Complex Features

BACKGROUND The increasing adoption of electronic health records (EHRs) in clinical practice holds the promise of improving care and advancing research by serving as a rich source of data, but most EHRs allow clinicians to enter data in a text format without much structure. Natural language processing (NLP) may reduce reliance on manual abstraction of these text data by extracting clinical features directly from unstructured clinical digital text data and converting them into structured data. OBJECTIVE This study aimed to assess the performance of a commercially available NLP tool for extracting clinical features from free-text consult notes. METHODS We conducted a pilot, retrospective, cross-sectional study of the accuracy of NLP from dictated consult notes from our tuberculosis clinic with manual chart abstraction as the reference standard. Consult notes for 130 patients were extracted and processed using NLP. We extracted 15 clinical features from these consult notes and grouped them a priori into categories of simple, moderate, and complex for analysis. RESULTS For the primary outcome of overall accuracy, NLP performed best for features classified as simple, achieving an overall accuracy of 96% (95% CI 94.3-97.6). Performance was slightly lower for features of moderate clinical and linguistic complexity at 93% (95% CI 91.1-94.4), and lowest for complex features at 91% (95% CI 87.3-93.1). CONCLUSIONS The findings of this study support the use of NLP for extracting clinical features from dictated consult notes in the setting of a tuberculosis clinic. Further research is needed to fully establish the validity of NLP for this and other purposes.

Comparison of Natural Language Processing and Manual Coding for the Identification of Cross-Sectional Imaging Reports Suspicious for Lung Cancer

JCO Clinical Cancer Informatics ◽

10.1200/cci.17.00069 ◽

2018 ◽

pp. 1-7 ◽

Cited By ~ 3

Author(s):

Roxanne Wadia ◽

Kathleen Akgun ◽

Cynthia Brandt ◽

Brenda T. Fenton ◽

Woody Levin ◽

...

Keyword(s):

Lung Cancer ◽

Natural Language Processing ◽

Natural Language ◽

Negative Predictive Value ◽

Language Processing ◽

Predictive Value ◽

Cross Sectional ◽

Predictive Values ◽

Clinical Text

Purpose To compare the accuracy and reliability of a natural language processing (NLP) algorithm with manual coding by radiologists, and the combination of the two methods, for the identification of patients whose computed tomography (CT) reports raised the concern for lung cancer. Methods An NLP algorithm was developed using Clinical Text Analysis and Knowledge Extraction System (cTAKES) with the Yale cTAKES Extensions and trained to differentiate between language indicating benign lesions and lesions concerning for lung cancer. A random sample of 450 chest CT reports performed at Veterans Affairs Connecticut Healthcare System between January 2014 and July 2015 was selected. A reference standard was created by the manual review of reports to determine if the text stated that follow-up was needed for concern for cancer. The NLP algorithm was applied to all reports and compared with case identification using the manual coding by the radiologists. Results A total of 450 reports representing 428 patients were analyzed. NLP had higher sensitivity and lower specificity than manual coding (77.3% v 51.5% and 72.5% v 82.5%, respectively). NLP and manual coding had similar positive predictive values (88.4% v 88.9%), and NLP had a higher negative predictive value than manual coding (54% v 38.5%). When NLP and manual coding were combined, sensitivity increased to 92.3%, with a decrease in specificity to 62.85%. Combined NLP and manual coding had a positive predictive value of 87.0% and a negative predictive value of 75.2%. Conclusion Our NLP algorithm was more sensitive than manual coding of CT chest reports for the identification of patients who required follow-up for suspicion of lung cancer. The combination of NLP and manual coding is a sensitive way to identify patients who need further workup for lung cancer.
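
The four accuracy measures reported above, and the effect of combining two screening methods, can be sketched in Python as follows. The labels are invented for illustration and are not the study's data. The `Confusion` class and `combine_either` helper are hypothetical names, and treating "combined" as "flagged by either method" is an assumption about how the combination was defined.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int = 0  # flagged, and follow-up truly needed
    fp: int = 0  # flagged, but follow-up not needed
    fn: int = 0  # missed, though follow-up was needed
    tn: int = 0  # correctly not flagged


def confusion(predicted, actual):
    """Tally a 2x2 confusion table from parallel lists of 0/1 labels."""
    c = Confusion()
    for p, a in zip(predicted, actual):
        if p and a:
            c.tp += 1
        elif p:
            c.fp += 1
        elif a:
            c.fn += 1
        else:
            c.tn += 1
    return c


def sensitivity(c):
    return c.tp / (c.tp + c.fn)


def specificity(c):
    return c.tn / (c.tn + c.fp)


def ppv(c):
    return c.tp / (c.tp + c.fp)


def npv(c):
    return c.tn / (c.tn + c.fn)


def combine_either(a, b):
    """Flag a report if either method flags it (assumed combination rule)."""
    return [int(x or y) for x, y in zip(a, b)]


# Made-up labels for eight reports (1 = concerning for cancer).
actual = [1, 1, 1, 1, 0, 0, 0, 0]
nlp    = [1, 1, 1, 0, 1, 0, 0, 0]
manual = [1, 1, 0, 1, 0, 1, 0, 0]

for name, pred in [("nlp", nlp), ("manual", manual),
                   ("combined", combine_either(nlp, manual))]:
    c = confusion(pred, actual)
    print(name, sensitivity(c), specificity(c), ppv(c), npv(c))
```

Under an either-method rule, every true case caught by one method is kept, so sensitivity can only rise; every false alarm is kept too, so specificity can only fall. This matches the direction of the combined results reported in the abstract.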

Electronic Interpretation of Chest Radiograph Reports to Detect Central Venous Catheters

Infection Control and Hospital Epidemiology ◽

10.1086/502165 ◽

2003 ◽

Vol 24 (12) ◽

pp. 950-954 ◽

Cited By ~ 17

Author(s):

William E. Trick ◽

Wendy W. Chapman ◽

Mary F. Wisniewski ◽

Brian J. Peterson ◽

Steven L. Solomon ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Chest Radiograph ◽

Insertion Site ◽

Processing System ◽

Central Venous ◽

Cross Sectional ◽

Natural Language Processing System ◽

Human Interpretation

Objective: To evaluate whether a natural language processing system, SymText, was comparable to human interpretation of chest radiograph reports for identifying the mention of a central venous catheter (CVC), and whether use of SymText could detect patients who had a CVC. Design: To identify patients who had a CVC, we performed two surveys of hospitalized patients. Then, we obtained available reports from 104 patients who had a CVC during one of two cross-sectional surveys (ie, case-patients) and 104 randomly selected patients who did not have a CVC (ie, control-patients). Setting: A 600-bed public teaching hospital. Results: Chest radiograph reports were available from 124 of the 208 participants. Compared with human interpretation, SymText had a sensitivity of 95.8% and a specificity of 98.7%. The use of SymText to identify case- and control-patients resulted in a sensitivity of 43% and a specificity of 98%. Successful application of SymText varied significantly by venous insertion site (eg, a sensitivity of 78% for subclavian and a sensitivity of 3.7% for femoral). Twenty-six percent of the case-patients had a femoral CVC. Conclusions: Compared with human interpretation, SymText performed well in interpreting whether a report mentioned a CVC. In patient populations with less frequent CVC placement in femoral veins, the sensitivity for CVC detection likely would be higher. Applying a natural language processing system to chest radiograph reports may be a useful adjunct to other data sources to automate detection of patients who had a CVC.

Identification of Adverse Drug Event–Related Japanese Articles: Natural Language Processing Analysis (Preprint)

10.2196/preprints.22661 ◽

2020 ◽

Author(s):

Shogo Ujiie ◽

Shuntaro Yada ◽

Shoko Wakamiya ◽

Eiji Aramaki

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Drug Safety ◽

Automated System ◽

Pharmaceutical Companies ◽

Manual Labor ◽

Sentence Level ◽

Medical Articles ◽

Document Level

BACKGROUND Medical articles covering adverse drug events (ADEs) are systematically reported by pharmaceutical companies for drug safety information purposes. Although policies governing reporting to regulatory bodies vary among countries and regions, all medical article reporting may be categorized as precision or recall based. Recall-based reporting, which is implemented in Japan, requires the reporting of any possible ADE. Therefore, recall-based reporting can introduce numerous false negatives or substantial amounts of noise, a problem that is difficult to address using limited manual labor. OBJECTIVE Our aim was to develop an automated system that could identify ADE-related medical articles, support recall-based reporting, and alleviate manual labor in Japanese pharmaceutical companies. METHODS Using medical articles as input, our system based on natural language processing applies document-level classification to extract articles containing ADEs (replacing manual labor in the first screening) and sentence-level classification to extract sentences within those articles that imply ADEs (thus supporting experts in the second screening). We used 509 Japanese medical articles annotated by a medical engineer to evaluate the performance of the proposed system. RESULTS Document-level classification yielded an F1 of 0.903. Sentence-level classification yielded an F1 of 0.413. These were averages of fivefold cross-validations. CONCLUSIONS A simple automated system may alleviate the manual labor involved in screening drug safety–related medical articles in pharmaceutical companies. After improving the accuracy of the sentence-level classification by considering a wider context, we intend to apply this system toward real-world postmarketing surveillance.
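
The F1 figures above are averages over cross-validation folds. A minimal Python sketch of that calculation follows. The per-fold counts are invented for illustration, and `f1_score` and `mean_f1` are hypothetical helper names, not the authors' code.

```python
def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_f1(fold_counts):
    """Average F1 over folds given (tp, fp, fn) tuples, one per fold."""
    scores = [f1_score(tp, fp, fn) for tp, fp, fn in fold_counts]
    return sum(scores) / len(scores)


# Made-up (tp, fp, fn) counts for five folds; not the study's data.
folds = [(40, 5, 4), (38, 6, 5), (41, 3, 6), (39, 4, 4), (42, 5, 3)]
print("mean F1 over folds:", mean_f1(folds))
```

Because F1 ignores true negatives, it suits screening tasks such as this one, where the articles of interest are the ADE-related ones and both missed articles and false alarms carry a cost.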

Applying Natural Language Processing to Evaluate News Media Coverage of Bullying and Cyberbullying

Prevention Science ◽

10.1007/s11121-019-01029-x ◽

2019 ◽

Vol 20 (8) ◽

pp. 1274-1283 ◽

Cited By ~ 3

Author(s):

Megan A. Moreno ◽

Aubrey D. Gower ◽

Heather Brittain ◽

Tracy Vaillancourt

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

News Media ◽

Media Coverage

A natural language processing approach to modelling treatment alliance in psychotherapy transcripts

BJPsych Open ◽

10.1192/bjo.2021.177 ◽

2021 ◽

Vol 7 (S1) ◽

pp. S48-S48

Author(s):

Jihan Ryu ◽

Stephen Heisig ◽

Caroline McLaughlin ◽

Rebeccah Bortz ◽

Michael Katz ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Short Form ◽

Therapeutic Process ◽

Real Life ◽

Critical Factor ◽

Duration Of Treatment ◽

Cross Sectional ◽

Treatment Alliance

Aims: Patient-therapist alliance is a critical factor in psychotherapy treatment outcomes. This pilot will identify language concepts in psychotherapy transcripts correlating with the valence of treatment alliance using natural language processing tools. Specifically, high-order linguistic features will be extracted through exploratory analysis of texts and interpreted for their power to discriminate alliance rated by patients. Method: Adult patients and therapists in an outpatient clinic at various stages of relationship building and treatment goals consented to participate in the cross-sectional study approved by the Institutional Review Board. Psychotherapy sessions were recorded using wireless microphones and transcribed by two research assistants. After the recording, each patient completed the Working Alliance Inventory–Short Form to generate clinical scores of alliance. We used the Linguistic Inquiry and Word Count (LIWC) tool to map words to psycholinguistic categories, and generated novel linguistic parameters describing the individual language for each speaker role. Canonical-correlational analysis and descriptive statistics were used to analyze the two datasets. Result: Patients (N = 12, 83% female, mean age = 40) were primarily diagnosed with personality disorders (67%) working on real-life interpersonal issues (median treatment duration 18.5 weeks, 50% psychodynamic, 32% cognitive-behavioral, 16% supportive modality). In this heterogeneous sample, patients who used the “achieve” (e.g. trying, better, success, failure) and “swear” psycholinguistic categories of words rated the treatment alliance lower (r=−0.70, p = 0.01; r=−0.65, p = 0.02). Patients rated alliance lower with therapists who used the “I” pronoun more (r=−0.58, p < 0.05) and higher with therapists using more “risk” (difficult, safe, crisis) and “power” (important, strong, inferior, passive) categories (r = 0.66, p = 0.02; r = 0.58, p < 0.05), which commonly appeared in psychoeducation and conceptual framing of problems. Interestingly, there was no correlation with the “affiliation” category (p = 0.9). Linear regression modeling from “achieve,” “swear” variables and “I,” “risk” variables with duration of treatment as covariate predicted the patient's rating of alliance (adjusted R² = 0.66, p = 0.03). Conclusion: Our data collection and sub-sample analysis are ongoing. Preliminary results are showing speaker-specific language patterns in the cognitive-emotional domain, e.g. self-expressivity, and in the clinician's therapy style, covarying with the patient's perceived closeness in the heterogeneous treatment dyads. Novel application of natural language processing to characterize alliance using the data-driven approach is an unbiased method that can provide feedback to clinicians and patients. This characterization can also potentially provide insights into the mechanisms underlying the therapeutic process and help develop psycholinguistic markers for this critical clinical phenomenon.
