Using natural language processing for identification of herpes zoster ophthalmicus cases to support population-based study

2018 ◽  
Vol 47 (1) ◽  
pp. 7-14 ◽  
Author(s):  
Chengyi Zheng ◽  
Yi Luo ◽  
Cheryl Mercado ◽  
Lina Sy ◽  
Steven J Jacobsen ◽  
...  
Author(s):  
Chengyi Zheng ◽  
Lina S Sy ◽  
Hilary Tanenbaum ◽  
Yun Tian ◽  
Yi Luo ◽  
...  

Abstract Background: Diagnosis codes are inadequate for accurately identifying herpes zoster ophthalmicus (HZO). Manual review of medical records is expensive and time-consuming, resulting in a lack of population-based data on HZO. Methods: We conducted a retrospective cohort study of 87,673 patients aged ≥50 years who had a new herpes zoster (HZ) diagnosis and an associated antiviral prescription between 2010 and 2018. We developed and validated an automated natural language processing (NLP) algorithm to identify HZO with ocular involvement (ocular HZO). We compared the characteristics of NLP-identified ocular HZO, nonocular HZO, and non-HZO cases among HZ patients and identified the factors associated with ocular HZO among HZ patients. Results: The NLP algorithm achieved 94.9% sensitivity and 94.2% specificity in identifying ocular HZO cases. Among 87,673 incident HZ cases, the proportion identified as ocular HZO was 9.0% (n=7,853) by NLP and 2.3% (n=1,988) by ICD codes. In adjusted analyses, older age and male sex were associated with an increased risk of ocular HZO; Hispanic and Black race/ethnicity were each associated with a lower risk of ocular HZO compared with non-Hispanic White. Conclusions: The NLP algorithm achieved high accuracy and can be used in large population-based studies to identify ocular HZO, avoiding labor-intensive chart review. Age, sex, and race were strongly associated with ocular HZO among HZ patients. These risk factors should be considered when planning zoster vaccination.
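As a quick arithmetic check on the Results above, the reported proportions can be recomputed from the counts given in the abstract. This is only a sketch: the variable and function names are illustrative, and the counts are the only inputs taken from the abstract.

```python
# Counts as reported in the abstract above.
total_hz = 87_673       # incident HZ cases
ocular_by_nlp = 7_853   # ocular HZO identified by the NLP algorithm
ocular_by_icd = 1_988   # ocular HZO identified by ICD codes


def pct(numerator: int, denominator: int) -> float:
    """Return a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


nlp_share = pct(ocular_by_nlp, total_hz)
icd_share = pct(ocular_by_icd, total_hz)
# How many times more cases NLP flagged than ICD codes did.
nlp_to_icd_ratio = ocular_by_nlp / ocular_by_icd

print(nlp_share, icd_share, round(nlp_to_icd_ratio, 2))
```

The recomputed shares agree with the reported 9.0% and 2.3%, and the ratio shows that NLP flagged nearly four times as many ocular HZO cases as ICD codes did.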


2020 ◽  
Vol 14 (Supplement_1) ◽  
pp. S309-S310
Author(s):  
R Stidham ◽  
D Yu ◽  
S Lahiri ◽  
V Vydiswaran

Abstract Background: Extra-intestinal manifestations (EIMs) occur in nearly 40% of patients with IBD and impact both disease experience and therapeutic decision-making, but are not well captured by administrative codes. We aimed to pilot computational natural language processing (NLP) methods to characterise EIMs using consultant notes. Methods: Subjects with a diagnosis of IBD were identified in a single-centre retrospective review of electronic health records (EHR) between 2014 and 2017. Gastroenterology (GI) notes were annotated by two reviewers for the presence and activity of EIMs. EIM concepts were identified using NLP methods leveraging UMLS libraries and hand-crafted features. EIM characterisation occurred within a ±25-word window around identified EIMs, with classifications including inactive concepts (negated, historical, resolved) and active concepts (improved, worsened, active but unchanged). When an EIM was referenced repeatedly in a document, its status was inferred using section-based weighting, with sections ranked from greatest to least weight as assessment/plan, subjective, past history, exam, and other. EIM status was classified as ambiguous when multiple conflicting references of approximately equal weight were present within the same document. Model development and testing used an 80/20 dataset split. Results: In 4108 unique IBD patients, 1640 (39.9%) had at least 1 EIM identified. The mean age was 41.9 years, 47.2% were male, and 27.0% had biologic exposure. The 1240 manually annotated documents (first GI notes) comprised 51.1% arthritis, 16.5% ocular, and 16.2% psoriasis, with erythema nodosum (EN), pyoderma gangrenosum (PG), and hidradenitis suppurativa (HS) together comprising 16.2% of the cohort. 
NLP models performed well for correctly classifying both EIM presence and status in a testing set, with overall accuracy, sensitivity, and specificity of 91.2%, 92.9% and 81.8% across all EIMs in notes automatically classified as non-ambiguous (Table 1). NLP methods identified EIM status classification as ambiguous in 38.9% of cases. Conclusion NLP methods can detect and classify EIMs with reasonable performance and efficiency compared with traditional manual chart review. Though source document variation and ambiguity present challenges, NLP offers exciting possibilities for population-based research and decision support.
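The window-based characterisation and section-weighted status inference described in the Methods above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract gives only the ranking of sections, so the numeric weights, the `tie_tolerance` threshold, and all function and key names here are hypothetical.

```python
from collections import defaultdict

# Hypothetical weights. The abstract states only the ranking
# (assessment/plan > subjective > past history > exam > other).
SECTION_WEIGHTS = {
    "assessment_plan": 5.0,
    "subjective": 4.0,
    "past_history": 3.0,
    "exam": 2.0,
    "other": 1.0,
}


def context_window(tokens, index, width=25):
    """Return the tokens within +/- `width` words of the EIM mention at `index`."""
    start = max(0, index - width)
    return tokens[start:index + width + 1]


def infer_status(mentions, tie_tolerance=0.10):
    """Combine repeated mentions of one EIM in one document.

    `mentions` is a list of (section, status) pairs. The status with the
    largest summed section weight wins; if the runner-up is within
    `tie_tolerance` (an assumed threshold) of the winner, the result
    is "ambiguous". Returns None when there are no mentions.
    """
    totals = defaultdict(float)
    for section, status in mentions:
        totals[status] += SECTION_WEIGHTS.get(section, SECTION_WEIGHTS["other"])
    if not totals:
        return None
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) > 1:
        top_weight, second_weight = ranked[0][1], ranked[1][1]
        if (top_weight - second_weight) / top_weight <= tie_tolerance:
            return "ambiguous"
    return ranked[0][0]


clear_case = infer_status([("assessment_plan", "active"), ("exam", "inactive")])
tied_case = infer_status(
    [("subjective", "active"), ("past_history", "inactive"), ("other", "inactive")]
)
print(clear_case, tied_case)
```

In this sketch a mention in the assessment/plan section outweighs a conflicting mention in the exam section, while conflicting mentions whose summed weights are equal are flagged as ambiguous rather than forced into a class.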


2021 ◽  
Author(s):  
Chengyi Zheng ◽  
Jonathan Duffy ◽  
In-Lu Amy Liu ◽  
Lina S. Sy ◽  
Ronald A. Navarro ◽  
...  

Background: Shoulder injury related to vaccine administration (SIRVA) accounts for more than half of all claims received by the National Vaccine Injury Compensation Program. However, there is a lack of population-based studies due to the challenge of identifying SIRVA cases in large health care databases. Objective: To develop a natural language processing (NLP) method to identify SIRVA cases from clinical notes. Methods: We conducted the study among members of a large integrated health care organization who were vaccinated between 04/1/2016 and 12/31/2017 and had subsequent diagnosis codes indicative of shoulder injury. Based on a training dataset with a chart review reference standard of 164 individuals, we developed an NLP algorithm to extract shoulder disorder information, including prior vaccination, anatomic location, temporality and causality. The algorithm identified three groups of positive SIRVA cases (definite, probable and possible) based on the strength of evidence. We compared NLP results to a chart review reference standard of 100 vaccinated individuals. We then applied the final automated NLP algorithm to a broader cohort of vaccinated individuals with a shoulder injury diagnosis code and performed manual chart confirmation on a random sample of NLP-identified definite cases and all NLP-identified probable and possible cases. Results: In the validation sample, the NLP algorithm had 100% accuracy for identifying 4 SIRVA cases and 96 individuals without SIRVA. In the broader cohort of 53,585 individuals, the NLP algorithm identified 291 definite, 124 probable, and 52 possible SIRVA cases. The chart-confirmation rates for these groups were 95.3%, 67.7% and 18.9%, respectively. Conclusions: The algorithm performed with high sensitivity and reasonable specificity in identifying positive SIRVA cases. 
The NLP algorithm can potentially be used in future population-based studies to identify this rare adverse event, avoiding labor-intensive chart review validation.


2017 ◽  
Vol 12 (1) ◽  
pp. S1438 ◽  
Author(s):  
Bernardo Goulart ◽  
Emily Silgard ◽  
Christina Baik ◽  
Aasthaa Bansal ◽  
Mikael Greenwood-Hickman ◽  
...  

2018 ◽  
Vol 154 (1) ◽  
pp. 24 ◽  
Author(s):  
Jason P. Lott ◽  
Denise M. Boudreau ◽  
Ray L. Barnhill ◽  
Martin A. Weinstock ◽  
Eleanor Knopp ◽  
...  

