Extracting Critical Information from Free Text Data for Systems Health Management

Machine Learning and Knowledge Discovery for Engineering Systems Health Management ◽

10.1201/b11580-13 ◽

2016 ◽

pp. 423-449

Author(s):

Anne Kao ◽

Stephen Poteet ◽

David Augustine

Keyword(s):

Health Management ◽

Free Text ◽

Text Data ◽

Critical Information

A Numerically Coded File of Operative Procedures Derived from a Free Text Data Collection System: A Measure of the Accuracy

Methods of Information in Medicine ◽

10.1055/s-0038-1635717 ◽

1976 ◽

Vol 15 (01) ◽

pp. 21-28 ◽

Cited By ~ 3

Author(s):

Carmen A. Scudiero ◽

Ruth L. Wong

Keyword(s):

Data Collection ◽

Pap Smear ◽

Operative Procedures ◽

Free Text ◽

Collection System ◽

Process Data ◽

Text Data ◽

Data Collection System ◽

History Of ◽

Correlation System

A free text data collection system has been developed at the University of Illinois utilizing single-word, syntax-free dictionary lookup to process data for retrieval. The source document for the system is the Surgical Pathology Request and Report form. To date, 12,653 documents have been entered into the system. The free text data were used to create an IRS (Information Retrieval System) database. A program to interrogate this database has been developed to numerically code operative procedures. A total of 16,519 procedure records were generated. Of these, 1.9% could not be fitted into any procedure category, 6.1% could not be specifically coded, and 92% were coded into specific categories. A system of PL/1 programs has been developed to facilitate manual editing of these records, which can be performed in a reasonable length of time (1 week). This manual check revealed that the 92% were coded with precision = 0.931 and recall = 0.924. Correcting the readily correctable errors could improve these figures to precision = 0.977 and recall = 0.987. Syntax errors were relatively unimportant in the overall coding process, but did introduce significant error in some categories, such as when a right-left-bilateral distinction was attempted. The coded file that has been constructed will be used as an input file to a gynecological disease/PAP smear correlation system. The outputs of this system will include retrospective information on the natural history of selected diseases and a patient log providing information to the clinician on patient follow-up. Thus a free text data collection system can be used to produce numerically coded files of reasonable accuracy. Further, these files can be a source of useful information both for the clinician and for the medical researcher.
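To make "single-word, syntax-free dictionary lookup" concrete, here is a minimal Python sketch under stated assumptions. The keyword dictionary, the numeric codes and the laterality table are hypothetical placeholders, not the codes or the PL/1 programs of the original system. Because each word is looked up independently, a description that mentions both sides is flagged with both, which is one way the right-left-bilateral errors described above can arise. A small precision/recall helper shows how a manual check against correct codes yields figures like those reported.

```python
import re

# Hypothetical keyword dictionary: single words mapped to illustrative
# numeric procedure codes (NOT the codes used in the original system).
PROCEDURE_DICTIONARY = {
    "hysterectomy": 100,
    "conization": 200,
    "biopsy": 300,
}

# Hypothetical laterality qualifiers, also looked up one word at a time.
LATERALITY = {"left": "L", "right": "R", "bilateral": "B"}


def code_procedures(text: str) -> dict:
    """Code a free-text procedure description by single-word lookup.

    No syntax is used: every token is looked up on its own, so the
    result is a set of matched codes and laterality flags.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    codes = sorted({PROCEDURE_DICTIONARY[t] for t in tokens if t in PROCEDURE_DICTIONARY})
    sides = sorted({LATERALITY[t] for t in tokens if t in LATERALITY})
    # An empty code list corresponds to a record that cannot be fitted
    # into any procedure category.
    return {"codes": codes, "laterality": sides}


def precision_recall(assigned: set, correct: set) -> tuple:
    """Precision and recall of automatically assigned codes against a manual check."""
    true_positives = len(assigned & correct)
    precision = true_positives / len(assigned) if assigned else 0.0
    recall = true_positives / len(correct) if correct else 0.0
    return precision, recall
```

For example, `code_procedures("Biopsy of left ovary, right ovary normal")` returns code 300 with laterality `["L", "R"]`, even though only the left side was biopsied: without syntax, the lookup cannot tell which side the procedure applies to.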

Predicting adult neuroscience intensive care unit admission from emergency department triage using a retrospective, tabular-free text machine learning approach

Scientific Reports ◽

10.1038/s41598-021-80985-3 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Eyal Klang ◽

Benjamin R. Kummer ◽

Neha S. Dangayach ◽

Amy Zhong ◽

M. Arash Kia ◽

...

Keyword(s):

Machine Learning ◽

Intensive Care Unit ◽

Emergency Department ◽

Intensive Care ◽

Learning Model ◽

Free Text ◽

Combined Model ◽

Text Data ◽

Machine Learning Model ◽

Record Data

Abstract Early admission to the neurosciences intensive care unit (NSICU) is associated with improved patient outcomes. Natural language processing offers new possibilities for mining free text in electronic health record data. We sought to develop a machine learning model using both tabular and free text data to identify patients requiring NSICU admission shortly after arrival to the emergency department (ED). We conducted a single-center, retrospective cohort study of adult patients at the Mount Sinai Hospital, an academic medical center in New York City. All patients presenting to our institutional ED between January 2014 and December 2018 were included. Structured (tabular) demographic, clinical, bed movement record data, and free text data from triage notes were extracted from our institutional data warehouse. A machine learning model was trained to predict the likelihood of NSICU admission at 30 min from arrival to the ED. We identified 412,858 patients presenting to the ED over the study period, of whom 1900 (0.5%) were admitted to the NSICU. The daily median number of ED presentations was 231 (IQR 200–256) and the median time from ED presentation to the decision for NSICU admission was 169 min (IQR 80–324). A model trained only with text data had an area under the receiver operating characteristic curve (AUC) of 0.90 (95% confidence interval (CI) 0.87–0.91). A structured data-only model had an AUC of 0.92 (95% CI 0.91–0.94). A combined model trained on structured and text data had an AUC of 0.93 (95% CI 0.92–0.95). At a false positive rate of 1:100 (99% specificity), the combined model was 58% sensitive for identifying NSICU admission. A machine learning model using structured and free text data can predict NSICU admission soon after ED arrival. This may potentially improve ED and NSICU resource allocation. Further studies should validate our findings.
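The abstract reports sensitivity at a fixed operating point (99% specificity). A minimal Python sketch of how such a point can be read off any model's risk scores is shown below; it is not the authors' code, and the function name and inputs are illustrative assumptions. A case is predicted positive when its score is at or above the threshold.

```python
from typing import Optional, Sequence, Tuple


def sensitivity_at_specificity(
    scores: Sequence[float],
    labels: Sequence[int],
    target_specificity: float = 0.99,
) -> Optional[Tuple[float, float]]:
    """Return (threshold, sensitivity) at the lowest threshold whose
    specificity on these data is at least target_specificity.

    labels: 1 for a positive case (e.g. NSICU admission), 0 otherwise.
    """
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not negatives or not positives:
        raise ValueError("need at least one positive and one negative case")
    # Candidate thresholds: every observed score, plus +inf (flag nobody).
    for threshold in sorted(set(scores)) + [float("inf")]:
        false_positives = sum(s >= threshold for s in negatives)
        specificity = (len(negatives) - false_positives) / len(negatives)
        if specificity >= target_specificity:
            # Thresholds ascend and sensitivity can only fall as the threshold
            # rises, so the first qualifying threshold maximizes sensitivity.
            sensitivity = sum(s >= threshold for s in positives) / len(positives)
            return threshold, sensitivity
    return None  # unreachable: +inf always gives specificity 1.0
```

Choosing the lowest qualifying threshold is the design choice that matters: among all thresholds meeting the specificity constraint, it flags the most true positives.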

What are the concerns and goals of women attending a urogynaecology clinic? Content analysis of free-text data from an electronic pelvic floor assessment questionnaire (ePAQ-PF)

International Urogynecology Journal ◽

10.1007/s00192-018-3697-0 ◽

2018 ◽

Vol 30 (1) ◽

pp. 33-41 ◽

Cited By ~ 3

Author(s):

Thomas Gray ◽

Scarlett Strickland ◽

Sarita Pooranawattanakul ◽

Weiguang Li ◽

Patrick Campbell ◽

...

Keyword(s):

Content Analysis ◽

Pelvic Floor ◽

Free Text ◽

Text Data ◽

Assessment Questionnaire

An exploration of text mining of narrative reports of injury incidents to assess risk

MATEC Web of Conferences ◽

10.1051/matecconf/201825106020 ◽

2018 ◽

Vol 251 ◽

pp. 06020 ◽

Cited By ~ 4

Author(s):

David Passmore ◽

Chungil Chae ◽

Yulia Kustikova ◽

Rose Baker ◽

Jeong-Ha Yim

Keyword(s):

Latent Dirichlet Allocation ◽

Topic Model ◽

Surface Mining ◽

Modeling Processes ◽

Free Text ◽

Text Data ◽

Injury Occurrence ◽

The USA ◽

Musculoskeletal Systems ◽

Topic Mining

A topic model was explored using unsupervised machine learning to summarize free-text narrative reports of 77,215 injuries that occurred in coal mines in the USA between 2000 and 2015. Latent Dirichlet Allocation modeling identified six topics in the free-text data. One topic, a theme describing primarily injury incidents resulting in strains and sprains of the musculoskeletal system, revealed differences in topic emphasis by the location of the mine property at which injuries occurred, the degree of injury, and the year of injury occurrence. Narratives clustered around this topic referred most frequently to injuries at surface or other locations rather than underground, to injuries that resulted in disability, and to injuries whose occurrence increased secularly over time. The modeling success enjoyed in this exploratory effort suggests that additional topic mining of these injury text narratives is justified, especially using a broad set of covariates to explain variations in topic emphasis and for comparison of surface mining injuries with injuries occurring during site preparation for construction.
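The abstract does not say which LDA implementation was used. As a hedged illustration of what "LDA identified six topics" involves, here is a textbook collapsed Gibbs sampler in plain Python; the hyperparameters `alpha` and `beta`, the iteration count and the toy narratives are assumptions, and on the real corpus one would set `num_topics=6`.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): per-document topic proportions and
    per-topic word distributions (dicts keyed by word).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    n_dk = [[0] * num_topics for _ in docs]               # topic counts per document
    n_kw = [defaultdict(int) for _ in range(num_topics)]  # word counts per topic
    n_k = [0] * num_topics                                # total tokens per topic
    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(zd)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
        for k in range(num_topics)
    ]
    return doc_topic, topic_word
```

A topic's theme (for example, strains and sprains) is then read from its highest-probability words, e.g. `sorted(topic_word[k], key=topic_word[k].get, reverse=True)[:10]`, and topic emphasis can be compared across covariates such as mine location using the `doc_topic` proportions.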

Use of Electronic Health Record Tools to Facilitate and Audit Infliximab Prescribing

The Journal of Pediatric Pharmacology and Therapeutics ◽

10.5863/1551-6776-23.1.18 ◽

2018 ◽

Vol 23 (1) ◽

pp. 18-25

Author(s):

Bethany R. Sharpless ◽

Fernando del Rosario ◽

Zarela Molle-Rios ◽

Elora Hilmas

Keyword(s):

Literature Review ◽

Electronic Health Record ◽

National Survey ◽

Free Text ◽

Health Record ◽

Order Information ◽

Text Data ◽

Review Analysis ◽

Electronic Health ◽

Implementation Data

OBJECTIVES The objective of this project was to assess a pediatric institution's use of infliximab and develop and evaluate electronic health record tools to improve safety and efficiency of infliximab ordering through auditing and improved communication. METHODS Best use of infliximab was defined through a literature review, analysis of baseline use of infliximab at our institution, and distribution and analysis of a national survey. Auditing and order communication were optimized through implementation of mandatory indications in the infliximab orderable and creation of an interactive flowsheet that collects discrete and free-text data. The value of the implemented electronic health record tools was assessed at the conclusion of the project. RESULTS Baseline analysis determined that 93.8% of orders were dosed appropriately according to the findings of a literature review. After implementation of the flowsheet and indications, the time to perform an audit of use was reduced from 60 minutes to 5 minutes per month. Four months post implementation, data were entered by 60% of the pediatric gastroenterologists at our institution on 15.3% of all encounters for infliximab. Users were surveyed on the value of the tools, with 100% planning to continue using the workflow, and 82% stating the tools frequently improve the efficiency and safety of infliximab prescribing. CONCLUSIONS Creation of a standard workflow by using an interactive flowsheet has improved auditing ability and facilitated the communication of important order information surrounding infliximab. Providers and pharmacists feel these tools improve the safety and efficiency of infliximab ordering, and auditing data reveal that the tools are being used.

Identifying Medication-related Intents from a Bidirectional Text Messaging Platform for Hypertension Management: An Unsupervised Learning Approach

10.1101/2021.12.23.21268061 ◽

2021 ◽

Author(s):

Anahita Davoudi ◽

Natalie Lee ◽

Thaibinh Luong ◽

Timothy Delaney ◽

Elizabeth Asch ◽

...

Keyword(s):

Blood Pressure ◽

Unsupervised Learning ◽

Language Processing ◽

Text Messaging ◽

Latent Dirichlet Allocation ◽

Clinical Care ◽

Hypertension Management ◽

Free Text ◽

Significant Heterogeneity ◽

Text Data

Background: Free-text communication between patients and providers plays an increasing role in chronic disease management, through platforms ranging from traditional healthcare portals to newer mobile messaging applications. These text data are rich resources for clinical and research purposes, but their sheer volume renders them difficult to manage. Even automated approaches such as natural language processing require labor-intensive manual classification to develop training datasets, which is a rate-limiting step. Automated approaches to organizing free-text data are necessary to facilitate the use of free-text communication for clinical care and research. Objective: We applied unsupervised learning approaches to (1) understand the types of topics discussed and (2) learn medication-related intents from messages sent between patients and providers through a bidirectional text messaging system for managing participant blood pressure. Methods: This study was a secondary analysis of de-identified messages from a remote mobile text-based employee hypertension management program at an academic institution. In experiment 1, we trained a Latent Dirichlet Allocation (LDA) model for each message type (inbound-patient and outbound-provider) and identified the distribution of major topics and significant topics (probability >0.20) across message types. In experiment 2, we annotated all medication-related messages with a single medication intent. Then, we trained a second LDA model (medLDA) to assess how well the unsupervised method could identify more fine-grained medication intents. We encoded each medication message with n-grams (n = 1 to 3 words) using spaCy, clinical named entities using STANZA, and medication categories using MedEx, and then applied chi-square feature selection to learn the most informative features associated with each medication intent.
Results: A total of 253 participants and 5 providers engaged in the program, generating 12,131 total messages: 47% patient messages and 53% provider messages. Most patient messages corresponded to blood pressure (BP) reporting, BP encouragement, and appointment scheduling. In contrast, most provider messages corresponded to BP reporting, medication adherence, and confirmatory statements. In experiment 1, for both patient and provider messages, LDA identified one topic in most messages and more than three topics in few. However, manual review of some messages within topics revealed significant heterogeneity even within single-topic messages as identified by LDA. In experiment 2, among the 534 medication messages annotated with a single medication intent, most of the 282 patient medication messages referred to medication request (48%; n=134) and medication taking (28%; n=79); most of the 252 provider medication messages referred to medication question (69%; n=173). Although medLDA could identify a majority intent within each topic, the model could not distinguish medication intents with low prevalence within either patient or provider messages. Richer feature engineering identified informative lexical-semantic patterns associated with each medication intent class. Conclusion: LDA can be an effective method for generating subgroups of messages with similar term usage and for facilitating the review of topics to inform annotations. However, few training cases and shared vocabulary between intents preclude the use of LDA for fully automated deep medication intent classification.
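The feature-selection step (n-grams with n = 1 to 3, ranked by chi-square association with each intent) can be sketched in plain Python as below. This is a simplified stand-in: it tokenizes with a regular expression instead of spaCy, omits the Stanza entities and MedEx categories, and the messages and intent labels in the usage note are hypothetical. Each feature is scored on a 2x2 table of (feature present/absent) by (intent is/is not the target), and only positively associated features are kept so that words typical of other intents do not crowd the ranking.

```python
import re
from collections import Counter


def ngrams(text, n_max=3):
    """Return the set of word n-grams (n = 1..n_max) in a message."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    }


def chi_square_rank(messages, labels, target, top_k=5):
    """Rank n-grams by chi-square association with one intent class."""
    features = [ngrams(m) for m in messages]
    N = len(messages)
    in_class = sum(1 for y in labels if y == target)
    df_all = Counter(f for fs in features for f in fs)
    df_class = Counter(f for fs, y in zip(features, labels) if y == target for f in fs)
    scores = {}
    for f, present in df_all.items():
        a = df_class[f]           # feature present, message in target class
        b = present - a           # feature present, other class
        c = in_class - a          # feature absent, target class
        d = N - in_class - b      # feature absent, other class
        if a * d - b * c <= 0:
            continue              # keep only positively associated features
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[f] = N * (a * d - b * c) ** 2 / denom
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

For instance, with the hypothetical messages "can you refill my lisinopril" and "please refill my meds" labeled as a medication request, and "i took my pill today" and "took my meds this morning" labeled as medication taking, `chi_square_rank(..., "request")` ranks "refill" and "refill my" first, while "my", which appears in every message, gets no positive association.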

Unsupervised identification of crime problems from police free-text data

Crime Science ◽

10.1186/s40163-020-00127-4 ◽

2020 ◽

Vol 9 (1) ◽

Author(s):

Daniel Birks ◽

Alex Coleman ◽

David Jackson

Keyword(s):

Crime Reduction ◽

Free Text ◽

Topic Modelling ◽

Modus Operandi ◽

Text Data ◽

Machine Learning Methods ◽

Major Metropolitan Area ◽

Operational Decision Making ◽

The UK ◽

Crime Classification

Abstract We present a novel exploratory application of unsupervised machine-learning methods to identify clusters of specific crime problems from unstructured modus operandi free-text data within a single administrative crime classification. To illustrate our proposed approach, we analyse police recorded free-text narrative descriptions of residential burglaries occurring over a two-year period in a major metropolitan area of the UK. Results of our analyses demonstrate that topic modelling algorithms are capable of clustering substantively different burglary problems without prior knowledge of such groupings. Subsequently, we describe a prototype dashboard that allows replication of our analytical workflow and could be applied to support operational decision making in the identification of specific crime problems. This approach to grouping distinct types of offences within existing offence categories, we argue, has the potential to support crime analysts in proactively analysing large volumes of modus operandi free-text data—with the ultimate aims of developing a greater understanding of crime problems and supporting the design of tailored crime reduction interventions.
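Once a topic model has produced per-narrative topic proportions, one simple way to turn them into clusters of burglary problems is to assign each narrative to its dominant topic. The Python sketch below assumes a precomputed document-topic matrix; the optional `min_probability` cut-off, which leaves ambiguous narratives unassigned, is an illustrative assumption rather than a step the abstract describes.

```python
from collections import defaultdict


def cluster_by_dominant_topic(doc_topic, min_probability=0.0):
    """Group documents by their highest-probability topic.

    doc_topic: list of per-document topic proportion lists.
    Documents whose top topic probability is below min_probability
    are placed in the None group for manual review.
    """
    clusters = defaultdict(list)
    for doc_id, proportions in enumerate(doc_topic):
        # Index of the topic with the largest proportion for this document.
        top = max(range(len(proportions)), key=proportions.__getitem__)
        key = top if proportions[top] >= min_probability else None
        clusters[key].append(doc_id)
    return dict(clusters)
```

Hard assignment keeps a dashboard simple (one cluster per narrative), at the cost of hiding narratives that genuinely mix two modus operandi; the cut-off gives analysts a way to surface those mixed cases.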

Incorporating natural language processing to improve classification of axial spondyloarthritis using electronic health records

Rheumatology ◽

10.1093/rheumatology/kez375 ◽

2019 ◽

Vol 59 (5) ◽

pp. 1059-1065 ◽

Cited By ~ 1

Author(s):

Sizheng Steven Zhao ◽

Chuan Hong ◽

Tianrun Cai ◽

Chang Xu ◽

Jie Huang ◽

...

Keyword(s):

Electronic Health Records ◽

Predictive Value ◽

Area Under The Curve ◽

Free Text ◽

Text Data ◽

Health Records ◽

Disease Concepts ◽

ICD Codes ◽

Electronic Health

Abstract Objectives To develop classification algorithms that accurately identify axial spondyloarthritis (axSpA) patients in electronic health records, and compare the performance of algorithms incorporating free-text data against approaches using only International Classification of Diseases (ICD) codes. Methods An enriched cohort of 7853 eligible patients was created from electronic health records of two large hospitals using automated searches (⩾1 ICD code combined with simple text searches). Key disease concepts from free-text data were extracted using natural language processing (NLP) and combined with ICD codes to develop algorithms. We created both supervised regression-based algorithms (on a training set of 127 axSpA cases and 423 non-cases) and unsupervised algorithms to identify patients with a high probability of having axSpA from the enriched cohort. Their performance was compared against classifications using ICD codes only. Results NLP extracted four disease concepts of high predictive value: ankylosing spondylitis (AS), sacroiliitis, HLA-B27 and spondylitis. The unsupervised algorithm, incorporating both the NLP concept and ICD code for AS, identified the greatest number of patients. By setting the probability threshold to attain 80% positive predictive value, it identified 1509 axSpA patients (mean age 53 years, 71% male). Sensitivity was 0.78, specificity 0.94 and area under the curve 0.93. The two supervised algorithms performed similarly but identified fewer patients. All three outperformed traditional approaches using ICD codes alone (area under the curve 0.80–0.87). Conclusion Algorithms incorporating free-text data can accurately identify axSpA patients in electronic health records. Large cohorts identified using these novel methods offer exciting opportunities for future clinical research.
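Setting "the probability threshold to attain 80% positive predictive value" amounts to searching for the lowest cut-off at which the flagged patients in a labeled validation set are at least 80% true cases. A minimal Python sketch, assuming 0/1 chart-review labels and model probabilities as inputs (not the authors' code):

```python
def threshold_for_ppv(probabilities, labels, target_ppv=0.80):
    """Lowest threshold at which flagged patients reach the target PPV.

    A patient is flagged when probability >= threshold; labels are 1 for
    a confirmed case and 0 otherwise. Returns (threshold, ppv, n_flagged),
    or None if no threshold reaches the target.
    """
    for threshold in sorted(set(probabilities)):
        flagged = [y for p, y in zip(probabilities, labels) if p >= threshold]
        ppv = sum(flagged) / len(flagged)  # never empty: threshold <= max score
        if ppv >= target_ppv:
            return threshold, ppv, len(flagged)
    return None
```

Scanning thresholds from low to high returns the cut-off that flags the most patients while meeting the PPV target at that cut-off, which matches the goal of building the largest cohort at acceptable precision. PPV is not guaranteed to rise monotonically with the threshold, so higher cut-offs are not automatically safer.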

Applying lexical and semantic analysis to the exploration of free-text data

Nurse Researcher ◽

10.7748/nr.4.3.46.s5 ◽

1997 ◽

Vol 4 (3) ◽

pp. 46-68

Author(s):

LG Moseley ◽

FA Murphy

Keyword(s):

Semantic Analysis ◽

Free Text ◽

Text Data