Natural language processing to extract symptoms of severe mental illness from clinical text: the Clinical Record Interactive Search Comprehensive Data Extraction (CRIS-CODE) project

Richard G Jackson; Rashmi Patel; Nishamali Jayatilleke; Anna Kolliakou; Michael Ball; Genevieve Gorrell; Angus Roberts; Richard J Dobson; Robert Stewart

doi:10.1136/bmjopen-2016-012012

A deep database of medical abbreviations and acronyms for natural language processing

Scientific Data

doi:10.1038/s41597-021-00929-4

2021

Vol 8 (1)

Author(s):

Lisa Grossman Liu

Raymond H. Grossman

Elliot G. Mitchell

Chunhua Weng

Karthik Natarajan

...

Keyword(s):

Natural Language Processing

Natural Language

Language Processing

American English

Substantial Improvement

Future Application

Multiple Sources

High Coverage

Clinical Text

Automated Quality Control

Abstract The recognition, disambiguation, and expansion of medical abbreviations and acronyms is of utmost importance to prevent medically dangerous misinterpretation in natural language processing. To support recognition, disambiguation, and expansion, we present the Medical Abbreviation and Acronym Meta-Inventory, a deep database of medical abbreviations. A systematic harmonization of eight source inventories across multiple healthcare specialties and settings identified 104,057 abbreviations with 170,426 corresponding senses. Automated cross-mapping of synonymous records using state-of-the-art machine learning reduced redundancy, which simplifies future application. Additional features include semi-automated quality control to remove errors. The Meta-Inventory demonstrated high completeness, or coverage, of abbreviations and senses in new clinical text, a substantial improvement over the next largest repository (6–14% increase in abbreviation coverage; 28–52% increase in sense coverage). To our knowledge, the Meta-Inventory is the most complete compilation of medical abbreviations and acronyms in American English to date. The multiple sources and high coverage support application in varied specialties and settings. This allows for cross-institutional natural language processing, which previous inventories did not support. The Meta-Inventory is available at https://bit.ly/github-clinical-abbreviations.
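
The abstract reports abbreviation and sense coverage but does not define the computation. A minimal sketch, assuming coverage is the share of observed (abbreviation, sense) occurrences in annotated clinical text that the inventory contains; the inventory format, the `coverage` function, and the sample data are hypothetical and do not reflect the Meta-Inventory's actual schema or evaluation protocol.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical inventory format: abbreviation -> set of known senses (expansions).
Inventory = Dict[str, Set[str]]


def coverage(inventory: Inventory,
             annotated: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (abbreviation coverage, sense coverage) over annotated
    (abbreviation, sense) occurrences observed in clinical text.

    Abbreviation coverage: share of occurrences whose abbreviation is listed.
    Sense coverage: share of occurrences whose sense is listed for that abbreviation.
    """
    pairs = list(annotated)
    if not pairs:
        return 0.0, 0.0
    abbr_hits = sum(1 for abbr, _ in pairs if abbr in inventory)
    sense_hits = sum(1 for abbr, sense in pairs
                     if sense in inventory.get(abbr, set()))
    return abbr_hits / len(pairs), sense_hits / len(pairs)


inventory = {
    "MS": {"multiple sclerosis", "mitral stenosis", "morphine sulfate"},
    "PT": {"physical therapy", "prothrombin time"},
}
observed = [
    ("MS", "mitral stenosis"),
    ("PT", "patient"),
    ("SOB", "shortness of breath"),
    ("MS", "mental status"),
]
print(coverage(inventory, observed))  # (0.75, 0.25)
```

This counts occurrences (tokens); counting distinct abbreviations or senses (types) instead is an equally plausible reading and can give different percentages.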

Systematic review of current natural language processing methods and applications in cardiology

Heart

doi:10.1136/heartjnl-2021-319769

2021

pp. heartjnl-2021-319769

Author(s):

Meghan Reading Turchioe

Alexander Volodarskiy

Jyotishman Pathak

Drew N Wright

James Enlou Tcheng

...

Keyword(s):

Systematic Review

Natural Language Processing

Natural Language

Language Processing

Clinical Care

Real World Data

Clinical Text

Clinical Notes

Artery Disease

Automated Methods

Natural language processing (NLP) is a set of automated methods to organise and evaluate the information contained in unstructured clinical notes, which are a rich source of real-world data from clinical care that may be used to improve outcomes and understanding of disease in cardiology. The purpose of this systematic review is to provide an understanding of NLP, review how it has been used to date within cardiology and illustrate the opportunities that this approach provides for both research and clinical care. We systematically searched six scholarly databases (ACM Digital Library, arXiv, Embase, IEEE Xplore, PubMed and Scopus) for studies published in 2015–2020 describing the development or application of NLP methods for clinical text focused on cardiac disease. We excluded duplicates and studies that were not published in English, lacked a description of NLP methods or did not focus on cardiac disease. Two independent reviewers extracted general study information, clinical details and NLP details and appraised quality using a checklist of quality indicators for NLP studies. We identified 37 studies developing and applying NLP in heart failure, imaging, coronary artery disease, electrophysiology, general cardiology and valvular heart disease. Most studies used rule-based NLP methods to identify patients with a specific diagnosis and to extract disease severity. Some used NLP algorithms to predict clinical outcomes. A major limitation is the inability to aggregate findings across studies due to vastly different NLP methods, evaluation and reporting. This review reveals numerous opportunities for future NLP work in cardiology with more diverse patient samples, cardiac diseases, datasets, methods and applications.
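
Most of the reviewed studies extract disease severity with rule-based methods. As an illustration only, the sketch below pulls a left ventricular ejection fraction (LVEF) value out of a note with a regular expression and maps it to a heart-failure category. The pattern, the function names, and the thresholds (≤40% reduced, 41–49% mildly reduced, ≥50% preserved) are assumptions chosen for the example, not taken from any reviewed study; real systems also need negation, temporality, and section handling.

```python
import re
from typing import Optional

# Hypothetical pattern for mentions such as "LVEF 35%", "EF of 55 %",
# or "ejection fraction is 40-45%": keyword, up to 20 non-digit characters,
# a number, an optional range end, then a percent sign.
EF_PATTERN = re.compile(
    r"\b(?:LVEF|EF|ejection\s+fraction)\b[^0-9]{0,20}?"
    r"(\d{1,2})(?:\s*(?:-|to)\s*(\d{1,2}))?\s*%",
    re.IGNORECASE,
)


def extract_ef(text: str) -> Optional[float]:
    """Return the first ejection fraction found in text; ranges are averaged."""
    match = EF_PATTERN.search(text)
    if match is None:
        return None
    low = int(match.group(1))
    high = int(match.group(2)) if match.group(2) else low
    return (low + high) / 2


def hf_category(ef: float) -> str:
    """Map an ejection fraction (%) to an illustrative heart-failure category."""
    if ef <= 40:
        return "reduced"
    if ef < 50:
        return "mildly reduced"
    return "preserved"


for note in ["Echo: LVEF 35%, moderate MR.", "ejection fraction is 40-45%"]:
    ef = extract_ef(note)
    print(note, "->", ef, hf_category(ef) if ef is not None else "no EF found")
```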

A Strategy for Deploying Secure Cloud-Based Natural Language Processing Systems for Applied Research Involving Clinical Text

2011 44th Hawaii International Conference on System Sciences

doi:10.1109/hicss.2011.32

2011

Cited By ~ 4

Author(s):

D Carrell

Keyword(s):

Natural Language Processing

Natural Language

Language Processing

Applied Research

Clinical Text

Natural Language Processing-Based Information Extraction and Abstraction for Lease Documents

Advances in Computer and Electrical Engineering - Neural Networks for Natural Language Processing

doi:10.4018/978-1-7998-1159-6.ch011

2020

pp. 170-187

Author(s):

Sumathi S.

Rajkumar S.

Indumathi S.

Keyword(s):

Natural Language Processing

Natural Language

Information Extraction

Language Processing

Data Extraction

Easy Access

Property A

Key Events

Lease abstraction is the process of extracting key data from a lease document. A lease document for a property contains key business, financial, and legal data about that property. A lease abstract report contains details concerning the property location and basic lease details, price schedules, key events, terms and conditions, parking arrangements, and landlord and tenant obligations. Abstracting a real estate contract into electronic form provides easy access to key data, replacing the tedious process of reading the entire contract every time. Natural language processing can be used to extract and abstract this information from lease documents.
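
A minimal sketch of rule-based lease abstraction, assuming plainly worded clauses: each abstract field is filled by the first match of a hand-written regular expression. The field names, patterns, and sample clause are hypothetical; the chapter appears in a volume on neural networks, so its actual method may differ substantially.

```python
import re
from typing import Dict, Optional

# Hypothetical field patterns; real leases vary widely in wording.
FIELD_PATTERNS = {
    "monthly_rent": re.compile(r"monthly rent of \$?([\d,]+(?:\.\d{2})?)", re.IGNORECASE),
    "commencement_date": re.compile(r"commenc\w* on (\w+ \d{1,2}, \d{4})", re.IGNORECASE),
    "term_months": re.compile(r"term of (\d+) months", re.IGNORECASE),
    "parking_spaces": re.compile(r"(\d+) (?:reserved )?parking spaces?", re.IGNORECASE),
}


def abstract_lease(text: str) -> Dict[str, Optional[str]]:
    """Return a lease abstract: each field maps to its first match, or None."""
    abstract: Dict[str, Optional[str]] = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        abstract[field] = match.group(1) if match else None
    return abstract


lease = ("This lease shall commence on March 1, 2020 for a term of 36 months. "
         "Tenant shall pay a monthly rent of $2,500.00 and is allotted 2 parking spaces.")
print(abstract_lease(lease))
```

Hand-written patterns are transparent and easy to audit, but each new lease template tends to need new rules, which is one motivation for learned extractors.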

Making Natural Language Processing More Accessible for Analysis of Clinical Text

SciVee

doi:10.4016/30705.01

2011

Keyword(s):

Natural Language Processing

Natural Language

Language Processing

Clinical Text

Toward a clinical text encoder: pretraining for clinical natural language processing with applications to substance misuse

Journal of the American Medical Informatics Association

doi:10.1093/jamia/ocz072

2019

Vol 26 (11)

pp. 1272-1278

Cited By ~ 2

Author(s):

Dmitriy Dligach

Majid Afshar

Timothy Miller

Keyword(s):

Natural Language Processing

Natural Language

Language Processing

Substance Misuse

Research Direction

Clinical Text

Universal Properties

Classification Tasks

Performance Gains

Clinical Natural Language Processing

Abstract

Objective: Our objective is to develop algorithms for encoding clinical text into representations that can be used for a variety of phenotyping tasks.

Materials and Methods: Obtaining large datasets to take advantage of highly expressive deep learning methods is difficult in clinical natural language processing (NLP). We address this difficulty by pretraining a clinical text encoder on billing code data, which is typically available in abundance. We explore several neural encoder architectures and deploy the text representations obtained from these encoders in the context of clinical text classification tasks. While our ultimate goal is learning a universal clinical text encoder, we also experiment with training a phenotype-specific encoder. A universal encoder would be more practical, but a phenotype-specific encoder could perform better for a specific task.

Results: We successfully train several clinical text encoders, establish a new state-of-the-art on comorbidity data, and observe good performance gains on substance misuse data.

Discussion: We find that pretraining using billing codes is a promising research direction. The representations generated by this type of pretraining have universal properties, as they are highly beneficial for many phenotyping tasks. Phenotype-specific pretraining is a viable route for trading the generality of the pretrained encoder for better performance on a specific phenotyping task.

Conclusions: We successfully applied our approach to many phenotyping tasks. We conclude by discussing potential limitations of our approach.

CLAMP – a toolkit for efficiently building customized clinical natural language processing pipelines

Journal of the American Medical Informatics Association

doi:10.1093/jamia/ocx132

2017

Vol 25 (3)

pp. 331-336

Cited By ~ 64

Author(s):

Ergin Soysal

Jingqi Wang

Min Jiang

Yonghui Wu

Serguei Pakhomov

...

Keyword(s):

Natural Language Processing

User Interface

Natural Language

Language Processing

High Performance

Smoking Status

Entity Recognition

Graphic User Interface

Clinical Text

Clinical Natural Language Processing

Abstract Existing general clinical natural language processing (NLP) systems such as MetaMap and Clinical Text Analysis and Knowledge Extraction System have been successfully applied to information extraction from clinical text. However, end users often have to customize existing systems for their individual tasks, which can require substantial NLP skills. Here we present CLAMP (Clinical Language Annotation, Modeling, and Processing), a newly developed clinical NLP toolkit that provides not only state-of-the-art NLP components, but also a user-friendly graphic user interface that can help users quickly build customized NLP pipelines for their individual applications. Our evaluation shows that the CLAMP default pipeline achieved good performance on named entity recognition and concept encoding. We also demonstrate the efficiency of the CLAMP graphic user interface in building customized, high-performance NLP pipelines with 2 use cases, extracting smoking status and lab test values. CLAMP is publicly available for research use, and we believe it is a unique asset for the clinical NLP community.
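
CLAMP's central idea is assembling reusable components into a task-specific pipeline. The sketch below illustrates that composition pattern in plain Python; it is not CLAMP's API, and the `Document` class, component names, and smoking-related term list are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Document:
    text: str
    sentences: List[str] = field(default_factory=list)
    # (label, matched text) pairs added by recognizer components
    entities: List[Tuple[str, str]] = field(default_factory=list)


# A component takes a document, adds annotations, and returns it.
Component = Callable[[Document], Document]


def sentence_splitter(doc: Document) -> Document:
    # Naive split on whitespace that follows sentence-final punctuation.
    doc.sentences = [s for s in re.split(r"(?<=[.!?])\s+", doc.text.strip()) if s]
    return doc


def make_dictionary_ner(label: str, terms: List[str]) -> Component:
    """Build a recognizer that tags whole-word, case-insensitive term matches."""
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b",
                         re.IGNORECASE)

    def recognize(doc: Document) -> Document:
        for sentence in doc.sentences:
            doc.entities.extend((label, m.group(0)) for m in pattern.finditer(sentence))
        return doc

    return recognize


def run_pipeline(components: List[Component], text: str) -> Document:
    doc = Document(text)
    for component in components:
        doc = component(doc)
    return doc


pipeline = [sentence_splitter,
            make_dictionary_ner("SMOKING", ["smoker", "smokes", "tobacco use"])]
doc = run_pipeline(pipeline, "Pt is a former smoker. Denies tobacco use. BP 120/80.")
print(doc.entities)
```

Dictionary matching alone tags "former smoker" and "Denies tobacco use" identically, so a smoking-status classifier built this way would still need components for negation and temporal context.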

A Natural Language Processing Tool for Large-Scale Data Extraction from Echocardiography Reports

PLoS ONE

doi:10.1371/journal.pone.0153749

2016

Vol 11 (4)

pp. e0153749

Cited By ~ 20

Author(s):

Chinmoy Nath

Mazen S. Albaghdadi

Siddhartha R. Jonnalagadda

Keyword(s):

Natural Language Processing

Natural Language

Language Processing

Large Scale

Data Extraction

Large Scale Data

Natural Language Processing Tool

Scale Data

How Artificial Intelligence Can Improve Our Understanding of the Genes Associated with Endometriosis: Natural Language Processing of the PubMed Database

BioMed Research International

doi:10.1155/2018/6217812

2018

Vol 2018

pp. 1-7

Cited By ~ 7

Author(s):

J. Bouaziz

R. Mashiach

S. Cohen

A. Kedem

A. Baron

...

Keyword(s):

Artificial Intelligence

Natural Language Processing

Text Mining

Natural Language

Language Processing

Data Extraction

Endometrial Tissue

Endometrial Cells

Pubmed Database

Using Data

Endometriosis is a disease characterized by the development of endometrial tissue outside the uterus, but its cause remains largely unknown. Numerous genes have been studied and proposed to help explain its pathogenesis. However, the large number of these candidate genes has made functional validation through experimental methodologies nearly impossible. Computational methods could provide a useful alternative for prioritizing those most likely to be susceptibility genes. Using artificial intelligence applied to text mining, this study analyzed the genes involved in the pathogenesis, development, and progression of endometriosis. The data extraction by text mining of the endometriosis-related genes in the PubMed database was based on natural language processing, and the data were filtered to remove false positives. Using data from the text mining and gene network information as input for the web-based tool, 15,207 endometriosis-related genes were ranked according to their score in the database. Characterization of the filtered gene set through gene ontology, pathway, and network analysis provided information about the numerous mechanisms hypothesized to be responsible for the establishment of ectopic endometrial tissue, as well as the migration, implantation, survival, and proliferation of ectopic endometrial cells. Finally, the human genome was scanned through various databases using filtered genes as a seed to determine novel genes that might also be involved in the pathogenesis of endometriosis but which have not yet been characterized. These genes could be promising candidates to serve as useful diagnostic biomarkers and therapeutic targets in the management of endometriosis.
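
The study ranked genes with a web-based tool whose scoring the abstract does not describe. As a deliberately simpler stand-in, the sketch below counts how many abstracts mention both a candidate gene symbol and the disease term, after discarding symbols on a false-positive list (for example, a gene symbol such as WAS that collides with an ordinary English word). The function, inputs, and sample abstracts are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Set, Tuple


def rank_genes(abstracts: Iterable[str], gene_symbols: Set[str],
               disease_term: str, false_positives: Set[str]) -> List[Tuple[str, int]]:
    """Rank gene symbols by the number of abstracts co-mentioning them with the disease.

    Ties are broken alphabetically so the output order is deterministic.
    """
    counts: Counter = Counter()
    disease = disease_term.lower()
    for text in abstracts:
        if disease not in text.lower():
            continue  # only abstracts that mention the disease count
        tokens = set(re.findall(r"[A-Za-z0-9-]+", text))  # case-sensitive symbols
        for gene in (tokens & gene_symbols) - false_positives:
            counts[gene] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


abstracts = [
    "ESR1 and PGR expression in endometriosis lesions.",
    "Endometriosis and VEGF: angiogenesis in ectopic tissue; ESR1 upregulated.",
    "VEGF in tumor angiogenesis.",
    "WAS this endometriosis cohort linked to ESR1?",
]
print(rank_genes(abstracts, {"ESR1", "PGR", "VEGF", "WAS"}, "endometriosis", {"WAS"}))
```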

Repurposing the Clinical Record: Can an Existing Natural Language Processing System De-identify Clinical Notes?

Journal of the American Medical Informatics Association

doi:10.1197/jamia.m2862

2009

Vol 16 (1)

pp. 37-39

Cited By ~ 20

Author(s):

F. P. Morrison

L. Li

A. M. Lai

G. Hripcsak

Keyword(s):

Natural Language Processing

Natural Language

Language Processing

Processing System

Clinical Record

Clinical Notes

Natural Language Processing System
