Automated Trait Extraction using ClearEarth, a Natural Language Processing System for Text Mining in Natural Sciences

Biodiversity Information Science and Standards

10.3897/biss.2.26080

2018

Vol 2

pp. e26080

Cited By ~ 1

Author(s):

Anne Thessen

Jenette Preciado

Payoj Jain

James Martin

Martha Palmer

...

Keyword(s):

Natural Language Processing

Natural Language

Language Processing

Processing System

Natural Sciences

Phenotypic Traits

Good Precision

Named Entities

Clinical Notes

Linguistic Annotation

The cTAKES package (using the ClearTK Natural Language Processing toolkit; Bethard et al. 2014, http://cleartk.github.io/cleartk/) has been successfully used to automatically read clinical notes in the medical field (Albright et al. 2013, Styler et al. 2014). Dozens of medical institutions use it daily to process clinical notes automatically and extract relevant information. ClearEarth is a collaborative project that brings together computational linguists and domain scientists to port Natural Language Processing (NLP) modules trained on the same types of linguistic annotation to the fields of geology, cryology, and ecology. In the ecology domain, ClearEarth aims to extract ecologically relevant terms, including eco-phenotypic traits, from text and to assign those traits to taxa. Four annotators used Anafora (annotation software; https://github.com/weitechen/anafora) to mark seven entity types (biotic, aggregate, abiotic, locality, quality, unit, value) and six reciprocal property types (synonym of/has synonym, part of/has part, subtype/supertype) in 133 documents, drawn primarily from the Encyclopedia of Life (EOL) and Wikipedia, following the project guidelines (https://github.com/ClearEarthProject/AnnotationGuidelines). Inter-annotator agreement ranged from 43% to 90%. Overall, ClearEarth performed well at identifying named entities in biology text (precision: 85.56%; recall: 71.57%). The named entities with the best performance were organisms and their parts/products (biotic entities; precision: 72.09%; recall: 54.17%) and systems and environments (aggregate entities; precision: 79.23%; recall: 75.34%). After vetting, terms and their relationships extracted by ClearEarth can be embedded in the new ecocore ontology (http://www.obofoundry.org/ontology/ecocore.html).
This project enables use of advanced industry and research software within natural sciences for downstream operations such as data discovery, assessment, and analysis. In addition, ClearEarth uses the NLP results to generate domain-specific ontologies and other semantic resources.
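
The abstract reports precision and recall but no single combined score. As a reading aid only, the sketch below combines each reported pair into the conventional F1 score, F1 = 2PR / (P + R). It assumes the reported percentages can be combined directly, which the abstract does not state, so the resulting values are derived here and are not figures reported by the authors.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given in percent."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs (percent) as reported in the ClearEarth abstract.
reported = {
    "overall": (85.56, 71.57),
    "biotic": (72.09, 54.17),
    "aggregate": (79.23, 75.34),
}

for label, (precision, recall) in reported.items():
    print(f"{label}: F1 = {f1_score(precision, recall):.2f}%")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two numbers, so a category with a low recall scores noticeably lower than its precision alone suggests.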

Repurposing the Clinical Record: Can an Existing Natural Language Processing System De-identify Clinical Notes?

Journal of the American Medical Informatics Association

10.1197/jamia.m2862

2009

Vol 16 (1)

pp. 37-39

Cited By ~ 20

Author(s):

F. P. Morrison

L. Li

A. M. Lai

G. Hripcsak

Keyword(s):

Natural Language Processing

Natural Language

Language Processing

Processing System

Clinical Record

Clinical Notes

Natural Language Processing System

UTA DLNLP at SemEval-2016 Task 12: Deep Learning Based Natural Language Processing System for Clinical Information Identification from Clinical Notes and Pathology Reports

10.18653/v1/s16-1197

2016

Cited By ~ 4

Author(s):

Peng Li

Heng Huang

Keyword(s):

Deep Learning

Natural Language Processing

Natural Language

Language Processing

Processing System

Clinical Information

Clinical Notes

Natural Language Processing System

Pathology Reports

PMH52 Use of a Natural Language Processing-Based Approach to Extract Suicide Ideation and Behavior from Clinical Notes to Support Depression Research

Value in Health

10.1016/j.jval.2021.04.674

2021

Vol 24

pp. S137

Author(s):

N. Palmon

S. Momen

M. Leavy

G. Curhan

C. Boussios

...

Keyword(s):

Natural Language Processing

Natural Language

Language Processing

Suicide Ideation

Clinical Notes

And Behavior

Systematic review of current natural language processing methods and applications in cardiology

Heart

10.1136/heartjnl-2021-319769

2021

pp. heartjnl-2021-319769

Author(s):

Meghan Reading Turchioe

Alexander Volodarskiy

Jyotishman Pathak

Drew N Wright

James Enlou Tcheng

...

Keyword(s):

Systematic Review

Natural Language Processing

Natural Language

Language Processing

Clinical Care

Real World Data

Clinical Text

Clinical Notes

Artery Disease

Automated Methods

Natural language processing (NLP) is a set of automated methods to organise and evaluate the information contained in unstructured clinical notes, which are a rich source of real-world data from clinical care that may be used to improve outcomes and understanding of disease in cardiology. The purpose of this systematic review is to provide an understanding of NLP, review how it has been used to date within cardiology and illustrate the opportunities that this approach provides for both research and clinical care. We systematically searched six scholarly databases (ACM Digital Library, arXiv, Embase, IEEE Xplore, PubMed and Scopus) for studies published in 2015–2020 describing the development or application of NLP methods for clinical text focused on cardiac disease. We excluded studies not published in English, studies lacking a description of NLP methods, studies without a cardiac focus, and duplicates. Two independent reviewers extracted general study information, clinical details and NLP details and appraised quality using a checklist of quality indicators for NLP studies. We identified 37 studies developing and applying NLP in heart failure, imaging, coronary artery disease, electrophysiology, general cardiology and valvular heart disease. Most studies used rule-based NLP methods to identify patients with a specific diagnosis and to extract disease severity. Some used NLP algorithms to predict clinical outcomes. A major limitation is the inability to aggregate findings across studies because of vastly different NLP methods, evaluation and reporting. This review reveals numerous opportunities for future NLP work in cardiology with more diverse patient samples, cardiac diseases, datasets, methods and applications.

Development of algorithm for classification smoking status from unstructured bilingual electronic health records based on natural language processing (Preprint)

10.2196/preprints.26978

2021

Author(s):

Ye Seul Bae

Kyung Hwan Kim

Han Kyul Kim

Sae Won Choi

Taehoon Ko

...

Keyword(s):

Natural Language Processing

Electronic Health Records

Natural Language

Language Processing

Smoking Status

SVM Classifier

Keyword Extraction

Health Records

Clinical Notes

Electronic Health

BACKGROUND Smoking is a major risk factor and an important variable for clinical research, but few studies have addressed automatically obtaining smoking status from unstructured bilingual electronic health records (EHRs). OBJECTIVE We aim to develop an algorithm to classify smoking status from unstructured EHRs using natural language processing (NLP). METHODS Using acronym replacement and the Python package Soynlp, we normalize 4,711 bilingual clinical notes. Each EHR note was classified into 4 categories: current smokers, past smokers, never smokers, and unknown. Subsequently, SPPMI (Shifted Positive Pointwise Mutual Information) is used to vectorize words in the notes. By calculating cosine similarity between these word vectors, keywords denoting the same smoking status are identified. RESULTS Compared with other keyword extraction methods (word co-occurrence-, PMI-, and NPMI-based methods), our proposed approach improves keyword extraction precision by as much as 20.0%. These extracted keywords are used to classify the 4 smoking statuses in our bilingual clinical notes. Given an identical SVM classifier, the extracted keywords improve the F1 score by as much as 1.8% compared with unigram and bigram bag-of-words features. CONCLUSIONS Our study shows the potential of SPPMI for classifying smoking status from bilingual, unstructured EHRs. Our current findings show how smoking information can be easily acquired and used for clinical practice and research.
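
The METHODS section vectorizes words with SPPMI and compares them by cosine similarity to find keywords that denote the same smoking status. A minimal pure-Python sketch of that idea follows, computing SPPMI(w, c) = max(PMI(w, c) - log k, 0) from windowed co-occurrence counts. The window size, the shift k, all function names, and the toy English notes are illustrative assumptions; the study's Soynlp normalization, bilingual handling, and SVM classifier are not reproduced.

```python
import math
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count (word, context) pairs that fall within a symmetric window."""
    pairs = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    pairs[(word, tokens[j])] += 1
    return pairs


def sppmi_vectors(pairs, k=1):
    """Build sparse SPPMI word vectors: max(PMI(w, c) - log k, 0)."""
    total = sum(pairs.values())
    word_counts = Counter()
    context_counts = Counter()
    for (word, context), n in pairs.items():
        word_counts[word] += n
        context_counts[context] += n
    vectors = {}
    for (word, context), n in pairs.items():
        pmi = math.log(n * total / (word_counts[word] * context_counts[context]))
        shifted = pmi - math.log(k)
        if shifted > 0:  # keep only positive entries (the "P" in SPPMI)
            vectors.setdefault(word, {})[context] = shifted
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * v.get(key, 0.0) for key, value in u.items())
    norm_u = math.sqrt(sum(value * value for value in u.values()))
    norm_v = math.sqrt(sum(value * value for value in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_keywords(vectors, seed, top_n=3):
    """Rank other words by cosine similarity to a seed keyword."""
    if seed not in vectors:
        return []
    scores = [
        (other, cosine(vectors[seed], vec))
        for other, vec in vectors.items()
        if other != seed
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_n]


# Toy, made-up English notes (not data from the study).
notes = [
    "patient quit smoking years ago".split(),
    "patient stopped smoking years ago".split(),
    "patient denies smoking".split(),
    "patient never smoked".split(),
]
vectors = sppmi_vectors(cooccurrence_counts(notes, window=2), k=1)
print(similar_keywords(vectors, "quit", top_n=2))
```

On these toy notes, "quit" and "stopped" share exactly the same contexts, so their SPPMI vectors coincide and their cosine similarity is 1. With k = 1 the shift vanishes and SPPMI reduces to positive PMI; larger k prunes weak associations, which on a corpus this small can leave a seed word with no vector at all.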

The Experience of Developing a Large-Scale Natural Language Processing System: Critique

The Kluwer International Series in Engineering and Computer Science - Natural Language Processing: The PLNLP Approach

10.1007/978-1-4615-3170-8_7

1993

pp. 77-89

Cited By ~ 2

Author(s):

Stephen Richardson

Lisa Braden-Harder

Keyword(s):

Natural Language Processing

Natural Language

Language Processing

Large Scale

Processing System

Natural Language Processing System

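Identificação de Pragas e Doenças na Cultura da Soja por meio de um Sistema Computacional em Linguagem Natural (Identification of Pests and Diseases in Soybean Crops by Means of a Natural Language Computer System)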

10.14210/cotb.v12.p324-331

2021

Author(s):

Carolinne Roque e Faria

Cinthyan Renata Sachs Camerlengo de Barb

Keyword(s):

Natural Language Processing

Natural Language

Computer System

Language Processing

Agricultural Area

Syntactic Analysis

Dependency Parsing

Named Entities

Pests And Diseases

Improve Production

Technology is becoming increasingly popular among agribusiness producers and is advancing in every agricultural area. One difficulty in this context is handling natural-language data to solve problems in agriculture. To support dialogue with users and to assist researchers, the present work uses Natural Language Processing (NLP) techniques to develop an automatic, effective computer system that interacts with the user and helps identify pests and diseases in soybean farming. The system draws on a database repository to provide accurate diagnoses, simplifying the work of agricultural professionals and of anyone who handles large amounts of information in this area. Information on 108 pests and 19 diseases that damage Brazilian soybean crops was collected from Brazilian bibliographic manuals, with the aim of optimizing the data and improving production. The spaCy library was used for syntactic analysis, allowing the system to preprocess the texts, recognize named entities, calculate the similarity between words, and check dependency parses; it also supported the development requirements of the CAROLINA tool (Robotized Agronomic Conversation in Natural Language), using the language of the agricultural area.

A Natural Language Processing System for Extracting Evidence of Drug Repurposing from Scientific Publications

Proceedings of the AAAI Conference on Artificial Intelligence

10.1609/aaai.v34i08.7052

2020

Vol 34 (08)

pp. 13369-13381

Author(s):

Shivashankar Subramanian

Ioana Baldini

Sushma Ravichandran

Dmitriy A. Katz-Rogozhnikov

Karthikeyan Natesan Ramamurthy

...

Keyword(s):

Natural Language Processing

Natural Language

Language Processing

Generic Drugs

Low Cost

Processing System

Drug Repurposing

Cancer Type

Entity Extraction

Scientific Publications

More than 200 generic drugs approved by the U.S. Food and Drug Administration for non-cancer indications have shown promise for treating cancer. Due to their long history of safe patient use, low cost, and widespread availability, repurposing of these drugs represents a major opportunity to rapidly improve outcomes for cancer patients and reduce healthcare costs. In many cases, there is already evidence of efficacy for cancer, but trying to manually extract such evidence from the scientific literature is intractable. In this emerging applications paper, we introduce a system to automate non-cancer generic drug evidence extraction from PubMed abstracts. Our primary contribution is to define the natural language processing pipeline required to obtain such evidence, comprising the following modules: querying, filtering, cancer type entity extraction, therapeutic association classification, and study type classification. Using the subject matter expertise on our team, we create our own datasets for these specialized domain-specific tasks. We obtain promising performance in each of the modules by utilizing modern language processing techniques and plan to treat them as baseline approaches for future improvement of individual components.
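
The abstract names five modules (querying, filtering, cancer type entity extraction, therapeutic association classification, and study type classification) chained into one pipeline. The sketch below illustrates only that modular structure in plain Python. Every stage is a hypothetical keyword or dictionary rule substituted for the authors' learned models, the in-memory corpus stands in for a PubMed query, and all names, cue lists, and toy abstracts are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Abstract:
    doc_id: str
    text: str
    cancer_types: list = field(default_factory=list)
    association: str = "unknown"
    study_type: str = "unknown"


# Hypothetical keyword rules standing in for the paper's trained components.
CANCER_TERMS = {
    "breast cancer": "breast",
    "colorectal cancer": "colorectal",
    "glioblastoma": "brain",
}
NEGATIVE_CUES = ("no effect", "did not")
POSITIVE_CUES = ("inhibited", "reduced tumor", "improved survival")
STUDY_CUES = {"randomized": "clinical trial", "mice": "animal study", "cell line": "in vitro"}


def query(corpus, drug):
    """Querying: keep abstracts that mention the drug (stand-in for a PubMed search)."""
    return [a for a in corpus if drug.lower() in a.text.lower()]


def filter_cancer(abstracts):
    """Filtering: keep abstracts that mention a known cancer term."""
    return [a for a in abstracts if any(t in a.text.lower() for t in CANCER_TERMS)]


def extract_cancer_types(abstract):
    """Cancer type entity extraction via dictionary lookup."""
    text = abstract.text.lower()
    return [label for term, label in CANCER_TERMS.items() if term in text]


def classify_association(abstract):
    """Therapeutic association classification; negative cues are checked first."""
    text = abstract.text.lower()
    if any(cue in text for cue in NEGATIVE_CUES):
        return "no effect"
    if any(cue in text for cue in POSITIVE_CUES):
        return "effective"
    return "unknown"


def classify_study_type(abstract):
    """Study type classification: the first matching cue wins."""
    text = abstract.text.lower()
    for cue, label in STUDY_CUES.items():
        if cue in text:
            return label
    return "unknown"


def run_pipeline(corpus, drug):
    """Chain the five modules in the order the abstract lists them."""
    results = filter_cancer(query(corpus, drug))
    for abstract in results:
        abstract.cancer_types = extract_cancer_types(abstract)
        abstract.association = classify_association(abstract)
        abstract.study_type = classify_study_type(abstract)
    return results


# Made-up toy abstracts with placeholder IDs (not real PubMed records).
corpus = [
    Abstract("toy-1", "Metformin inhibited proliferation of a breast cancer cell line."),
    Abstract("toy-2", "In a randomized trial, metformin did not improve colorectal cancer outcomes."),
    Abstract("toy-3", "Metformin lowers blood glucose in type 2 diabetes."),
]
for a in run_pipeline(corpus, "metformin"):
    print(a.doc_id, a.cancer_types, a.association, a.study_type)
```

Keeping each module a separate function mirrors the abstract's stated plan to treat each component as a baseline for individual improvement: any one stage could be replaced by a trained classifier without touching the others.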

Face Detection and Natural Language Processing System Using Artificial Intelligence

Lecture Notes in Networks and Systems - Inventive Communication and Computational Technologies

10.1007/978-981-15-0146-3_73

2020

pp. 773-780

Author(s):

H. S. Avani

Ayushi Turkar

C. D. Divya

Keyword(s):

Artificial Intelligence

Natural Language Processing

Natural Language

Face Detection

Language Processing

Processing System

Natural Language Processing System

Natural Language Processing: System Evaluation

Encyclopedia of Language & Linguistics

10.1016/b0-08-044854-2/00932-9

2006

pp. 518-523

Author(s):

M. Maybury

Keyword(s):

Natural Language Processing

Natural Language

Language Processing

Processing System

System Evaluation

Natural Language Processing System
