Towards a Semantic Lexicon for Biological Language Processing

Comparative and Functional Genomics, 2005, Vol 6 (1-2), pp. 61-66. doi:10.1002/cfg.451. Cited by ~6.

Author(s): Karin Verspoor

Keyword(s): Molecular Biology, Language Processing, Semantic Information, Text Processing, National Library, Important Resource, Language System, Unified Medical Language System, Medical Language

This paper explores the use of the resources in the National Library of Medicine's Unified Medical Language System (UMLS) for the construction of a lexicon useful for processing texts in the field of molecular biology. A lexicon is constructed from overlapping terms in the UMLS SPECIALIST lexicon and the UMLS Metathesaurus to obtain both morphosyntactic and semantic information for terms, and the coverage of a domain corpus is assessed. Over 77% of tokens in the domain corpus are found in the constructed lexicon, validating the lexicon's coverage of the most frequent terms in the domain and indicating that the constructed lexicon is potentially an important resource for biological text processing.
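The two steps the abstract names, intersecting the terms of two resources and then measuring token coverage over a corpus, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names `build_lexicon` and `token_coverage`, the dictionary layout, and the toy entries are hypothetical and are not taken from the paper or from real UMLS files. The paper's actual matching and normalization procedure is not described in the abstract.

```python
from collections import Counter


def build_lexicon(specialist, metathesaurus):
    """Keep only terms found in both resources (hypothetical layout).

    specialist:    term -> part of speech (morphosyntactic information)
    metathesaurus: term -> list of semantic types (semantic information)
    """
    shared_terms = specialist.keys() & metathesaurus.keys()
    return {
        term: {"pos": specialist[term], "semantic_types": metathesaurus[term]}
        for term in shared_terms
    }


def token_coverage(tokens, lexicon):
    """Fraction of corpus tokens (not distinct types) found in the lexicon."""
    counts = Counter(token.lower() for token in tokens)
    total = sum(counts.values())
    covered = sum(n for token, n in counts.items() if token in lexicon)
    return covered / total if total else 0.0


# Toy data, invented for illustration only.
specialist = {"protein": "noun", "bind": "verb", "kinase": "noun", "the": "det"}
metathesaurus = {
    "protein": ["Amino Acid, Peptide, or Protein"],
    "kinase": ["Enzyme"],
    "cell": ["Cell"],
}

lexicon = build_lexicon(specialist, metathesaurus)
tokens = "the kinase binds the protein".split()
coverage = token_coverage(tokens, lexicon)
```

Coverage here is counted per token, so frequent terms weigh more than rare ones, which is the sense in which a token-level figure reflects coverage of the most frequent terms in a domain.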

The Unified Medical Language System SPECIALIST Lexicon and Lexical Tools: Development and applications

Journal of the American Medical Informatics Association, 2020, Vol 27 (10), pp. 1600-1605. doi:10.1093/jamia/ocaa056. Cited by ~2.

Author(s): Chris J Lu, Amanda Payne, James G Mork

Keyword(s): Concept Mapping, Language Processing, Vital Role, Unstructured Data, National Library, Language System, Unified Medical Language System, The Core, Medical Language, Recent Developments

Abstract: Natural language processing (NLP) plays a vital role in modern medical informatics. It converts narrative text or unstructured data into knowledge by analyzing and extracting concepts. A comprehensive lexical system is the foundation to the success of NLP applications and an essential component at the beginning of the NLP pipeline. The SPECIALIST Lexicon and Lexical Tools, distributed by the National Library of Medicine as one of the Unified Medical Language System Knowledge Sources, provides an underlying resource for many NLP applications. This article reports recent developments of 3 key components in the Lexicon. The core NLP operation of Unified Medical Language System concept mapping is used to illustrate the importance of these developments. Our objective is to provide generic, broad coverage and a robust lexical system for NLP applications. A novel multiword approach and other planned developments are proposed.

Clinical trial cohort selection based on multi-level rule-based natural language processing system

Journal of the American Medical Informatics Association, 2019, Vol 26 (11), pp. 1218-1226. doi:10.1093/jamia/ocz109. Cited by ~7.

Author(s): Long Chen, Yu Gu, Xin Ji, Chao Lou, Zhiyong Sun, ...

Keyword(s): Clinical Trials, Natural Language Processing, Natural Language, Language Processing, Rule Based, Language System, Unified Medical Language System, Rule Based System, Medical Language, Cohort Selection

Abstract: Objective: Identifying patients who meet selection criteria for clinical trials is typically challenging and time-consuming. In this article, we describe our clinical natural language processing (NLP) system to automatically assess patients’ eligibility based on their longitudinal medical records. This work was part of the 2018 National NLP Clinical Challenges (n2c2) Shared-Task and Workshop on Cohort Selection for Clinical Trials. Materials and Methods: The authors developed an integrated rule-based clinical NLP system which employs a generic rule-based framework plugged in with lexical-, syntactic- and meta-level, task-specific knowledge inputs. In addition, the authors also implemented and evaluated a general clinical NLP (cNLP) system which is built with the Unified Medical Language System and Unstructured Information Management Architecture. Results and Discussion: The systems were evaluated as part of the 2018 n2c2-1 challenge, and the authors’ rule-based system obtained an F-measure of 0.9028, ranking fourth in the challenge with less than 1% difference from the best system. While the general cNLP system didn’t achieve performance as good as the rule-based system, it did establish its own advantages and potential in extracting clinical concepts. Conclusion: Our results indicate that a well-designed rule-based clinical NLP system is capable of achieving good performance on cohort selection even with a small training data set. In addition, the investigation of a Unified Medical Language System-based general cNLP system suggests that a hybrid system combining these 2 approaches is promising to surpass the state-of-the-art performance.

Interoperability and Mapping Between Knowledge Organization Systems: Metathesaurus - Unified Medical Language System of the National Library of Medicine

Knowledge Organization, 2016, Vol 43 (2), pp. 107-112. doi:10.5771/0943-7444-2016-2-107. Cited by ~1.

Author(s): Julietti de Andrade, Marilda Lopes Ginez de Lara

Keyword(s): Knowledge Organization, National Library, Language System, Unified Medical Language System, Medical Language, Knowledge Organization Systems

Navigating to Knowledge

Methods of Information in Medicine, 1995, Vol 34 (01/02), pp. 214-231. doi:10.1055/s-0038-1634582. Cited by ~5.

Author(s): M. S. Tuttle, W. G. Cole, D. D. Sherertz, S. J. Nelson

Keyword(s): National Cancer Institute, Point Of Care, Visual Representation, Medical Knowledge, National Library, Language System, Unified Medical Language System, Medical Language, Computer Based, The U.S

Abstract: One way to fulfill point-of-care knowledge needs is to present caregivers with a visual representation of the available “answers”. Using such a representation, caregivers can recognize what they want, rather than have to recall what they need, and then navigate to an appropriate answer. Given selected pieces of information from a computer-based patient record, an interface can anticipate certain knowledge needs by initializing caregiver navigation in a semantic neighborhood of answers likely to be relevant to the patient at hand. These notions draw heavily on two collaborative projects: the U.S. National Library of Medicine Unified Medical Language System® and the U.S. National Cancer Institute Knowledge Server. Both of these projects support navigation because they make the structure of medical knowledge explicit in a way that can be exploited by human interfaces.

Mapping the Gene Ontology Into the Unified Medical Language System

Comparative and Functional Genomics, 2004, Vol 5 (4), pp. 354-361. doi:10.1002/cfg.407. Cited by ~22.

Author(s): Jane Lomax, Alexa T. McCray

Keyword(s): Gene Ontology, Clinical Medicine, Gene Products, National Library, Web Based, Language System, Unified Medical Language System, Medical Language, Large Numbers, Go Terms

We have recently mapped the Gene Ontology (GO), developed by the Gene Ontology Consortium, into the National Library of Medicine's Unified Medical Language System (UMLS). GO has been developed for the purpose of annotating gene products in genome databases, and the UMLS has been developed as a framework for integrating large numbers of disparate terminologies, primarily for the purpose of providing better access to biomedical information sources. The mapping of GO to UMLS highlighted issues in both terminology systems. After some initial explorations and discussions between the UMLS and GO teams, the GO was integrated with the UMLS. Overall, a total of 23% of the GO terms either matched directly (3%) or linked (20%) to existing UMLS concepts. All GO terms now have a corresponding, official UMLS concept, and the entire vocabulary is available through the web-based UMLS Knowledge Source Server. The mapping of the Gene Ontology, with its focus on structures, processes and functions at the molecular level, to the existing broad coverage UMLS should contribute to linking the language and practices of clinical medicine to the language and practices of genomics.
