The impact of learning Unified Medical Language System knowledge embeddings in relation extraction from biomedical texts

Maxwell A Weinzierl; Ramon Maldonado; Sanda M Harabagiu

doi:10.1093/jamia/ocaa205

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocaa205 ◽

2020 ◽

Vol 27 (10) ◽

pp. 1556-1567

Author(s):

Maxwell A Weinzierl ◽

Ramon Maldonado ◽

Sanda M Harabagiu

Keyword(s):

State Of The Art ◽

Relation Extraction ◽

Language System ◽

Unified Medical Language System ◽

Medical Language ◽

Interaction Extraction ◽

Biomedical Texts ◽

The Impact ◽

System Operating

Abstract

Objective: We explored how knowledge embeddings (KEs) learned from the Unified Medical Language System (UMLS) Metathesaurus impact the quality of relation extraction on 2 diverse sets of biomedical texts.

Materials and Methods: Two forms of KEs were learned for concepts and relation types from the UMLS Metathesaurus, namely lexicalized knowledge embeddings (LKEs) and unlexicalized KEs. A knowledge embedding encoder (KEE) enabled learning either LKEs or unlexicalized KEs, as well as neural models capable of producing LKEs for mentions of biomedical concepts in texts and for relation types that are not encoded in the UMLS Metathesaurus. This allowed us to design the relation extraction with knowledge embeddings (REKE) system, which incorporates either LKEs or unlexicalized KEs produced for relation types of interest and their arguments.

Results: The incorporation of either LKEs or unlexicalized KEs in REKE advances the state of the art on 2 relation extraction datasets: the 2010 i2b2/VA dataset and the 2013 Drug-Drug Interaction Extraction Challenge corpus. Moreover, the impact of LKEs is superior, achieving F1 scores of 78.2 and 82.0, respectively.

Discussion: REKE not only highlights the importance of incorporating knowledge encoded in the UMLS Metathesaurus in a novel way, through 2 possible forms of KEs, but also showcases the subtleties of incorporating KEs in relation extraction systems.

Conclusions: Incorporating LKEs informed by the UMLS Metathesaurus in a relation extraction system operating on biomedical texts shows significant promise. We present the REKE system, which establishes new state-of-the-art results for relation extraction on 2 datasets when using LKEs.

BioRel: towards large-scale biomedical relation extraction

BMC Bioinformatics ◽

10.1186/s12859-020-03889-5 ◽

2020 ◽

Vol 21 (S16) ◽

Author(s):

Rui Xing ◽

Jie Luo ◽

Tengwei Song

Keyword(s):

Deep Learning ◽

Large Scale ◽

Critical Role ◽

Relation Extraction ◽

Extraction Methods ◽

Statistical Machine Learning ◽

Language System ◽

Unified Medical Language System ◽

Medical Language ◽

Biomedical Relation Extraction

Abstract

Background: Although biomedical publications and literature are growing rapidly, structured knowledge that can be easily processed by computer programs is still lacking. Extracting such knowledge from plain text and transforming it into structured form makes relation extraction an important problem. Datasets play a critical role in the development of relation extraction methods. However, existing relation extraction datasets in the biomedical domain are mainly human-annotated, and their scale is usually limited because annotation is labor-intensive and time-consuming.

Results: We construct BioRel, a large-scale dataset for biomedical relation extraction, using the Unified Medical Language System as the knowledge base and Medline as the corpus. We first identify mentions of entities in Medline sentences and link them to the Unified Medical Language System with MetaMap. Then, we assign each sentence a relation label by using distant supervision. Finally, we adapt state-of-the-art deep learning and statistical machine learning methods as baseline models and conduct comprehensive experiments on the BioRel dataset.

Conclusions: Based on the extensive experimental results, we have shown that BioRel is a suitable large-scale dataset for biomedical relation extraction, which provides both reasonable baseline performance and many remaining challenges for both deep learning and statistical methods.
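
The distant-supervision labeling step described in this abstract can be illustrated with a minimal sketch. It assumes entity linking has already been done (the abstract uses MetaMap for that) and that the knowledge base is available as a plain dictionary. The concept identifiers, the relation name, and the "NA" fallback label are hypothetical placeholders, not values taken from BioRel.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: (head concept, tail concept) -> relation.
# Identifiers and relation names are invented for illustration only.
KB: Dict[Tuple[str, str], str] = {
    ("C_DRUG_1", "C_DISEASE_1"): "may_treat",
}

# Assumed fallback label for entity pairs with no known relation.
NO_RELATION = "NA"


def label_pair(pair: Tuple[str, str], kb: Dict[Tuple[str, str], str]) -> str:
    """Return the knowledge-base relation for an entity pair, or NO_RELATION."""
    return kb.get(pair, NO_RELATION)


def build_dataset(
    linked_sentences: List[Tuple[str, str, str]],
    kb: Dict[Tuple[str, str], str],
) -> List[Dict[str, str]]:
    """Label each (sentence, head, tail) triple by knowledge-base lookup.

    Distant supervision assumes that a sentence mentioning both entities of a
    known pair expresses that pair's relation, which is why the labels are noisy.
    """
    return [
        {"sentence": text, "head": head, "tail": tail,
         "relation": label_pair((head, tail), kb)}
        for text, head, tail in linked_sentences
    ]


if __name__ == "__main__":
    examples = [
        ("Drug one is used for disease one.", "C_DRUG_1", "C_DISEASE_1"),
        ("Drug one was stored near disease two samples.", "C_DRUG_1", "C_DISEASE_2"),
    ]
    for row in build_dataset(examples, KB):
        print(row["relation"], "|", row["sentence"])
```

No human annotation is involved in this scheme, which is what allows the dataset to scale, at the cost of sentences that mention a known pair without actually expressing its relation.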

Unified Medical Language System resources improve sieve-based generation and Bidirectional Encoder Representations from Transformers (BERT)–based ranking for concept normalization

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocaa080 ◽

2020 ◽

Vol 27 (10) ◽

pp. 1510-1519

Author(s):

Dongfang Xu ◽

Manoj Gopale ◽

Jiacheng Zhang ◽

Kris Brown ◽

Edmon Begoli ◽

...

Keyword(s):

Neural Network ◽

Relation Extraction ◽

Training Data ◽

Shared Task ◽

Semantic Type ◽

Language System ◽

Unified Medical Language System ◽

Medical Language ◽

Rank System ◽

Semantic Types

Abstract

Objective: Concept normalization, the task of linking phrases in text to concepts in an ontology, is useful for many downstream tasks including relation extraction, information retrieval, etc. We present a generate-and-rank concept normalization system based on our participation in the 2019 National NLP Clinical Challenges Shared Task Track 3 Concept Normalization.

Materials and Methods: The shared task provided 13 609 concept mentions drawn from 100 discharge summaries. We first design a sieve-based system that uses Lucene indices over the training data, Unified Medical Language System (UMLS) preferred terms, and UMLS synonyms to generate a list of possible concepts for each mention. We then design a listwise classifier based on the BERT (Bidirectional Encoder Representations from Transformers) neural network to rank the candidate concepts, integrating UMLS semantic types through a regularizer.

Results: Our generate-and-rank system was third of 33 in the competition, outperforming the candidate generator alone (81.66% vs 79.44%) and the previous state of the art (76.35%). During postevaluation, the model’s accuracy was increased to 83.56% via improvements to how training data are generated from UMLS and incorporation of our UMLS semantic type regularizer.

Discussion: Analysis of the model shows that prioritizing UMLS preferred terms yields better performance, that the UMLS semantic type regularizer results in qualitatively better concept predictions, and that the model performs well even on concepts not seen during training.

Conclusions: Our generate-and-rank framework for UMLS concept normalization integrates key UMLS features like preferred terms and semantic types with a neural network–based ranking model to accurately link phrases in text to UMLS concepts.
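
The candidate-generation half of this generate-and-rank design can be sketched as an ordered cascade of lookups. This is only an illustration under stated assumptions: plain dictionaries stand in for the Lucene indices, the stop-at-first-match policy is one common way to run a sieve and is not confirmed by the abstract, and every term and concept identifier below is hypothetical. The BERT-based ranker is not shown.

```python
from typing import Dict, List, Sequence, Tuple

Index = Dict[str, List[str]]

# Hypothetical stand-ins for the three indices named in the abstract,
# listed in priority order. All terms and concept IDs are invented.
SIEVES: Sequence[Tuple[str, Index]] = (
    ("training_data", {"mi": ["CUI_A"]}),
    ("preferred_terms", {"heart attack": ["CUI_B"]}),
    ("synonyms", {"cardiac infarction": ["CUI_A", "CUI_B"]}),
)


def normalize_text(mention: str) -> str:
    """Lowercase and collapse whitespace so lookups are not case sensitive."""
    return " ".join(mention.lower().split())


def generate_candidates(
    mention: str, sieves: Sequence[Tuple[str, Index]]
) -> Tuple[str, List[str]]:
    """Try each sieve in order and return the first non-empty candidate list.

    Returns the name of the sieve that fired together with its candidates,
    or ("none", []) when no sieve matches.
    """
    key = normalize_text(mention)
    for name, index in sieves:
        hits = index.get(key)
        if hits:
            return name, list(hits)
    return "none", []


if __name__ == "__main__":
    for mention in ("MI", "Heart  Attack", "cardiac infarction", "unknown term"):
        print(mention, "->", generate_candidates(mention, SIEVES))
```

When a sieve returns several candidates, as the synonym lookup does here, a ranking model is needed to choose among them, which is the role the abstract assigns to the listwise classifier.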

Unified Medical Language System

10.32388/urur42 ◽

2020 ◽

Cited By ~ 2

Keyword(s):

Language System ◽

Unified Medical Language System ◽

Medical Language

Smoothing additive manufactured parts using ns-pulsed laser radiation

Progress in Additive Manufacturing ◽

10.1007/s40964-021-00168-4 ◽

2021 ◽

Author(s):

Florian Kuisat ◽

Fernando Lasagni ◽

Andrés Fabián Lasagni

Keyword(s):

Surface Roughness ◽

Mechanical Performance ◽

State Of The Art ◽

Pulsed Laser ◽

Industrial Applications ◽

Laser Source ◽

Pulsed Laser Radiation ◽

Current State ◽

The Impact

Abstract

It is well known that the surface topography of a part can affect its mechanical performance, which is typical in additive manufacturing. In this context, we report on the surface modification of additive manufactured components made of Titanium 64 (Ti64) and Scalmalloy®, using a pulsed laser, with the aim of reducing their surface roughness. In our experiments, a nanosecond-pulsed infrared laser source with variable pulse durations between 8 and 200 ns was applied. The impact of varying a large number of parameters on the surface quality of the smoothed areas was investigated. The results demonstrated a reduction of surface roughness Sa by more than 80% for Titanium 64 and by 65% for Scalmalloy® samples. This makes it possible to extend the applicability of additive manufactured components beyond the current state of the art and to break new ground for their use in various industrial applications, such as aerospace.

Using Semantic and Structural Properties of the Unified Medical Language System to Discover Potential Terminological Relationships

Journal of the American Medical Informatics Association ◽

10.1197/jamia.m2931 ◽

2009 ◽

Vol 16 (3) ◽

pp. 346-353 ◽

Cited By ~ 10

Author(s):

C. O. Patel ◽

J. J. Cimino

Keyword(s):

Structural Properties ◽

Language System ◽

Unified Medical Language System ◽

Medical Language

Auditing the Unified Medical Language System with Semantic Methods

Journal of the American Medical Informatics Association ◽

10.1136/jamia.1998.0050041 ◽

1998 ◽

Vol 5 (1) ◽

pp. 41-51 ◽

Cited By ~ 48

Author(s):

J. J. Cimino

Keyword(s):

Language System ◽

Unified Medical Language System ◽

Medical Language

Multi-Graph Cooperative Learning Towards Distant Supervised Relation Extraction

ACM Transactions on Intelligent Systems and Technology ◽

10.1145/3466560 ◽

2021 ◽

Vol 12 (5) ◽

pp. 1-21

Author(s):

Changsen Yuan ◽

Heyan Huang ◽

Chong Feng

Keyword(s):

Cooperative Learning ◽

State Of The Art ◽

Relation Extraction ◽

Sentence Length ◽

Universal Relation ◽

Dependency Parsing ◽

Convolutional Network ◽

Syntactic Features ◽

Use Dependency

The Graph Convolutional Network (GCN) is a universal relation extraction method that can predict relations of entity pairs by capturing sentences’ syntactic features. However, existing GCN methods often use dependency parsing to generate graph matrices and learn syntactic features. The quality of the dependency parsing will directly affect the accuracy of the graph matrix and change the whole GCN’s performance. Because of the influence of noisy words and sentence length in the distant supervised dataset, using dependency parsing on sentences causes errors and leads to unreliable information. Therefore, it is difficult to obtain credible graph matrices and relational features for some special sentences. In this article, we present a Multi-Graph Cooperative Learning model (MGCL), which focuses on extracting the reliable syntactic features of relations by different graphs and harnessing them to improve the representations of sentences. We conduct experiments on a widely used real-world dataset, and the experimental results show that our model achieves the state-of-the-art performance of relation extraction.
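
To make the role of the graph matrix concrete, here is a minimal sketch of a single graph convolution layer in plain Python. It assumes the widely used propagation rule H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W), which is not necessarily the formulation used in MGCL, and the toy adjacency matrix is a hypothetical stand-in for one derived from a dependency parse.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two dense matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Add self-loops, then apply symmetric degree normalization."""
    n = len(adj)
    with_loops = [
        [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)
    ]
    degree = [sum(row) for row in with_loops]
    return [
        [with_loops[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]


def gcn_layer(adj: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """Aggregate neighbor features over the graph, project them, apply ReLU."""
    propagated = matmul(normalize_adjacency(adj), features)
    projected = matmul(propagated, weights)
    return [[max(0.0, value) for value in row] for row in projected]


if __name__ == "__main__":
    # Two tokens joined by one edge (hypothetical dependency arc).
    adjacency = [[0.0, 1.0], [1.0, 0.0]]
    token_features = [[1.0], [3.0]]
    layer_weights = [[1.0]]
    print(gcn_layer(adjacency, token_features, layer_weights))
```

Because every output row is a weighted sum over the rows selected by the adjacency matrix, a wrong edge from a faulty parse feeds the wrong token's features into the representation, which is the sensitivity the abstract describes.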

The outline of Unified Medical Language System (UMLS) Knowledge Sources

Journal of Information Processing and Management ◽

10.1241/johokanri.41.15 ◽

1998 ◽

Vol 41 (1) ◽

pp. 15-23

Author(s):

Koreni KAWANO

Keyword(s):

Knowledge Sources ◽

Language System ◽

Unified Medical Language System ◽

Medical Language

Unified Medical Language System

Electronic Health Record ◽

10.1002/9781118479612.ch16 ◽

2012 ◽

pp. 145-152 ◽

Cited By ~ 1

Keyword(s):

Language System ◽

Unified Medical Language System ◽

Medical Language

IAIMS and UMLS at Columbia-Presbyterian Medical Center

Medical Decision Making ◽

10.1177/0272989x9101104s17 ◽

1991 ◽

Vol 11 (4_suppl) ◽

pp. S89-S93 ◽

Cited By ~ 4

Author(s):

James J. Cimino ◽

Soumitra Sengupta

Keyword(s):

Information Management ◽

Management System ◽

Medical Center ◽

Information Management System ◽

Language System ◽

Unified Medical Language System ◽

Medical Language ◽

Academic Information

The authors use an example to illustrate combining Integrated Academic Information Management System (IAIMS) components (applications) into an integral whole, to facilitate using the components simultaneously or in sequence. They examine a model for classifying IAIMS systems, proposing ways in which the Unified Medical Language System (UMLS) can be exploited in them.
