A Hybrid Model for Entity Identification and Relation Extraction of Family History Information (Preprint)

10.2196/preprints.22797 ◽

2020 ◽

Author(s):

Youngjun Kim ◽

Paul M Heider ◽

Isabel R H Lally ◽

Stéphane M Meystre

Keyword(s):

Family History ◽

Information Extraction ◽

Relation Extraction ◽

Free Text ◽

Data Sets ◽

Family History Information ◽

Clinical Notes ◽

History Information ◽

End To End ◽

Entity Identification

BACKGROUND Family history information is important to assess the risk of inherited medical conditions. Natural language processing has the potential to extract this information from unstructured free-text notes to improve patient care and decision-making. We describe the end-to-end information extraction system the Medical University of South Carolina team developed when participating in the 2019 n2c2/OHNLP shared task. OBJECTIVE This task involves identifying mentions of family members and observations in electronic health record text notes, and recognizing the relations between family members, observations, and living status. Our system aims to achieve a high level of performance by integrating heuristics and advanced information extraction methods. Our efforts also include improving the performance of two subtasks by exploiting additional labeled data and clinical text-based embedding models. METHODS We present a hybrid method that combines machine learning and rule-based approaches. We implemented an end-to-end system with multiple information extraction and attribute classification components. For entity identification, we trained bidirectional long short-term memory deep learning models. These models incorporated static word embeddings and context-dependent embeddings. We created a voting ensemble that combined the predictions of all individual models. For relation extraction, we trained two relation extraction models. The first model determined the living status of each family member. The second model identified observations associated with each family member. We implemented online gradient descent models to extract related entity pairs. As part of post-challenge efforts, we used the BioCreative/OHNLP 2018 corpus and trained new models with the union of these two data sets. We also pre-trained language models using clinical notes from the MIMIC-III clinical database. RESULTS The voting ensemble achieved better performance than individual classifiers. 
In the entity identification task, the best performing system reached a precision of 78.90% and a recall of 83.84%. Our NLP system for entity identification and relation extraction ranked 3rd and 4th respectively in the challenge. Our end-to-end pipeline system substantially benefited from the combination of the two data sets. Compared to our official submission, the revised system yielded significantly better performance (p < 0.05) with F1-scores of 86.02% and 72.48% for entity identification and relation extraction, respectively. CONCLUSIONS We demonstrated that a hybrid model could be used to successfully extract family history information recorded in unstructured free-text notes. In this study, our approach of entity identification as a sequence labeling problem produced satisfactory results. Our post-challenge efforts significantly improved performance by leveraging additional labeled data and using word vector representations learned from large collections of clinical notes.
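
The abstract above does not say how the voting ensemble combined its models, so the following is only a minimal sketch of one common scheme: token-level majority voting over label sequences. The function name `majority_vote`, the label names, and the tie-breaking rule (earliest-listed model wins) are assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-token label sequences from several models by majority vote.

    `predictions` holds one label sequence per model, all of equal length.
    Ties go to the label proposed by the earliest-listed model (an assumption).
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(seq) != length for seq in predictions):
        raise ValueError("all models must label the same tokens")

    voted = []
    for labels in zip(*predictions):  # labels proposed for one token
        counts = Counter(labels)
        best = max(counts.values())
        # Pick the first label, in model order, that reaches the top count.
        voted.append(next(label for label in labels if counts[label] == best))
    return voted


# Hypothetical outputs of three sequence labelers for the same five tokens:
# "Her mother had breast cancer"
model_a = ["O", "B-FamilyMember", "O", "B-Observation", "I-Observation"]
model_b = ["O", "B-FamilyMember", "O", "O", "B-Observation"]
model_c = ["O", "O", "O", "B-Observation", "I-Observation"]

ensemble = majority_vote([model_a, model_b, model_c])
```

Voting on each token independently can produce label sequences that are not valid spans (for example an inside tag with no preceding begin tag), so a real system would likely add a repair step or vote on whole spans instead.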

Family history information extraction via deep joint learning

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-019-0995-5 ◽

2019 ◽

Vol 19 (S10) ◽

Cited By ~ 7

Author(s):

Xue Shi ◽

Dehuan Jiang ◽

Yuanhang Huang ◽

Xiaolong Wang ◽

Qingcai Chen ◽

...

Keyword(s):

Family History ◽

Language Processing ◽

Family Members ◽

Decision Making Process ◽

Family History Information ◽

Health Records ◽

Joint Learning ◽

Clinical Text ◽

History Information ◽

Entity Identification

Abstract Background Family history (FH) information, including family members, the side of the family they belong to (i.e., maternal or paternal), their living status, their observations (diseases), etc., is very important in the decision-making process of disorder diagnosis and treatment. However, FH information cannot be used directly by computers because it is always embedded in unstructured text in electronic health records (EHRs). Extracting FH information from clinical text therefore requires natural language processing (NLP). The BioCreative/OHNLP2018 challenge included a task on FH extraction (i.e., task1) with two subtasks: (1) entity identification, identifying family members and their observations (diseases) mentioned in clinical text; and (2) family history extraction, extracting the side of family, living status, and observations of family members. For this task, we propose a system based on deep joint learning methods to extract FH information. Our system achieves the highest F1-scores of 0.8901 on subtask1 and 0.6359 on subtask2, respectively.

Representation of Information about Family Relatives as Structured Data in Electronic Health Records

Applied Clinical Informatics ◽

10.4338/aci-2013-10-ra-0080 ◽

2014 ◽

Vol 05 (02) ◽

pp. 349-367 ◽

Cited By ~ 13

Author(s):

Y. Lu ◽

C.J. Vitale ◽

P.L. Mar ◽

F. Chang ◽

N. Dhopeshwarkar ◽

...

Keyword(s):

Family History ◽

Structured Data ◽

Free Text ◽

Snomed Ct ◽

Family History Information ◽

Text Documents ◽

Health Records ◽

Relative Information ◽

History Information ◽

Electronic Health

Summary Background: The ability to manage and leverage family history information in the electronic health record (EHR) is crucial to delivering high-quality clinical care. Objectives: We aimed to evaluate existing standards in representing relative information, examine this information documented in EHRs, and develop a natural language processing (NLP) application to extract relative information from free-text clinical documents. Methods: We reviewed a random sample of 100 admission notes and 100 discharge summaries of 198 patients, and also reviewed the structured entries for these patients in an EHR system’s family history module. We investigated the two standards used by Stage 2 of Meaningful Use (SNOMED CT and HL7 Family History Standard) and identified coverage gaps of each standard in coding relative information. Finally, we evaluated the performance of the MTERMS NLP system in identifying relative information from free-text documents. Results: The structure and content of SNOMED CT and HL7 for representing relative information are different in several ways. Both terminologies have high coverage to represent local relative concepts built in an ambulatory EHR system, but gaps in key concept coverage were detected; coverage rates for relative information in free-text clinical documents were 95.2% and 98.6%, respectively. Compared to structured entries, richer family history information was only available in free-text documents. Using a comprehensive lexicon that included concepts and terms of relative information from different sources, we expanded the MTERMS NLP system to extract and encode relative information in clinical documents and achieved a corresponding precision of 100% and recall of 97.4%. Conclusions: Comprehensive assessment and user guidance are critical to adopting standards into EHR systems in a meaningful way. A significant portion of patients’ family history information is only documented in free-text clinical documents and NLP can be used to extract this information.
Citation: Zhou L, Lu Y, Vitale CJ, Mar PL, Chang F, Dhopeshwarkar N, Rocha RA. Representation of information about family relatives as structured data in electronic health records. Appl Clin Inf 2014; 5: 349–367 http://dx.doi.org/10.4338/ACI-2013-10-RA-0080
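
The lexicon-driven extraction described in this abstract can be pictured with a small sketch. This is not the MTERMS system: the lexicon entries, the normalized concept labels, and the function `find_relatives` are all hypothetical, and no SNOMED CT or HL7 codes are used.

```python
import re

# Hypothetical mini-lexicon: surface term -> normalized relative concept.
RELATIVE_LEXICON = {
    "mother": "MOTHER",
    "mom": "MOTHER",
    "father": "FATHER",
    "dad": "FATHER",
    "maternal grandmother": "MATERNAL_GRANDMOTHER",
    "grandmother": "GRANDMOTHER",
    "sister": "SISTER",
    "brother": "BROTHER",
}

# Try longer terms first so "maternal grandmother" wins over "grandmother".
_TERMS = sorted(RELATIVE_LEXICON, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in _TERMS) + r")\b",
    re.IGNORECASE,
)


def find_relatives(text):
    """Return (start offset, matched text, normalized concept) per mention."""
    return [
        (m.start(), m.group(0), RELATIVE_LEXICON[m.group(0).lower()])
        for m in _PATTERN.finditer(text)
    ]


note = "Mother with diabetes; maternal grandmother died of stroke."
mentions = find_relatives(note)
```

A lookup like this only encodes the relative itself. Linking each relative to a condition or to a standard terminology code would need further steps that the abstract does not detail.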

Rule-based extraction of family history information from clinical notes

Proceedings of the 35th Annual ACM Symposium on Applied Computing ◽

10.1145/3341105.3374000 ◽

2020 ◽

Cited By ~ 2

Author(s):

João Rafael Almeida ◽

Sérgio Matos

Keyword(s):

Family History ◽

Family History Information ◽

Rule Based ◽

Clinical Notes ◽

History Information

A Hybrid Model for Family History Information Identification and Relation Extraction (Preprint)

JMIR Medical Informatics ◽

10.2196/22797 ◽

2020 ◽

Author(s):

Youngjun Kim ◽

Paul M Heider ◽

Isabel R H Lally ◽

Stéphane M Meystre

Keyword(s):

Family History ◽

Hybrid Model ◽

Relation Extraction ◽

Family History Information ◽

History Information

Family History Extraction Using Deep Biaffine Attention (Preprint)

10.2196/preprints.23587 ◽

2020 ◽

Cited By ~ 1

Author(s):

Kecheng Zhan ◽

Weihua Peng ◽

Ying Xiong ◽

Huhao Fu ◽

Qingcai Chen ◽

...

Keyword(s):

Family History ◽

Information Extraction ◽

Short Term Memory ◽

Family Members ◽

Relation Extraction ◽

Disease Diagnosis ◽

Entity Recognition ◽

Shared Task ◽

Family History Information ◽

Clinical Text

BACKGROUND Family history (FH) information, including family members, side of family of family members, living status of family members, observations of family members, etc., plays a significant role in disease diagnosis and treatment. Family member information extraction aims to extract FH information from semi-structured/unstructured text in electronic health records (EHRs). It is a challenging task that involves named entity recognition (NER) and relation extraction (RE): the named entities are family members, living status, and observations, and the relations link family members to living status and family members to observations. OBJECTIVE This study aims to explore ways to effectively extract family history information from clinical text. METHODS Inspired by dependency parsing, we design a novel graph-based schema to represent FH information and introduce deep biaffine attention to extract FH information from clinical text. In the deep biaffine attention model, we use CNN-BiLSTM (Convolutional Neural Network-Bidirectional Long Short Term Memory network) and BERT (Bidirectional Encoder Representation from Transformers) to encode input sentences, and deploy a biaffine classifier to extract FH information. In addition, we develop a post-processing module to adjust results. A system based on the proposed method was developed for the 2019 n2c2/OHNLP shared task track on FH information extraction, which includes two subtasks on entity recognition and relation extraction, respectively. RESULTS We conduct experiments on the corpus provided by the 2019 n2c2/OHNLP shared task track on FH information extraction. Our system achieved the highest F1-scores of 0.8823 on subtask 1 and 0.7048 on subtask 2, respectively, which are new benchmark results on the 2019 n2c2/OHNLP corpus. CONCLUSIONS This study designed a novel graph-based schema to represent FH information and applied deep biaffine attention to extract FH information. Experimental results show the effectiveness of deep biaffine attention on FH information extraction.
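
As a rough illustration of the biaffine scoring mentioned in this abstract, the sketch below uses the commonly cited form score(h, d) = hᵀ U d + w · [h; d] + b for a pair of token vectors. The abstract does not give the authors' exact parameterization, so the formula choice, the function names, and the hand-set toy weights are assumptions. In the described system the vectors would come from the CNN-BiLSTM or BERT encoder and the weights would be learned.

```python
def biaffine_score(head, dep, U, w, b):
    """Score one (head, dependent) pair: head^T U dep + w . [head; dep] + b."""
    bilinear = sum(
        head[i] * U[i][j] * dep[j]
        for i in range(len(head))
        for j in range(len(dep))
    )
    # Linear term over the concatenation of both vectors.
    linear = sum(wk * xk for wk, xk in zip(w, head + dep))
    return bilinear + linear + b


def score_all_pairs(vectors, U, w, b):
    """Return a matrix where entry [i][j] scores token i as head of token j."""
    n = len(vectors)
    return [
        [biaffine_score(vectors[i], vectors[j], U, w, b) for j in range(n)]
        for i in range(n)
    ]


# Toy two-dimensional token vectors and hand-set weights (illustrative only).
vectors = [[1.0, 0.0], [0.0, 1.0]]
U = [[0.0, 2.0], [0.0, 0.0]]
w = [0.5, 0.0, 0.0, 0.5]
b = -1.0

scores = score_all_pairs(vectors, U, w, b)
```

Because the bilinear term is not symmetric, the score for token 0 as head of token 1 differs from the reverse direction, which is what lets a graph-based schema represent directed relations such as family member to observation.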

Extraction of Family History Information From Clinical Notes: Deep Learning and Heuristics Approach

JMIR Medical Informatics ◽

10.2196/22898 ◽

2020 ◽

Vol 8 (12) ◽

pp. e22898

Author(s):

João Figueira Silva ◽

João Rafael Almeida ◽

Sérgio Matos

Keyword(s):

Deep Learning ◽

Family History ◽

Family Member ◽

Hybrid System ◽

Family History Information ◽

Rule Based ◽

Clinical Notes ◽

History Information ◽

Test Sets ◽

The Impact

Background Electronic health records store large amounts of patient clinical data. Despite efforts to structure patient data, clinical notes containing rich patient information remain stored as free text, greatly limiting its exploitation. This includes family history, which is highly relevant for applications such as diagnosis and prognosis. Objective This study aims to develop automatic strategies for annotating family history information in clinical notes, focusing not only on the extraction of relevant entities such as family members and disease mentions but also on the extraction of relations between the identified entities. Methods This study extends a previous contribution for the 2019 track on family history extraction from national natural language processing clinical challenges by improving a previously developed rule-based engine, using deep learning (DL) approaches for the extraction of entities from clinical notes, and combining both approaches in a hybrid end-to-end system capable of successfully extracting family member and observation entities and the relations between those entities. Furthermore, this study analyzes the impact of factors such as the use of external resources and different types of embeddings in the performance of DL models. Results The approaches developed were evaluated in a first task regarding entity extraction and in a second task concerning relation extraction. The proposed DL approach improved observation extraction, obtaining F1 scores of 0.8688 and 0.7907 in the training and test sets, respectively. However, DL approaches have limitations in the extraction of family members. The rule-based engine was adjusted to have higher generalizing capability and achieved family member extraction F1 scores of 0.8823 and 0.8092 in the training and test sets, respectively. The resulting hybrid system obtained F1 scores of 0.8743 and 0.7979 in the training and test sets, respectively. 
For the second task, the original evaluator was adjusted to perform a more exact evaluation than the original one, and the hybrid system obtained F1 scores of 0.6480 and 0.5082 in the training and test sets, respectively. Conclusions We evaluated the impact of several factors on the performance of DL models, and we present an end-to-end system for extracting family history information from clinical notes, which can help in the structuring and reuse of this type of information. The final hybrid solution is provided in a publicly available code repository.

Extraction of Family History Information From Clinical Notes: Deep Learning and Heuristics Approach (Preprint)

10.2196/preprints.22898 ◽

2020 ◽

Author(s):

João Figueira Silva ◽

João Rafael Almeida ◽

Sérgio Matos

Keyword(s):

Deep Learning ◽

Family History ◽

Family Member ◽

Hybrid System ◽

Family History Information ◽

Rule Based ◽

Clinical Notes ◽

History Information ◽

Test Sets ◽

The Impact

Family history information in biomedical research

Journal of Continuing Education in the Health Professions ◽

10.1002/chp.1340210405 ◽

2001 ◽

Vol 21 (4) ◽

pp. 215-223 ◽

Cited By ~ 6

Author(s):

Kenneth S. Kendler

Keyword(s):

Family History ◽

Biomedical Research ◽

Family History Information ◽

History Information

Use of Family History Information for Neural Tube Defect Prevention

American Journal of Health Education ◽

10.1080/19325037.2011.10599200 ◽

2011 ◽

Vol 42 (5) ◽

pp. 296-308

Author(s):

Ridgely Fisk Green ◽

Joan Ehrhardt ◽

Margaret F. Ruttenber ◽

Richard S. Olney

Keyword(s):

Family History ◽

Neural Tube ◽

Neural Tube Defect ◽

Family History Information ◽

History Information ◽

Defect Prevention

Assessment of Family History Information in Case-Control Cancer Studies

American Journal of Epidemiology ◽

10.1093/oxfordjournals.aje.a115954 ◽

1991 ◽

Vol 133 (8) ◽

pp. 757-765 ◽

Cited By ~ 15

Author(s):

Pamela H. Phillips ◽

Martha S. Linet ◽

Emily L. Harris

Keyword(s):

Family History ◽

Case Control ◽

Family History Information ◽

History Information ◽

Cancer Studies