A Review of Automatic Phenotyping Approaches using Electronic Health Records

Hadeel Alzoubi; Raid Alzubi; Naeem Ramzan; Daune West; Tawfik Al-Hadhrami; Mamoun Alazab

doi:10.3390/electronics8111235

Electronics ◽

10.3390/electronics8111235 ◽

2019 ◽

Vol 8 (11) ◽

pp. 1235 ◽

Cited By ~ 2

Author(s):

Hadeel Alzoubi ◽

Raid Alzubi ◽

Naeem Ramzan ◽

Daune West ◽

Tawfik Al-Hadhrami ◽

...

Keyword(s):

Electronic Health Records ◽

Language Processing ◽

Critical Evaluation ◽

Clinical Information ◽

Machine Learning Techniques ◽

Observational Research ◽

Medical Subject Headings ◽

Health Records ◽

Electronic Health ◽

Automatic Phenotyping

Electronic health records (EHRs) are a rich repository of valuable clinical information held in primary and secondary care databases. To make EHRs usable for medical observational research, a range of algorithms has been developed to automatically identify individuals with a specific phenotype. This review summarizes and critically evaluates the literature on the development of EHR phenotyping systems, covering systems and techniques based on both structured and unstructured EHR data. Articles indexed in PubMed and Google Scholar and published between 2013 and 2017 were reviewed, using search terms derived from Medical Subject Headings (MeSH). The use of natural language processing (NLP) techniques to extract features from narrative text has grown in popularity, owing to the availability of open-source NLP algorithms combined with improved accuracy. Concept extraction is the most popular NLP technique in this review: more than 50% of the reviewed papers used it to extract features from EHRs. High-throughput phenotyping systems using unsupervised machine learning techniques have also gained popularity because they can extract a phenotype efficiently and automatically with minimal human effort.

Development of algorithm for classification smoking status from unstructured bilingual electronic health records based on natural language processing (Preprint)

10.2196/preprints.26978 ◽

2021 ◽

Author(s):

Ye Seul Bae ◽

Kyung Hwan Kim ◽

Han Kyul Kim ◽

Sae Won Choi ◽

Taehoon Ko ◽

...

Keyword(s):

Natural Language Processing ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

Smoking Status ◽

Svm Classifier ◽

Keyword Extraction ◽

Health Records ◽

Clinical Notes ◽

Electronic Health

BACKGROUND Smoking is a major risk factor and an important variable for clinical research, but few studies have addressed automatically obtaining smoking classification from unstructured bilingual electronic health records (EHRs). OBJECTIVE We aim to develop an algorithm to classify smoking status from unstructured EHRs using natural language processing (NLP). METHODS Using acronym replacement and the Python package Soynlp, we normalize 4,711 bilingual clinical notes. Each EHR note was classified into one of 4 categories: current smoker, past smoker, never smoker, and unknown. Subsequently, SPPMI (Shifted Positive Pointwise Mutual Information) is used to vectorize the words in the notes. By calculating the cosine similarity between these word vectors, keywords denoting the same smoking status are identified. RESULTS Compared with other keyword extraction methods (word co-occurrence-, PMI-, and NPMI-based methods), our proposed approach improves keyword extraction precision by as much as 20.0%. These extracted keywords are used to classify the 4 smoking statuses in our bilingual clinical notes. Given an identical SVM classifier, the extracted keywords improve the F1 score by as much as 1.8% compared with unigram and bigram bag-of-words features. CONCLUSIONS Our study shows the potential of SPPMI in classifying smoking status from bilingual, unstructured EHRs. Our current findings show how smoking information can be easily acquired and used for clinical practice and research.
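The SPPMI and cosine-similarity steps described in METHODS can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`sppmi_vectors`, `cosine`, `similar_words`), the toy notes, and the choice to count co-occurrence once per word pair per note are all assumptions made here, and the abstract does not state the context window or the shift constant that was actually used.

```python
import math
from collections import Counter
from itertools import combinations


def sppmi_vectors(docs, shift_k=1.0):
    """Build sparse SPPMI word vectors from tokenized notes.

    SPPMI(w, c) = max(PMI(w, c) - log(shift_k), 0).
    Assumption: a word pair co-occurs once for each note containing both words.
    """
    pair_counts = Counter()
    for tokens in docs:
        for a, b in combinations(sorted(set(tokens)), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1  # keep the counts symmetric

    # Marginal counts: how often each word appears in any co-occurring pair.
    word_counts = Counter()
    for (a, _b), n in pair_counts.items():
        word_counts[a] += n
    total = sum(pair_counts.values())

    vectors = {}
    for (a, b), n in pair_counts.items():
        pmi = math.log(n * total / (word_counts[a] * word_counts[b]))
        value = max(pmi - math.log(shift_k), 0.0)
        if value > 0.0:  # store only positive entries (sparse vectors)
            vectors.setdefault(a, {})[b] = value
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(key, 0.0) for key, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_words(vectors, seed, top_n=5):
    """Rank the other words by cosine similarity to a seed keyword."""
    scores = [
        (cosine(vectors[seed], vec), word)
        for word, vec in vectors.items()
        if word != seed
    ]
    scores.sort(key=lambda item: (-item[0], item[1]))
    return [(word, score) for score, word in scores[:top_n]]


# Hypothetical, already-normalized notes (not taken from the study data).
toy_notes = [
    ["current", "smoker", "cigarette", "daily"],
    ["current", "tobacco", "cigarette", "daily"],
    ["never", "smoked", "denies", "tobacco"],
    ["quit", "smoking", "ex-smoker", "years"],
]
toy_vectors = sppmi_vectors(toy_notes)
```

Calling `similar_words(toy_vectors, "smoker")` then ranks candidate keywords that share contexts with the seed word, which is the idea behind grouping keywords that denote the same smoking status.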

Abstract PO-050: Identifying de novo stage IV breast cancer (DNIV) cases in Electronic Health Records (EHR) using natural language processing

10.1158/1557-3265.adi21-po-050 ◽

2021 ◽

Author(s):

Liwei Wang ◽

Karthik Giridhar ◽

Kimberly Corbin ◽

Brenda Ernst ◽

Sadia Choudhery ◽

...

Keyword(s):

Breast Cancer ◽

Natural Language Processing ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

De Novo ◽

Stage Iv ◽

Health Records ◽

Stage Iv Breast Cancer ◽

Electronic Health

Desiderata for computable representations of electronic health records-driven phenotype algorithms

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocv112 ◽

2015 ◽

Vol 22 (6) ◽

pp. 1220-1230 ◽

Cited By ~ 28

Author(s):

Huan Mo ◽

William K Thompson ◽

Luke V Rasmussen ◽

Jennifer A Pacheco ◽

Guoqian Jiang ◽

...

Keyword(s):

Electronic Health Records ◽

Language Processing ◽

Clinical Decision Making ◽

Clinical Decision ◽

Relational Algebra ◽

Common Data Model ◽

Health Records ◽

Electronic Health ◽

Value Sets ◽

Text Searching

Background: Electronic health records (EHRs) are increasingly used for clinical and translational research through the creation of phenotype algorithms. Currently, phenotype algorithms are most commonly represented as noncomputable descriptive documents and knowledge artifacts that detail the protocols for querying diagnoses, symptoms, procedures, medications, and/or text-driven medical concepts, and are primarily meant for human comprehension. We present desiderata for developing a computable phenotype representation model (PheRM). Methods: A team of clinicians and informaticians reviewed common features for multisite phenotype algorithms published in PheKB.org and existing phenotype representation platforms. We also evaluated well-known diagnostic criteria and clinical decision-making guidelines to encompass a broader category of algorithms. Results: We propose 10 desired characteristics for a flexible, computable PheRM: (1) structure clinical data into queryable forms; (2) recommend use of a common data model, but also support customization for the variability and availability of EHR data among sites; (3) support both human-readable and computable representations of phenotype algorithms; (4) implement set operations and relational algebra for modeling phenotype algorithms; (5) represent phenotype criteria with structured rules; (6) support defining temporal relations between events; (7) use standardized terminologies and ontologies, and facilitate reuse of value sets; (8) define representations for text searching and natural language processing; (9) provide interfaces for external software algorithms; and (10) maintain backward compatibility. Conclusion: A computable PheRM is needed for true phenotype portability and reliability across different EHR products and healthcare systems. These desiderata are a guide to inform the establishment and evolution of EHR phenotype algorithm authoring platforms and languages.
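To make desiderata (4), (5), and (7) more concrete, the sketch below expresses a phenotype as set operations over reusable value sets. It is only an illustration under stated assumptions: the `Event` record, the value sets, the HbA1c threshold, and the rule itself are hypothetical and are not taken from PheKB.org or from any validated phenotype definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One structured EHR fact (hypothetical, simplified schema)."""
    patient_id: str
    kind: str  # "diagnosis", "medication", or "lab"
    code: str
    value: Optional[float] = None


# Hypothetical value sets; a real phenotype would draw these from
# standardized terminologies and share them across sites.
T2DM_DIAGNOSES = {"E11"}       # ICD-10 category for type 2 diabetes mellitus
T1DM_DIAGNOSES = {"E10"}       # ICD-10 category for type 1 diabetes mellitus
ANTIDIABETIC_DRUGS = {"metformin"}
HBA1C_THRESHOLD = 6.5          # illustrative cut-off, in percent


def patients_with(events, kind, codes):
    """Select the set of patients having any event of this kind and code."""
    return {e.patient_id for e in events if e.kind == kind and e.code in codes}


def type2_diabetes_cases(events):
    """Illustrative rule: diagnosis AND (medication OR lab) AND NOT exclusion."""
    diagnosed = patients_with(events, "diagnosis", T2DM_DIAGNOSES)
    treated = patients_with(events, "medication", ANTIDIABETIC_DRUGS)
    high_hba1c = {
        e.patient_id
        for e in events
        if e.kind == "lab"
        and e.code == "hba1c"
        and e.value is not None
        and e.value >= HBA1C_THRESHOLD
    }
    excluded = patients_with(events, "diagnosis", T1DM_DIAGNOSES)
    # Intersection, union, and difference map directly onto the rule above.
    return (diagnosed & (treated | high_hba1c)) - excluded
```

Because each criterion evaluates to a plain set of patient identifiers, the same rule can be read by a person and executed by a program, which is the dual representation that desideratum (3) asks for.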

Methods for Enhancing the Reproducibility of Observational Research Using Electronic Health Records: Preliminary Findings from the CALIBER Resource

2017 IEEE 30th International Symposium on Computer-Based Medical Systems (CBMS) ◽

10.1109/cbms.2017.74 ◽

2017 ◽

Cited By ~ 1

Author(s):

Spiros Denaxas ◽

Arturo Gonzalez-Izquierdo ◽

Maria Pikoula ◽

Kenan Direk ◽

Natalie Fitzpatrick ◽

...

Keyword(s):

Electronic Health Records ◽

Observational Research ◽

Health Records ◽

Electronic Health

Correction: Extracting Family History Information From Electronic Health Records: Natural Language Processing Analysis

JMIR Medical Informatics ◽

10.2196/30153 ◽

2021 ◽

Vol 9 (5) ◽

pp. e30153

Author(s):

Maciej Rybinski ◽

Xiang Dai ◽

Sonit Singh ◽

Sarvnaz Karimi ◽

Anthony Nguyen

Keyword(s):

Natural Language Processing ◽

Family History ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

Family History Information ◽

Health Records ◽

History Information ◽

Electronic Health

Challenges of Developing a Natural Language Processing Method With Electronic Health Records to Identify Persons With Chronic Mobility Disability

Archives of Physical Medicine and Rehabilitation ◽

10.1016/j.apmr.2020.04.024 ◽

2020 ◽

Vol 101 (10) ◽

pp. 1739-1746 ◽

Cited By ~ 3

Author(s):

Nicole D. Agaronnik ◽

Charlotta Lindvall ◽

Areej El-Jawahri ◽

Wei He ◽

Lisa I. Iezzoni

Keyword(s):

Natural Language Processing ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

Processing Method ◽

Mobility Disability ◽

Health Records ◽

Electronic Health

Tu1031 Natural Language Processing of Electronic Health Records Accurately Identifies Right Colon Hyperplastic Polyps for Potential Surveillance Reclassification

Gastroenterology ◽

10.1016/s0016-5085(14)62654-8 ◽

2014 ◽

Vol 146 (5) ◽

pp. S-732

Author(s):

Meena A. Prasad ◽

William Thompson ◽

Rajesh N. Keswani ◽

Abel Kho ◽

Ikuo Hirano ◽

...

Keyword(s):

Natural Language Processing ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

Hyperplastic Polyps ◽

Right Colon ◽

Health Records ◽

Electronic Health

Realizing the full potential of electronic health records: the role of natural language processing

Journal of the American Medical Informatics Association ◽

10.1136/amiajnl-2011-000501 ◽

2011 ◽

Vol 18 (5) ◽

pp. 539-539 ◽

Cited By ~ 31

Author(s):

Lucila Ohno-Machado

Keyword(s):

Natural Language Processing ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

Full Potential ◽

Health Records ◽

Electronic Health

Propuesta metodológica de intercambio electrónico de información clínica basada en estándares de telemedicina/Methodological proposal of electronic data interchange and clinical information based on telemedicine standards

KnE Engineering ◽

10.18502/keg.v5i2.6237 ◽

2020 ◽

Author(s):

P. Moreno ◽

G. Bastidas ◽

P. Moreno

Keyword(s):

Electronic Health Records ◽

Information And Communication Technologies ◽

Web Application ◽

Medical Information ◽

Reference Model ◽

Clinical Information ◽

Electronic Data Interchange ◽

Records Management ◽

Health Records ◽

Electronic Health

Advances in information and communication technologies have improved health care in recent years by providing new ways of accessing medical information. In particular, telemedicine standards such as HL7 and CEN TC 251-13606 allow medical information systems to communicate through standardized messages, enabling the integration and retrieval of electronic health records across systems. This article aims to create a methodological guide for the electronic exchange of clinical information, based on an analysis of the HL7 and CEN TC 251-13606 telemedicine standards, in order to improve the efficiency of electronic health record management. The proposed methodology consists of two phases. The first covers the design and implementation of the reference model of the electronic health record, which defines the entities needed to build one. The second defines the architecture of the electronic health record, specifying the structure and semantics of the document in XML, which is used in the record management processes of the medical system that was developed. To test the methodology, a medical system was implemented consisting of a web application, a desktop application, and an e-Health hardware platform. The system supports remote clinical monitoring and the electronic exchange of clinical information, easing patient-doctor interaction. The results show that 83.32% of doctors at the clinic where the system was tested consider that the methodology improves the efficiency of electronic health record management, since it speeds up the access, creation, and entry of records. Moreover, the system reduces the resources needed to monitor home-care patients. Keywords: Telemedicine, EHR, HL7, CEN TC 251-13606, e-Health.
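As a rough illustration of specifying a record's structure in XML, the sketch below builds and reads back a small document using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`healthRecord`, `patientId`, `entry`, and so on) are invented for this example; they do not follow the HL7 or CEN TC 251-13606 schemas, and the abstract does not describe the actual document structure the authors used.

```python
import xml.etree.ElementTree as ET


def build_record(patient_id, name, entries):
    """Serialize a minimal health record to an XML string.

    The layout is hypothetical and far simpler than a standards-based record.
    """
    root = ET.Element("healthRecord", attrib={"patientId": patient_id})
    ET.SubElement(root, "patientName").text = name
    entries_el = ET.SubElement(root, "entries")
    for entry in entries:
        entry_el = ET.SubElement(
            entries_el,
            "entry",
            attrib={"date": entry["date"], "type": entry["type"]},
        )
        entry_el.text = entry["note"]
    return ET.tostring(root, encoding="unicode")


def read_entries(xml_text):
    """Parse the XML back into (date, type, note) tuples."""
    root = ET.fromstring(xml_text)
    return [
        (el.get("date"), el.get("type"), el.text)
        for el in root.iter("entry")
    ]


# Hypothetical example data.
sample_xml = build_record(
    "12345",
    "Jane Doe",
    [
        {"date": "2020-01-15", "type": "consultation", "note": "Routine check-up."},
        {"date": "2020-02-02", "type": "home-visit", "note": "Blood pressure recorded."},
    ],
)
```

Keeping the structure in a shared, machine-readable format is what lets separate components, such as a web application and a desktop application, exchange the same record without custom conversions.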

Keyword Extraction Algorithm for Classifying Smoking Status from Unstructured Bilingual Electronic Health Records Based on Natural Language Processing

Applied Sciences ◽

10.3390/app11198812 ◽

2021 ◽

Vol 11 (19) ◽

pp. 8812

Author(s):

Ye Seul Bae ◽

Kyung Hwan Kim ◽

Han Kyul Kim ◽

Sae Won Choi ◽

Taehoon Ko ◽

...

Keyword(s):

Natural Language Processing ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

Smoking Status ◽

Extraction Methods ◽

Svm Classifier ◽

Keyword Extraction ◽

Health Records ◽

Electronic Health

Smoking is an important variable for clinical research, but few studies have addressed automatically obtaining smoking classification from unstructured bilingual electronic health records (EHRs). We aim to develop an algorithm to classify smoking status from unstructured EHRs using natural language processing (NLP). Using acronym replacement and the Python package Soynlp, we normalize 4711 bilingual clinical notes. Each EHR note was classified into one of 4 categories: current smoker, past smoker, never smoker, and unknown. Subsequently, SPPMI (Shifted Positive Pointwise Mutual Information) is used to vectorize the words in the notes. By calculating the cosine similarity between these word vectors, keywords denoting the same smoking status are identified. Compared with other keyword extraction methods (word co-occurrence-, PMI-, and NPMI-based methods), our proposed approach improves keyword extraction precision by as much as 20.0%. These extracted keywords are used to classify the 4 smoking statuses in our bilingual EHRs. Given an identical SVM classifier, the F1 score is improved by as much as 1.8% compared with unigram and bigram bag-of-words features. Our study shows the potential of SPPMI in classifying smoking status from bilingual, unstructured EHRs. Our current findings show how smoking information can be easily acquired for clinical practice and research.
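The two feature representations compared in this abstract, unigram and bigram bag of words versus extracted keywords, can be contrasted with a short sketch. The function names and keyword lists below are hypothetical, and the classifier is left out: the abstract states only that an identical SVM was used for both representations.

```python
from collections import Counter


def bag_of_ngrams(tokens, n_values=(1, 2)):
    """Count unigrams and bigrams in one tokenized note."""
    features = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features


def keyword_features(tokens, keywords_by_status):
    """Count how many distinct keywords of each smoking status appear."""
    present = set(tokens)
    return {
        status: len(present & keywords)
        for status, keywords in keywords_by_status.items()
    }


# Hypothetical keyword lists; the study derives its own from SPPMI vectors.
example_keywords = {
    "current": {"smoker", "smokes", "pack"},
    "past": {"quit", "ex-smoker", "stopped"},
    "never": {"never", "non-smoker", "denies"},
}
example_note = ["patient", "quit", "smoking", "5", "years", "ago"]
ngram_counts = bag_of_ngrams(example_note)
keyword_counts = keyword_features(example_note, example_keywords)
```

The bag-of-n-grams vector grows with the vocabulary of the corpus, whereas the keyword representation has one feature per status, so it stays small no matter how many notes are processed.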
