Construction of a Digestive System Tumor Knowledge Graph Based on Chinese Electronic Medical Records: Development and Usability Study

Xiaolei Xiu; Qing Qian; Sizhu Wu

doi:10.2196/18287

Construction of a Digestive System Tumor Knowledge Graph Based on Chinese Electronic Medical Records: Development and Usability Study

JMIR Medical Informatics ◽

10.2196/18287 ◽

2020 ◽

Vol 8 (10) ◽

pp. e18287

Author(s):

Xiaolei Xiu ◽

Qing Qian ◽

Sizhu Wu

Keyword(s):

Electronic Medical Records ◽

Medical Records ◽

Digestive System ◽

Initial Step ◽

Knowledge Graph ◽

Semantic Relationships ◽

Effective Relationships ◽

System Tumor ◽

Knowledge Graphs ◽

Graph Schema

Background: With the increasing incidence and mortality of digestive system tumors in China, finding ways to use clinical experience data in Chinese electronic medical records (CEMRs) to identify potentially effective relationships between diagnosis and treatment has become a priority. As an important part of artificial intelligence, a knowledge graph is a powerful tool for information processing and knowledge organization and is well suited to this problem.

Objective: This study aimed to construct a semantic-driven digestive system tumor knowledge graph (DSTKG) to represent the knowledge in CEMRs with fine granularity and semantics.

Methods: This paper focuses on the knowledge graph schema and semantic relationships, which were the main challenges in constructing a Chinese tumor knowledge graph. The DSTKG was developed through a multistep procedure. As an initial step, a complete DSTKG construction framework based on CEMRs was proposed. Then, a knowledge graph schema containing 7 classes and 16 kinds of semantic relationships was built, and the DSTKG was completed through knowledge extraction, named entity linking, and drawing the knowledge graph. Finally, the quality of the DSTKG was evaluated from 3 aspects: data layer, schema layer, and application layer.

Results: Experts agreed that the DSTKG was good overall (mean score 4.20). The DSTKG performed especially well on “rationality of schema structure,” “scalability,” and “readability of results,” with scores of 4.72, 4.67, and 4.69, respectively, all well above the overall mean. However, the small amount of data in the DSTKG negatively affected its “practicability” score. Compared with other Chinese tumor knowledge graphs, the DSTKG can represent more granular entities, properties, and semantic relationships. In addition, the DSTKG was flexible, allowing customization to suit a designer's specific interests in digestive system tumors.

Conclusions: We constructed a granular semantic DSTKG. It could provide guidance for the construction of tumor knowledge graphs and serve as a preliminary step toward the intelligent application of knowledge graphs based on CEMRs. Additional data sources and further research on assertion classification are needed to gain insight into the DSTKG’s potential.
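
The abstract describes a schema of classes and semantic relationships that constrains which triples the graph may hold. The following is a minimal Python sketch of that general idea, not the authors' implementation. The class names, relationship names, and sample entities are hypothetical placeholders, because the abstract does not list the 7 classes or 16 relationships.

```python
from dataclasses import dataclass

# Hypothetical schema: these class and relationship names are illustrative
# assumptions, not the ones defined for the DSTKG.
CLASSES = {"Disease", "Symptom", "Treatment"}

# Each relationship maps to the (head class, tail class) pair it allows.
RELATIONS = {
    "has_symptom": ("Disease", "Symptom"),
    "treated_by": ("Disease", "Treatment"),
}


@dataclass(frozen=True)
class Entity:
    name: str
    cls: str

    def __post_init__(self):
        # Reject entities whose class is not part of the schema.
        if self.cls not in CLASSES:
            raise ValueError(f"unknown class: {self.cls}")


class KnowledgeGraph:
    """Stores (head, relation, tail) triples that conform to the schema."""

    def __init__(self):
        self.triples = set()

    def add(self, head, relation, tail):
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        expected = RELATIONS[relation]
        if (head.cls, tail.cls) != expected:
            raise ValueError(f"{relation} expects {expected}")
        self.triples.add((head, relation, tail))

    def tails(self, head, relation):
        # Names of all entities linked from `head` by `relation`, sorted.
        return sorted(
            t.name for h, r, t in self.triples if h == head and r == relation
        )
```

Checking each triple against the schema at insertion time keeps malformed extractions out of the data layer, at the cost of requiring the schema to be fixed before loading.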


Construction of a Digestive System Tumor Knowledge Graph Based on Chinese Electronic Medical Records: Development and Usability Study (Preprint)

10.2196/preprints.18287 ◽

2020 ◽

Author(s):

Xiaolei Xiu ◽

Qing Qian ◽

Sizhu Wu

Keyword(s):

Electronic Medical Records ◽

Medical Records ◽

Digestive System ◽

Initial Step ◽

Knowledge Graph ◽

Semantic Relationships ◽

Effective Relationships ◽

System Tumor ◽

Knowledge Graphs ◽

Graph Schema



Demographic Aware Probabilistic Medical Knowledge Graph Embeddings of Electronic Medical Records

Artificial Intelligence in Medicine - Lecture Notes in Computer Science ◽

10.1007/978-3-030-77211-6_48 ◽

2021 ◽

pp. 408-417

Author(s):

Aynur Guluzade ◽

Endri Kacupaj ◽

Maria Maleshkova

Keyword(s):

Electronic Medical Records ◽

Medical Records ◽

Medical Knowledge ◽

Knowledge Graph ◽

Graph Embeddings


PDD Graph: Bridging Electronic Medical Records and Biomedical Knowledge Graphs via Entity Linking

Lecture Notes in Computer Science - The Semantic Web – ISWC 2017 ◽

10.1007/978-3-319-68204-4_23 ◽

2017 ◽

pp. 219-227 ◽

Cited By ~ 4

Author(s):

Meng Wang ◽

Jiaheng Zhang ◽

Jun Liu ◽

Wei Hu ◽

Sen Wang ◽

...

Keyword(s):

Electronic Medical Records ◽

Medical Records ◽

Entity Linking ◽

Biomedical Knowledge ◽

Knowledge Graphs


A Method to Learn Embedding of a Probabilistic Medical Knowledge Graph: Algorithm Development (Preprint)

10.2196/preprints.17645 ◽

2019 ◽

Author(s):

Linfeng Li ◽

Peng Wang ◽

Yao Wang ◽

Shenghui Wang ◽

Jun Yan ◽

...

Keyword(s):

Medical Records ◽

Large Scale ◽

Semantic Representation ◽

Medical Knowledge ◽

Mapping Function ◽

Graph Algorithm ◽

Knowledge Graph ◽

Knowledge Graphs ◽

Representation Method ◽

Better Than

BACKGROUND: Knowledge graph embedding is an effective semantic representation method for entities and relations in knowledge graphs. Several translation-based algorithms, including TransE, TransH, TransR, TransD, and TranSparse, have been proposed to learn effective embedding vectors from typical knowledge graphs in which the relations between head and tail entities are deterministic. In medical knowledge graphs, however, these relations are inherently probabilistic, which makes embedding them a challenge.

OBJECTIVE: We aimed to encode the probability values of triplets in the representation vectors by enhancing existing TransX algorithms (where X is E, H, R, D, or Sparse) in two ways: (1) constructing a mapping function between the score value and the probability, and (2) introducing a probability-based loss of triplets into the original margin-based loss function.

METHODS: We applied the proposed PrTransX algorithm to a medical knowledge graph that we built from large-scale real-world electronic medical records data. We evaluated the embeddings using a link prediction task.

RESULTS: PrTransX performed better than the corresponding TransX algorithms on all evaluation indicators, achieving a higher proportion of correct entities ranked in the top 10, a higher normalized discounted cumulative gain of the top 10 predicted tail entities, and a lower mean rank.

CONCLUSIONS: The proposed PrTransX successfully incorporated the uncertainty of the knowledge triplets into the embedding vectors.
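
The two enhancements named in the abstract (a score-to-probability mapping and a probability term added to the margin-based loss) can be illustrated with a small TransE-style sketch in Python. This is not the PrTransX algorithm itself: the abstract does not give the mapping function or the form of the probability loss, so the `exp(-d)` mapping, the squared-error term, and the `weight` parameter below are assumptions chosen only for illustration. The ranking helpers show how the top-10 proportion and mean rank are typically computed in link prediction.

```python
import math


def transe_distance(head, relation, tail):
    """TransE score: L2 norm of (h + r - t). Smaller means more plausible."""
    return math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


def score_to_probability(distance):
    """Assumed mapping from a distance in [0, inf) to a value in (0, 1]."""
    return math.exp(-distance)


def margin_loss(pos_distance, neg_distance, margin=1.0):
    """Standard margin-based ranking loss for one positive/negative pair."""
    return max(0.0, margin + pos_distance - neg_distance)


def probabilistic_loss(pos, neg, observed_probability, margin=1.0, weight=1.0):
    """Margin loss plus an assumed squared-error probability term.

    `pos` and `neg` are (head, relation, tail) tuples of embedding vectors.
    """
    d_pos = transe_distance(*pos)
    d_neg = transe_distance(*neg)
    prob_term = (score_to_probability(d_pos) - observed_probability) ** 2
    return margin_loss(d_pos, d_neg, margin) + weight * prob_term


def tail_rank(head, relation, candidate_tails, true_index):
    """Rank (1 = best) of the true tail among all candidate tails."""
    distances = [transe_distance(head, relation, t) for t in candidate_tails]
    true_distance = distances[true_index]
    return 1 + sum(1 for d in distances if d < true_distance)


def hits_at_k(ranks, k=10):
    """Proportion of test triplets whose true entity ranks in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_rank(ranks):
    """Average rank of the true entity; lower is better."""
    return sum(ranks) / len(ranks)
```

In this sketch a triplet observed with probability 0.5 is penalized when its embedding distance implies a probability near 1, which is the kind of signal a purely margin-based loss does not capture.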


QAnalysis: a question-answer driven analytic tool on knowledge graphs for leveraging electronic medical records for clinical research

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-019-0798-8 ◽

2019 ◽

Vol 19 (1) ◽

Cited By ~ 9

Author(s):

Tong Ruan ◽

Yueqi Huang ◽

Xuli Liu ◽

Yuhang Xia ◽

Ju Gao

Keyword(s):

Clinical Research ◽

Electronic Medical Records ◽

Medical Records ◽

Analytic Tool ◽

Knowledge Graphs


Automatic Generation of a Qualified Medical Knowledge Graph and Its Usage for Retrieving Patient Cohorts from Electronic Medical Records

2013 IEEE Seventh International Conference on Semantic Computing ◽

10.1109/icsc.2013.68 ◽

2013 ◽

Cited By ~ 13

Author(s):

Travis Goodwin ◽

Sanda M. Harabagiu

Keyword(s):

Electronic Medical Records ◽

Medical Records ◽

Medical Knowledge ◽

Automatic Generation ◽

Knowledge Graph


A Method to Learn Embedding of a Probabilistic Medical Knowledge Graph: Algorithm Development

JMIR Medical Informatics ◽

10.2196/17645 ◽

2020 ◽

Vol 8 (5) ◽

pp. e17645

Author(s):

Linfeng Li ◽

Peng Wang ◽

Yao Wang ◽

Shenghui Wang ◽

Jun Yan ◽

...

Keyword(s):

Medical Records ◽

Large Scale ◽

Semantic Representation ◽

Medical Knowledge ◽

Mapping Function ◽

Graph Algorithm ◽

Knowledge Graph ◽

Knowledge Graphs ◽

Representation Method ◽

Better Than



Knowledge Graph Building from Real-world Multisource “Dirty” Clinical Electronic Medical Records for Intelligent Consultation Applications

10.1109/icdh52753.2021.00049 ◽

2021 ◽

Author(s):

Xinlong Liu ◽

Li-Qun Xu

Keyword(s):

Electronic Medical Records ◽

Real World ◽

Medical Records ◽

Knowledge Graph


Learning a Health Knowledge Graph from Electronic Medical Records

Scientific Reports ◽

10.1038/s41598-017-05778-z ◽

2017 ◽

Vol 7 (1) ◽

Cited By ~ 72

Author(s):

Maya Rotmensch ◽

Yoni Halpern ◽

Abdulhakim Tlimat ◽

Steven Horng ◽

David Sontag

Keyword(s):

Electronic Medical Records ◽

Medical Records ◽

Health Knowledge ◽

Knowledge Graph


Adolescent Substance Use Screening Integrated With Electronic Medical Records

PsycEXTRA Dataset ◽

10.1037/e520552015-006 ◽

2014 ◽

Author(s):

C. McKenna ◽

B. Gaines ◽

C. Hatfield ◽

S. Helman ◽

L. Meyer ◽

...

Keyword(s):

Substance Use ◽

Electronic Medical Records ◽

Medical Records ◽

Adolescent Substance Use ◽

Adolescent Substance
