Patient Representation Learning from Heterogeneous Data Sources and Knowledge Graphs using Deep Collective Matrix Factorization: Evaluation Study (Preprint)

10.2196/preprints.28842 ◽

2021 ◽

Author(s):

Sajit Kumar ◽

Alicia Nanelia Tan Li Shi ◽

Ragunathan Mariappan ◽

Adithya Rajagopal ◽

Vaibhav Rajan

Keyword(s):

Decision Support ◽

Clinical Decision Support ◽

Matrix Factorization ◽

Representation Learning ◽

Clinical Decision ◽

Heterogeneous Data ◽

Biomedical Literature ◽

Data Sources ◽

Heterogeneous Data Sources ◽

Knowledge Graphs

BACKGROUND: Patient Representation Learning aims to learn features, also called representations, from input sources automatically, often in an unsupervised manner, for use in predictive models. This obviates the need for cumbersome, time- and resource-intensive manual feature engineering, especially from unstructured data such as text, images or graphs. Most previous techniques have used neural network based autoencoders to learn patient representations, primarily from clinical notes in Electronic Medical Records (EMR). Knowledge Graphs (KG), with clinical entities as nodes and their relations as edges, can be extracted automatically from biomedical literature, and provide complementary information to EMR data that have been found to provide valuable predictive signals.

OBJECTIVE: We evaluate the efficacy of Collective Matrix Factorization (CMF) - both classical variants and a recent neural architecture called Deep CMF (DCMF) - in integrating heterogeneous data sources from EMR and KG to obtain patient representations for Clinical Decision Support Tasks.

METHODS: Using a recent formulation of obtaining graph representations through matrix factorization, within the context of CMF, we infuse auxiliary information during patient representation learning. We also extend the DCMF architecture to create a task-specific end-to-end model that learns to simultaneously find effective patient representations and predict. We compare the efficacy of such a model to that of first learning unsupervised representations and then independently learning a predictive model. We evaluate patient representation learning using CMF-based methods and autoencoders for two clinical decision support tasks on a large EMR dataset.

RESULTS: Our experiments show that DCMF provides a seamless way to integrate multiple sources of data to obtain patient representations, both in unsupervised and supervised settings. Its performance in single-source settings is comparable to that of previous autoencoder-based representation learning methods. When DCMF is used to obtain representations from a combination of EMR and KG, where most previous autoencoder-based methods cannot be used directly, its performance is superior to that of previous non-neural methods for CMF. Infusing information from KGs into patient representations using DCMF was found to improve downstream predictive performance.

CONCLUSIONS: Our experiments indicate that DCMF is a versatile model that can be used to obtain representations from single and multiple data sources, and to combine information from EMR data and Knowledge Graphs. Further, DCMF can be used to learn representations in both supervised and unsupervised settings. Thus, DCMF offers an effective way of integrating heterogeneous data sources and infusing auxiliary knowledge into patient representations.
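
To illustrate the idea of collective matrix factorization mentioned in this abstract, here is a minimal sketch of a classical, squared-loss CMF with two matrices that share their row entity (for example, patients). It is not the DCMF architecture evaluated in the paper. The function name `collective_mf`, the alternating least squares solver, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np


def collective_mf(X, Y, rank=3, reg=0.1, n_iters=50, seed=0):
    """Jointly factorize X ~ U V^T and Y ~ U W^T with a shared row factor U.

    Hypothetical illustration: X could hold patient-by-EMR-feature values and
    Y patient-by-auxiliary-entity values. Uses alternating ridge regressions.
    """
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((X.shape[1], rank))
    W = rng.standard_normal((Y.shape[1], rank))
    ridge = reg * np.eye(rank)
    XY = np.hstack([X, Y])  # both matrices side by side, rows are shared
    losses = []
    for _ in range(n_iters):
        # Update the shared factor U against both matrices at once.
        VW = np.vstack([V, W])
        U = np.linalg.solve(VW.T @ VW + ridge, VW.T @ XY.T).T
        # Update the matrix-specific factors given the shared U.
        gram = U.T @ U + ridge
        V = np.linalg.solve(gram, U.T @ X).T
        W = np.linalg.solve(gram, U.T @ Y).T
        # Regularized objective; each sweep cannot increase it.
        loss = (
            np.sum((X - U @ V.T) ** 2)
            + np.sum((Y - U @ W.T) ** 2)
            + reg * (np.sum(U ** 2) + np.sum(V ** 2) + np.sum(W ** 2))
        )
        losses.append(float(loss))
    return U, V, W, losses
```

Under these assumptions, the rows of `U` play the role of patient representations: each row is fitted to explain both matrices, which is how information from the second source reaches the learned features.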

Information Credibility Assessment and Meta Data Modeling in Integrating Heterogeneous Data Sources

10.21236/ada409695 ◽

2002 ◽

Cited By ~ 1

Author(s):

Peter P. Chen

Keyword(s):

Data Modeling ◽

Heterogeneous Data ◽

Data Sources ◽

Credibility Assessment ◽

Meta Data ◽

Heterogeneous Data Sources ◽

Information Credibility

Methodology of Big Data Integration from A Priori Unknown Heterogeneous Data Sources

Proceedings of the 2018 2nd International Conference on Computer Science and Artificial Intelligence - CSAI '18 ◽

10.1145/3297156.3297249 ◽

2018 ◽

Author(s):

Alexey Samoylov ◽

Nikolay Sergeev ◽

Margarita Kucherova ◽

Boris Denisov

Keyword(s):

Big Data ◽

Data Integration ◽

A Priori ◽

Heterogeneous Data ◽

Data Sources ◽

Heterogeneous Data Sources

Matching disparate dimensions for analytical integration of heterogeneous data sources

Proceedings of the 11th International Conference on Management of Digital EcoSystems ◽

10.1145/3297662.3365809 ◽

2019 ◽

Author(s):

Anna Korobko ◽

Aleksei Korobko

Keyword(s):

Heterogeneous Data ◽

Data Sources ◽

Heterogeneous Data Sources ◽

Analytical Integration

A multi-agent conversational system with heterogeneous data sources access

Expert Systems with Applications ◽

10.1016/j.eswa.2016.01.033 ◽

2016 ◽

Vol 53 ◽

pp. 172-191 ◽

Cited By ~ 9

Author(s):

Eduardo M. Eisman ◽

María Navarro ◽

Juan Luis Castro

Keyword(s):

Heterogeneous Data ◽

Data Sources ◽

Heterogeneous Data Sources ◽

Multi Agent

Matching and integration across heterogeneous data sources

10.1145/1146598.1146738 ◽

2006 ◽

Cited By ~ 2

Author(s):

Patrick Pantel ◽

Andrew Philpot ◽

Eduard Hovy

Keyword(s):

Heterogeneous Data ◽

Data Sources ◽

Heterogeneous Data Sources

Semantic integration of traditional and heterogeneous data sources (UML, XML and RDB) in OWL2 triplestore

International Journal of Data Analysis Techniques and Strategies ◽

10.1504/ijdats.2021.10037314 ◽

2021 ◽

Vol 13 (1/2) ◽

pp. 36

Author(s):

Larbi Alaoui ◽

Mohamed Bahaj ◽

Oussama El Hajjamy ◽

Hajar Khallouki

Keyword(s):

Heterogeneous Data ◽

Semantic Integration ◽

Data Sources ◽

Heterogeneous Data Sources

Prediction of RNA subcellular localization: learning from heterogeneous data sources

iScience ◽

10.1016/j.isci.2021.103298 ◽

2021 ◽

pp. 103298

Author(s):

Anca Flavia Savulescu ◽

Emmanuel Bouilhol ◽

Nicolas Beaume ◽

Macha Nikolski

Keyword(s):

Subcellular Localization ◽

Heterogeneous Data ◽

Data Sources ◽

Heterogeneous Data Sources

Learning from heterogeneous data sources: an application in spatial proteomics

10.1101/022152 ◽

2015 ◽

Cited By ~ 1

Author(s):

Lisa M. Breckels ◽

Sean Holden ◽

David Wojnar ◽

Claire M. Mulvey ◽

Andy Christoforou ◽

...

Keyword(s):

Mass Spectrometry ◽

Support Vector Machine ◽

Transfer Learning ◽

High Throughput ◽

Cell Biology ◽

Heterogeneous Data ◽

Data Sources ◽

Support Vector ◽

Proteomics Data ◽

Heterogeneous Data Sources

Abstract: Sub-cellular localisation of proteins is an essential post-translational regulatory mechanism that can be assayed using high-throughput mass spectrometry (MS). These MS-based spatial proteomics experiments enable us to pinpoint the sub-cellular distribution of thousands of proteins in a specific system under controlled conditions. Recent advances in high-throughput MS methods have yielded a plethora of experimental spatial proteomics data for the cell biology community. Yet, there are many third-party data sources, such as immunofluorescence microscopy or protein annotations and sequences, which represent a rich and vast source of complementary information. We present a unique transfer learning classification framework that utilises a nearest-neighbour or support vector machine system, to integrate heterogeneous data sources to considerably improve on the quantity and quality of sub-cellular protein assignment. We demonstrate the utility of our algorithms through evaluation of five experimental datasets, from four different species in conjunction with four different auxiliary data sources to classify proteins to tens of sub-cellular compartments with high generalisation accuracy. We further apply the method to an experiment on pluripotent mouse embryonic stem cells to classify a set of previously unknown proteins, and validate our findings against a recent high resolution map of the mouse stem cell proteome. The methodology is distributed as part of the open-source Bioconductor pRoloc suite for spatial proteomics data analysis.

Abbreviations: LOPIT, Localisation of Organelle Proteins by Isotope Tagging; PCP, Protein Correlation Profiling; ML, Machine learning; TL, Transfer learning; SVM, Support vector machine; PCA, Principal component analysis; GO, Gene Ontology; CC, Cellular compartment; iTRAQ, Isobaric tags for relative and absolute quantitation; TMT, Tandem mass tags; MS, Mass spectrometry

An Iterative Automatic Final Alignment Method in the Ontology Matching System

Journal of information and organizational sciences ◽

10.31341/jios.42.1.3 ◽

2018 ◽

Vol 42 (1) ◽

pp. 39-61 ◽

Cited By ~ 1

Author(s):

Marko Gulić ◽

Marin Vuković

Keyword(s):

Heterogeneous Data ◽

Data Sources ◽

Ontology Matching ◽

Alignment Method ◽

Automatic Adjustment ◽

Matching Process ◽

Heterogeneous Data Sources ◽

Final Alignment

Ontology matching plays an important role in the integration of heterogeneous data sources that are described by ontologies. In order to determine correspondences between ontologies, a set of matchers can be used. After these matchers have been executed and their results aggregated, a final alignment method is executed in order to select appropriate correspondences between entities of the compared ontologies. The final alignment method is an important part of the ontology matching process because it directly determines the output of this process. In this paper we improve our iterative final alignment method by introducing an automatic adjustment of the final alignment threshold as well as a new rule for determining false correspondences with similarity values greater than the adjusted threshold. An evaluation of the method is performed on the test ontologies of the OAEI evaluation contest and a comparison with other final alignment methods is given.
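
As a rough illustration of what a final alignment step does, the sketch below selects one-to-one correspondences from aggregated similarity scores using a fixed threshold and a greedy rule. This is a generic baseline, not the iterative method with automatic threshold adjustment described in the abstract. The function name `greedy_final_alignment`, the dictionary input format, and the example entity names are all hypothetical.

```python
def greedy_final_alignment(similarity, threshold=0.5):
    """Select one-to-one correspondences with similarity >= threshold.

    `similarity` maps (source_entity, target_entity) pairs to aggregated
    matcher scores. Candidates are taken greedily, highest score first.
    """
    candidates = sorted(
        (
            (score, src, tgt)
            for (src, tgt), score in similarity.items()
            if score >= threshold
        ),
        # Highest score first; names break ties deterministically.
        key=lambda c: (-c[0], c[1], c[2]),
    )
    used_src, used_tgt = set(), set()
    alignment = []
    for score, src, tgt in candidates:
        # Keep a pair only if neither entity has been matched already.
        if src not in used_src and tgt not in used_tgt:
            alignment.append((src, tgt, score))
            used_src.add(src)
            used_tgt.add(tgt)
    return alignment


if __name__ == "__main__":
    scores = {
        ("Person", "Human"): 0.9,
        ("Person", "Agent"): 0.7,
        ("Author", "Agent"): 0.6,
        ("Paper", "Human"): 0.4,
    }
    print(greedy_final_alignment(scores, threshold=0.5))
```

In this baseline the threshold is a fixed input, so a poorly chosen value either admits false correspondences or discards true ones; that sensitivity is the kind of problem an automatically adjusted threshold is meant to address.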
