DiiS: A Biomedical Data Access Framework for Aiding Data Driven Research Supporting FAIR Principles

Data ◽  
2019 ◽  
Vol 4 (2) ◽  
pp. 54 ◽  
Author(s):  
Priya Deshpande ◽  
Alexander Rasin ◽  
Jacob Furst ◽  
Daniela Raicu ◽  
Sameer Antani

Vast amounts of clinical and biomedical research data are produced daily. These data can help enable data-driven healthcare through novel biomedical discoveries, improved diagnostic processes, epidemiology, and education. However, finding and gaining access to these data, and to the metadata necessary to achieve these goals, remains a challenge. Furthermore, data management and enabling widespread, albeit controlled, use pose a major challenge for data producers. These data sources are often geographically distributed, with diverse characteristics, and are controlled by a host of logistical and legal factors that require appropriate governance and access control guarantees. To overcome these obstacles, a set of guiding principles under the term FAIR has been previously introduced. The primary desirable dataset properties are thus that the data should be Findable, Accessible, Interoperable, and Reusable (FAIR). In this paper, we introduce and describe an abstract framework that models these ideal goals and could be a step toward supporting data-driven research. We also develop a system instantiated on our framework called the Data integration and indexing System (DiiS). The system provides an integration model for making healthcare data available on a global scale. Our research work describes the challenges inhibiting data producers, data stewards, and data brokers in achieving FAIR goals for sharing biomedical data. We attempt to address some of the key challenges through the proposed system. We evaluated our framework using the software architecture testing technique and also examined how different data integration challenges are addressed by our system. Our evaluation shows that the DiiS framework is a user-friendly data integration system that would greatly contribute to biomedical research.
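The abstract does not include code, but the "Findable" half of its argument can be made concrete with a small sketch: a dataset record carrying a persistent identifier, searchable metadata, and explicit access/reuse terms, plus a keyword index over such records. All class and field names below are hypothetical illustrations, not the DiiS design.

```python
from dataclasses import dataclass, field

# Hypothetical FAIR-style metadata record: a persistent identifier and
# searchable keywords make a dataset Findable; an explicit access
# protocol and licence speak to Accessible and Reusable.
@dataclass
class DatasetRecord:
    identifier: str                    # persistent, globally unique ID
    title: str
    keywords: list = field(default_factory=list)
    access_protocol: str = "https"     # how the data can be retrieved
    licence: str = "unspecified"       # reuse conditions

class MetadataIndex:
    """Toy keyword index over dataset records (the 'F' in FAIR)."""
    def __init__(self):
        self._by_keyword = {}

    def add(self, record: DatasetRecord):
        for kw in record.keywords:
            self._by_keyword.setdefault(kw.lower(), []).append(record)

    def find(self, keyword: str):
        return self._by_keyword.get(keyword.lower(), [])

index = MetadataIndex()
index.add(DatasetRecord("doi:10.0000/example", "Chest X-ray collection",
                        keywords=["radiology", "x-ray"]))
print([r.identifier for r in index.find("radiology")])
```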

2016 ◽  
Vol 07 (02) ◽  
pp. 260-274 ◽  
Author(s):  
Vincent Canuel ◽  
Hector Countouris ◽  
Pierre Laurent-Puig ◽  
Anita Burgun ◽  
Bastien Rance

Cancer research involves numerous disciplines. The multiplicity of data sources and their heterogeneous nature render the integration and exploration of the data increasingly complex. Translational research platforms are a promising way to assist scientists in these tasks. In this article, we identify a set of scientific and technical principles needed to build a translational research platform compatible with ethical requirements, data protection, and data-integration constraints. We describe the solution adopted by the CARPEM cancer research program to design and deploy a platform able to integrate retrospective, prospective, and day-to-day care data. We designed a three-layer architecture composed of a data collection layer, a data integration layer, and a data access layer. We leverage a set of open-source resources including i2b2 and tranSMART. Citation: Rance B, Canuel V, Countouris H, Laurent-Puig P, Burgun A. Integrating heterogeneous biomedical data for cancer research: the CARPEM infrastructure.
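A minimal sketch of the three-layer separation the CARPEM abstract describes may help fix the idea: collection gathers raw records, integration maps them onto one shared schema, and access serves them under a single query interface. The layer responsibilities are inferred from the abstract and all names are hypothetical; the access layer merely stands in for tools like i2b2 or tranSMART.

```python
class CollectionLayer:
    """Gathers raw records from heterogeneous sources (EHR, trials, ...)."""
    def __init__(self, sources):
        self.sources = sources
    def collect(self):
        for source in self.sources:
            yield from source()   # each source is a callable returning records

class IntegrationLayer:
    """Maps source-specific records onto one shared schema."""
    def integrate(self, records):
        for rec in records:
            yield {"patient_id": rec.get("pid") or rec.get("patient"),
                   "observation": rec.get("obs"),
                   "provenance": rec.get("source", "unknown")}

class AccessLayer:
    """Serves integrated data behind one query interface."""
    def __init__(self, integrated):
        self._data = list(integrated)
    def query(self, patient_id):
        return [r for r in self._data if r["patient_id"] == patient_id]

ehr = lambda: [{"pid": "p1", "obs": "stage II", "source": "EHR"}]
trial = lambda: [{"patient": "p1", "obs": "KRAS wild-type", "source": "trial"}]
platform = AccessLayer(
    IntegrationLayer().integrate(CollectionLayer([ehr, trial]).collect()))
print(platform.query("p1"))
```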


2021 ◽  
Vol 8 ◽  
Author(s):  
Fernanda C. Dórea ◽  
Crawford W. Revie

The biggest change brought about by the “era of big data” to health in general, and epidemiology in particular, arguably relates not to the volume of data encountered but to its variety. An increasing number of new data sources, including many not originally collected for health purposes, are now being used for epidemiological inference and contextualization. Combining evidence from multiple data sources presents significant challenges, but discussions around this subject often conflate issues of data access and privacy with the actual technical challenges of data integration and interoperability. We review some of the opportunities for connecting data, generating information, and supporting decision-making across the increasingly complex “variety” dimension of data in population health, to enable data-driven surveillance to go beyond simple signal detection and support an expanded set of surveillance goals.
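This is a perspective piece with no code of its own, but the "beyond simple signal detection" point can be illustrated: two hypothetical streams (syndromic counts and lab confirmations) each sit below their individual alert thresholds, yet a joint rule fires when both rise together. The streams, thresholds, and rule are purely illustrative assumptions.

```python
# Illustrative only: combining two hypothetical surveillance streams so a
# weak rise in both is caught even when neither alone crosses its threshold.
syndromic = {"w1": 12, "w2": 15, "w3": 30}    # weekly clinic visit counts
lab_positives = {"w1": 1, "w2": 2, "w3": 6}   # weekly confirmed cases

def composite_alert(week, visit_threshold=25, lab_threshold=4):
    """Flag a week when either stream alone, or both jointly, rise."""
    visits, labs = syndromic[week], lab_positives[week]
    single = visits > visit_threshold or labs > lab_threshold
    joint = visits > 0.8 * visit_threshold and labs > 0.8 * lab_threshold
    return single or joint

print({w: composite_alert(w) for w in syndromic})
# {'w1': False, 'w2': False, 'w3': True}
```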


2019 ◽  
Vol 2 (1) ◽  
Author(s):  
Maulik R. Kamdar ◽  
Javier D. Fernández ◽  
Axel Polleres ◽  
Tania Tudorache ◽  
Mark A. Musen

The biomedical data landscape is fragmented, with several isolated, heterogeneous data and knowledge sources existing on the Web that use varying formats, syntaxes, schemas, and entity notations. Biomedical researchers face severe logistical and technical challenges to query, integrate, analyze, and visualize data from multiple diverse sources in the context of available biomedical knowledge. Semantic Web technologies and Linked Data principles may aid toward Web-scale semantic processing and data integration in biomedicine. The biomedical research community has been one of the earliest adopters of these technologies and principles to publish data and knowledge on the Web as linked graphs and ontologies, hence creating the Life Sciences Linked Open Data (LSLOD) cloud. In this paper, we provide our perspective on some opportunities proffered by the use of LSLOD to integrate biomedical data and knowledge in three domains: (1) pharmacology, (2) cancer research, and (3) infectious diseases. We discuss some of the major challenges that hinder the widespread use and consumption of LSLOD by the biomedical research community. Finally, we provide a few technical solutions and insights that can address these challenges. Ultimately, LSLOD can enable the development of scalable, intelligent infrastructures that support artificial intelligence methods for augmenting human intelligence to achieve better clinical outcomes for patients, to enhance the quality of biomedical research, and to improve our understanding of living systems.
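Querying the LSLOD cloud in practice means issuing SPARQL queries against public endpoints. The sketch below shows the general pattern; the UniProt endpoint URL and the query itself are illustrative choices (any Linked Data endpoint works the same way), not examples taken from the paper.

```python
# A minimal sketch of Linked Data querying with SPARQL, the access pattern
# used across the LSLOD cloud. Endpoint and query are illustrative.
from SPARQLWrapper import SPARQLWrapper, JSON   # pip install sparqlwrapper

endpoint = SPARQLWrapper("https://sparql.uniprot.org/sparql")
endpoint.setQuery("""
    PREFIX up: <http://purl.uniprot.org/core/>
    SELECT ?protein WHERE { ?protein a up:Protein . } LIMIT 5
""")
endpoint.setReturnFormat(JSON)

results = endpoint.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["protein"]["value"])   # URIs of five protein resources
```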


2015 ◽  
Vol 54 (01) ◽  
pp. 16-23 ◽  
Author(s):  
V. Curcin ◽  
A. Barton ◽  
M. M. McGilchrist ◽  
H. Bastiaens ◽  
A. Andreasson ◽  
...  

Introduction: This article is part of the Focus Theme of Methods of Information in Medicine on “Managing Interoperability and Complexity in Health Systems”. Background: Primary care data is the single richest source of routine health care data. However, its use, both in research and clinical work, often requires data from multiple clinical sites, clinical trials databases, and registries. Data integration and interoperability are therefore of utmost importance. Objectives: TRANSFoRm’s general approach relies on a unified interoperability framework, described in a previous paper. We developed a core ontology for an interoperability framework based on data mediation. This article presents how such an ontology, the Clinical Data Integration Model (CDIM), can be designed to support, in conjunction with appropriate terminologies, biomedical data federation within TRANSFoRm, an EU FP7 project that aims to develop the digital infrastructure for a learning healthcare system in European primary care. Methods: TRANSFoRm utilizes a unified structural/terminological interoperability framework based on the local-as-view mediation paradigm. Such an approach mandates that the global information model describe the domain of interest independently of the data sources to be explored. A requirements analysis identified no existing ontology focused on primary care research, so we designed a realist ontology based on Basic Formal Ontology to support our framework, in collaboration with the various terminologies used in primary care. Results: The resulting ontology has 549 classes and 82 object properties and is used to support data integration for TRANSFoRm’s use cases. Concepts identified by researchers were successfully expressed in queries using CDIM and pertinent terminologies. As an example, we illustrate how, in TRANSFoRm, the Query Formulation Workbench can capture eligibility criteria in a computable representation based on CDIM. Conclusion: A unified mediation approach to semantic interoperability provides a flexible and extensible framework for all types of interaction between health record systems and research systems. CDIM, as the core ontology of such an approach, enables simplicity and consistency of design across the heterogeneous software landscape and can support the specific needs of EHR-driven phenotyping research using primary care data.
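The local-as-view idea in the Methods section can be reduced to a toy example: each source declares which global concepts its data maps onto, and a query phrased against the global model is routed to the sources whose views cover it. The concept and source names below are hypothetical stand-ins, not actual CDIM classes.

```python
# Toy local-as-view mediation: sources are views over a global schema;
# a global query is answered by the sources whose views cover it.
GLOBAL_CONCEPTS = {"Patient", "Diagnosis", "Prescription"}

# Each local source declares which global concepts its data maps onto.
SOURCE_VIEWS = {
    "ehr_site_a": {"Patient", "Diagnosis"},
    "trials_db":  {"Patient", "Prescription"},
    "registry":   {"Diagnosis"},
}

def route_query(concepts):
    """Return the sources able to contribute to a query over global concepts."""
    unknown = concepts - GLOBAL_CONCEPTS
    if unknown:
        raise ValueError(f"not in global model: {unknown}")
    return {src for src, view in SOURCE_VIEWS.items() if concepts & view}

# An eligibility criterion touching diagnoses and prescriptions is
# answered by combining all three sources.
print(route_query({"Diagnosis", "Prescription"}))
```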


2020 ◽  
Vol 15 ◽  
Author(s):  
Omer Irshad ◽  
Muhammad Usman Ghani Khan

Aim: To facilitate researchers and practitioners in unveiling the mysterious functional aspects of the human cellular system by enabling exploratory searching over semantically integrated, heterogeneous, and geographically dispersed omics annotations. Background: Improving health standards is one of the motives that continuously drives researchers and practitioners to uncover the mysterious aspects of the human cellular system. Inferring new knowledge from known facts requires a reasonably large amount of data in well-structured, integrated, and unified form. With the advent of high-throughput and sensor technologies in particular, biological data is growing at an astronomical rate, and is heterogeneous and globally dispersed. Several data integration systems have been deployed to cope with the issues of data heterogeneity and global dispersion. Systems based on semantic data integration models are more flexible and extensible than syntax-based ones, but they still lack aspect-based data integration, persistence, and querying. Furthermore, these systems do not fully support warehousing biological entities in the form of the semantic associations naturally possessed by the human cell. Objective: To develop an aspect-oriented formal data integration model that semantically integrates heterogeneous and geographically dispersed omics annotations and provides exploratory querying over the integrated data. Method: We propose an aspect-oriented formal data integration model that uses Web semantics standards to formally specify each of its constructs. The proposed model supports aspect-oriented representation of biological entities while addressing the issues of data heterogeneity and global dispersion, and it associates and warehouses biological entities in the way they relate to one another. Result: To show the significance of the proposed model, we developed a data warehouse and information retrieval system based on a multi-layered, multi-modular software architecture compliant with the model. Results show that our model supports gathering, associating, integrating, persisting, and querying each entity with respect to all of its possible aspects, within or across the various associated omics layers. Conclusion: Formal specifications facilitate addressing data integration issues by providing formal means for understanding omics data based on meaning instead of syntax.
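Warehousing entities as semantic associations, and querying them by aspect, can be sketched with standard RDF tooling. The sketch below uses rdflib; the namespace, the aspect-tagging convention, and all entity names are illustrative assumptions, not the authors' actual model.

```python
# A minimal sketch of warehousing biological entities as semantic
# associations and querying them by aspect. Names are hypothetical.
from rdflib import Graph, Literal, Namespace, RDF   # pip install rdflib

EX = Namespace("http://example.org/omics/")
g = Graph()

# A gene, its protein product, and a pathway membership; each association
# (predicate) is tagged with the omics aspect it belongs to.
g.add((EX.TP53, RDF.type, EX.Gene))
g.add((EX.TP53, EX.encodes, EX.P04637))
g.add((EX.encodes, EX.aspect, Literal("proteomics")))
g.add((EX.P04637, EX.participatesIn, EX.ApoptosisPathway))
g.add((EX.participatesIn, EX.aspect, Literal("pathway")))

# Exploratory query: follow any association whose predicate carries
# the 'pathway' aspect.
q = """
    PREFIX ex: <http://example.org/omics/>
    SELECT ?s ?p ?o WHERE {
        ?s ?p ?o .
        ?p ex:aspect "pathway" .
    }
"""
for row in g.query(q):
    print(row.s, row.p, row.o)
```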


2008 ◽  
Vol 5 (2) ◽  
Author(s):  
Robert Pesch ◽  
Artem Lysenko ◽  
Matthew Hindle ◽  
Keywan Hassani-Pak ◽  
Ralf Thiele ◽  
...  

The automated annotation of data from high-throughput sequencing and genomics experiments is a significant challenge for bioinformatics. Most current approaches rely on sequential pipelines of gene finding and gene function prediction methods that annotate a gene with information from different reference data sources. Each function prediction method contributes evidence supporting a functional assignment. Such approaches generally ignore the links between the information in the reference datasets. These links, however, are valuable for assessing the plausibility of a function assignment and can be used to evaluate the confidence in a prediction. We are working towards a novel annotation system that uses the network of information supporting the function assignment to enrich the annotation process, both for use by expert curators and for predicting the function of previously unannotated genes. In this paper we describe our success in the first stages of this development. We present the data integration steps needed to create the core database of integrated reference databases (UniProt, PFAM, PDB, GO, and the pathway database AraCyc), which has been established in the ONDEX data integration system. We also present a comparison between different methods for integrating GO terms as part of the function assignment pipeline, and discuss the consequences of this analysis for improving the accuracy of gene function annotation. The methods and algorithms presented in this publication are an integral part of the ONDEX system, which is freely available from http://ondex.sf.net/.
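The core idea, scoring a functional assignment by the network of evidence behind it, can be sketched in a few lines: model each (gene, GO term) edge with the reference source that contributed it, then count how many independent sources agree. This is a rough illustration of the concept, not ONDEX's implementation; the gene, terms, and sources are hypothetical.

```python
# Sketch: a GO term supported by several independent reference sources
# earns higher confidence than one supported by a single pipeline step.
import networkx as nx   # pip install networkx

g = nx.MultiDiGraph()
# Evidence edges: (gene, GO term) labelled with the contributing source.
g.add_edge("geneX", "GO:0006915", source="UniProt")   # apoptotic process
g.add_edge("geneX", "GO:0006915", source="PFAM")
g.add_edge("geneX", "GO:0008152", source="PFAM")      # metabolic process

def assignment_confidence(gene, go_term):
    """Count distinct reference sources supporting gene -> GO term."""
    edges = g.get_edge_data(gene, go_term) or {}
    return len({attrs["source"] for attrs in edges.values()})

print(assignment_confidence("geneX", "GO:0006915"))   # 2 independent sources
print(assignment_confidence("geneX", "GO:0008152"))   # 1 source
```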

