DiiS: A Biomedical Data Access Framework for Aiding Data Driven Research Supporting FAIR Principles

Data ◽  
2019 ◽  
Vol 4 (2) ◽  
pp. 54 ◽  
Author(s):  
Priya Deshpande ◽  
Alexander Rasin ◽  
Jacob Furst ◽  
Daniela Raicu ◽  
Sameer Antani

Vast amounts of clinical and biomedical research data are produced daily. These data can help enable data-driven healthcare through novel biomedical discoveries, improved diagnostic processes, epidemiology, and education. However, finding and gaining access to these data, and to the metadata necessary to achieve these goals, remains a challenge. Furthermore, data management and enabling widespread, albeit controlled, use pose a major challenge for data producers. These data sources are often geographically distributed, with diverse characteristics, and are controlled by a host of logistical and legal factors that require appropriate governance and access control guarantees. To overcome these obstacles, a set of guiding principles under the term FAIR has been previously introduced. The primary desirable dataset properties are thus that the data should be Findable, Accessible, Interoperable, and Reusable (FAIR). In this paper, we introduce and describe an abstract framework that models these ideal goals and could be a step toward supporting data-driven research. We also develop a system instantiated on our framework called the Data integration and indexing System (DiiS). The system provides an integration model for making healthcare data available on a global scale. Our research work describes the challenges inhibiting data producers, data stewards, and data brokers in achieving FAIR goals for sharing biomedical data. We attempt to address some of the key challenges through the proposed system. We evaluated our framework using the software architecture testing technique and also examined how different data integration challenges are addressed by our system. Our evaluation shows that the DiiS framework is a user-friendly data integration system that would greatly contribute to biomedical research.
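The abstract does not include code, but the "Findable" half of its argument can be made concrete with a small sketch: a dataset record carrying a persistent identifier, searchable metadata, and explicit access/reuse terms, plus a keyword index over such records. All class and field names below are hypothetical illustrations, not the DiiS design.

```python
from dataclasses import dataclass, field

# Hypothetical FAIR-style metadata record: a persistent identifier and
# searchable keywords make a dataset Findable; an explicit access
# protocol and licence speak to Accessible and Reusable.
@dataclass
class DatasetRecord:
    identifier: str                    # persistent, globally unique ID
    title: str
    keywords: list = field(default_factory=list)
    access_protocol: str = "https"     # how the data can be retrieved
    licence: str = "unspecified"       # reuse conditions

class MetadataIndex:
    """Toy keyword index over dataset records (the 'F' in FAIR)."""
    def __init__(self):
        self._by_keyword = {}

    def add(self, record: DatasetRecord):
        for kw in record.keywords:
            self._by_keyword.setdefault(kw.lower(), []).append(record)

    def find(self, keyword: str):
        return self._by_keyword.get(keyword.lower(), [])

index = MetadataIndex()
index.add(DatasetRecord("doi:10.0000/example", "Chest X-ray collection",
                        keywords=["radiology", "x-ray"]))
print([r.identifier for r in index.find("radiology")])
```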

2016 ◽  
Vol 07 (02) ◽  
pp. 260-274 ◽  
Author(s):  
Vincent Canuel ◽  
Hector Countouris ◽  
Pierre Laurent-Puig ◽  
Anita Burgun ◽  
Bastien Rance

Cancer research involves numerous disciplines. The multiplicity of data sources and their heterogeneous nature render the integration and exploration of the data increasingly complex. Translational research platforms are a promising way to assist scientists in these tasks. In this article, we identify a set of scientific and technical principles needed to build a translational research platform compatible with ethical requirements, data protection, and data-integration constraints. We describe the solution adopted by the CARPEM cancer research program to design and deploy a platform able to integrate retrospective, prospective, and day-to-day care data. We designed a three-layer architecture composed of a data collection layer, a data integration layer, and a data access layer. We leverage a set of open-source resources including i2b2 and tranSMART. Citation: Rance B, Canuel V, Countouris H, Laurent-Puig P, Burgun A. Integrating heterogeneous biomedical data for cancer research: the CARPEM infrastructure.
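A minimal sketch of the three-layer separation the CARPEM abstract describes may help fix the idea: collection gathers raw records, integration maps them onto one shared schema, and access serves them under a single query interface. The layer responsibilities are inferred from the abstract and all names are hypothetical; the access layer merely stands in for tools like i2b2 or tranSMART.

```python
class CollectionLayer:
    """Gathers raw records from heterogeneous sources (EHR, trials, ...)."""
    def __init__(self, sources):
        self.sources = sources
    def collect(self):
        for source in self.sources:
            yield from source()   # each source is a callable returning records

class IntegrationLayer:
    """Maps source-specific records onto one shared schema."""
    def integrate(self, records):
        for rec in records:
            yield {"patient_id": rec.get("pid") or rec.get("patient"),
                   "observation": rec.get("obs"),
                   "provenance": rec.get("source", "unknown")}

class AccessLayer:
    """Serves integrated data behind one query interface."""
    def __init__(self, integrated):
        self._data = list(integrated)
    def query(self, patient_id):
        return [r for r in self._data if r["patient_id"] == patient_id]

ehr = lambda: [{"pid": "p1", "obs": "stage II", "source": "EHR"}]
trial = lambda: [{"patient": "p1", "obs": "KRAS wild-type", "source": "trial"}]
platform = AccessLayer(
    IntegrationLayer().integrate(CollectionLayer([ehr, trial]).collect()))
print(platform.query("p1"))
```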


2021 ◽  
Vol 8 ◽  
Author(s):  
Fernanda C. Dórea ◽  
Crawford W. Revie

The biggest change brought about by the “era of big data” to health in general, and epidemiology in particular, arguably relates not to the volume of data encountered but to its variety. An increasing number of new data sources, including many not originally collected for health purposes, are now being used for epidemiological inference and contextualization. Combining evidence from multiple data sources presents significant challenges, but discussions around this subject often conflate issues of data access and privacy with the actual technical challenges of data integration and interoperability. We review some of the opportunities for connecting data, generating information, and supporting decision-making across the increasingly complex “variety” dimension of data in population health, to enable data-driven surveillance to go beyond simple signal detection and support an expanded set of surveillance goals.
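This is a perspective piece with no code of its own, but the "beyond simple signal detection" point can be illustrated: two hypothetical streams (syndromic counts and lab confirmations) each sit below their individual alert thresholds, yet a joint rule fires when both rise together. The streams, thresholds, and rule are purely illustrative assumptions.

```python
# Illustrative only: combining two hypothetical surveillance streams so a
# weak rise in both is caught even when neither alone crosses its threshold.
syndromic = {"w1": 12, "w2": 15, "w3": 30}    # weekly clinic visit counts
lab_positives = {"w1": 1, "w2": 2, "w3": 6}   # weekly confirmed cases

def composite_alert(week, visit_threshold=25, lab_threshold=4):
    """Flag a week when either stream alone, or both jointly, rise."""
    visits, labs = syndromic[week], lab_positives[week]
    single = visits > visit_threshold or labs > lab_threshold
    joint = visits > 0.8 * visit_threshold and labs > 0.8 * lab_threshold
    return single or joint

print({w: composite_alert(w) for w in syndromic})
# {'w1': False, 'w2': False, 'w3': True}
```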


2019 ◽  
Vol 2 (1) ◽  
Author(s):  
Maulik R. Kamdar ◽  
Javier D. Fernández ◽  
Axel Polleres ◽  
Tania Tudorache ◽  
Mark A. Musen

The biomedical data landscape is fragmented, with several isolated, heterogeneous data and knowledge sources existing on the Web that use varying formats, syntaxes, schemas, and entity notations. Biomedical researchers face severe logistical and technical challenges to query, integrate, analyze, and visualize data from multiple diverse sources in the context of available biomedical knowledge. Semantic Web technologies and Linked Data principles may aid toward Web-scale semantic processing and data integration in biomedicine. The biomedical research community has been one of the earliest adopters of these technologies and principles to publish data and knowledge on the Web as linked graphs and ontologies, hence creating the Life Sciences Linked Open Data (LSLOD) cloud. In this paper, we provide our perspective on some opportunities proffered by the use of LSLOD to integrate biomedical data and knowledge in three domains: (1) pharmacology, (2) cancer research, and (3) infectious diseases. We discuss some of the major challenges that hinder the widespread use and consumption of LSLOD by the biomedical research community. Finally, we provide a few technical solutions and insights that can address these challenges. Ultimately, LSLOD can enable the development of scalable, intelligent infrastructures that support artificial intelligence methods for augmenting human intelligence to achieve better clinical outcomes for patients, to enhance the quality of biomedical research, and to improve our understanding of living systems.
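Querying the LSLOD cloud in practice means issuing SPARQL queries against public endpoints. The sketch below shows the general pattern; the UniProt endpoint URL and the query itself are illustrative choices (any Linked Data endpoint works the same way), not examples taken from the paper.

```python
# A minimal sketch of Linked Data querying with SPARQL, the access pattern
# used across the LSLOD cloud. Endpoint and query are illustrative.
from SPARQLWrapper import SPARQLWrapper, JSON   # pip install sparqlwrapper

endpoint = SPARQLWrapper("https://sparql.uniprot.org/sparql")
endpoint.setQuery("""
    PREFIX up: <http://purl.uniprot.org/core/>
    SELECT ?protein WHERE { ?protein a up:Protein . } LIMIT 5
""")
endpoint.setReturnFormat(JSON)

results = endpoint.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["protein"]["value"])   # URIs of five protein resources
```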


2015 ◽  
Vol 54 (01) ◽  
pp. 16-23 ◽  
Author(s):  
V. Curcin ◽  
A. Barton ◽  
M. M. McGilchrist ◽  
H. Bastiaens ◽  
A. Andreasson ◽  
...  

Introduction: This article is part of the Focus Theme of Methods of Information in Medicine on “Managing Interoperability and Complexity in Health Systems”. Background: Primary care data is the single richest source of routine health care data. However, its use, both in research and clinical work, often requires data from multiple clinical sites, clinical trials databases, and registries. Data integration and interoperability are therefore of utmost importance. Objectives: TRANSFoRm’s general approach relies on a unified interoperability framework, described in a previous paper. We developed a core ontology for an interoperability framework based on data mediation. This article presents how such an ontology, the Clinical Data Integration Model (CDIM), can be designed to support, in conjunction with appropriate terminologies, biomedical data federation within TRANSFoRm, an EU FP7 project that aims to develop the digital infrastructure for a learning healthcare system in European primary care. Methods: TRANSFoRm utilizes a unified structural/terminological interoperability framework based on the local-as-view mediation paradigm. Such an approach mandates that the global information model describe the domain of interest independently of the data sources to be explored. A requirements analysis identified no existing ontology focused on primary care research, so we designed a realist ontology based on Basic Formal Ontology to support our framework, in collaboration with the various terminologies used in primary care. Results: The resulting ontology has 549 classes and 82 object properties and is used to support data integration for TRANSFoRm’s use cases. Concepts identified by researchers were successfully expressed in queries using CDIM and pertinent terminologies. As an example, we illustrate how, in TRANSFoRm, the Query Formulation Workbench can capture eligibility criteria in a computable representation based on CDIM. Conclusion: A unified mediation approach to semantic interoperability provides a flexible and extensible framework for all types of interaction between health record systems and research systems. CDIM, as the core ontology of such an approach, enables simplicity and consistency of design across the heterogeneous software landscape and can support the specific needs of EHR-driven phenotyping research using primary care data.
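The local-as-view idea in the Methods section can be reduced to a toy example: each source declares which global concepts its data maps onto, and a query phrased against the global model is routed to the sources whose views cover it. The concept and source names below are hypothetical stand-ins, not actual CDIM classes.

```python
# Toy local-as-view mediation: sources are views over a global schema;
# a global query is answered by the sources whose views cover it.
GLOBAL_CONCEPTS = {"Patient", "Diagnosis", "Prescription"}

# Each local source declares which global concepts its data maps onto.
SOURCE_VIEWS = {
    "ehr_site_a": {"Patient", "Diagnosis"},
    "trials_db":  {"Patient", "Prescription"},
    "registry":   {"Diagnosis"},
}

def route_query(concepts):
    """Return the sources able to contribute to a query over global concepts."""
    unknown = concepts - GLOBAL_CONCEPTS
    if unknown:
        raise ValueError(f"not in global model: {unknown}")
    return {src for src, view in SOURCE_VIEWS.items() if concepts & view}

# An eligibility criterion touching diagnoses and prescriptions is
# answered by combining all three sources.
print(route_query({"Diagnosis", "Prescription"}))
```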


2020 ◽  
Vol 15 ◽  
Author(s):  
Omer Irshad ◽  
Muhammad Usman Ghani Khan

Aim: To facilitate researchers and practitioners in unveiling the mysterious functional aspects of the human cellular system by enabling exploratory searching over semantically integrated, heterogeneous, and geographically dispersed omics annotations. Background: Improving health standards is one of the motives that continuously drives researchers and practitioners to uncover the mysterious aspects of the human cellular system. Inferring new knowledge from known facts requires a reasonably large amount of data in well-structured, integrated, and unified form. With the advent of high-throughput and sensor technologies in particular, biological data is growing at an astronomical rate, and is heterogeneous and globally dispersed. Several data integration systems have been deployed to cope with the issues of data heterogeneity and global dispersion. Systems based on semantic data integration models are more flexible and extensible than syntax-based ones, but they still lack aspect-based data integration, persistence, and querying. Furthermore, these systems do not fully support warehousing biological entities in the form of the semantic associations naturally possessed by the human cell. Objective: To develop an aspect-oriented formal data integration model that semantically integrates heterogeneous and geographically dispersed omics annotations and provides exploratory querying over the integrated data. Method: We propose an aspect-oriented formal data integration model that uses Web semantics standards to formally specify each of its constructs. The proposed model supports aspect-oriented representation of biological entities while addressing the issues of data heterogeneity and global dispersion, and it associates and warehouses biological entities in the way they relate to one another. Result: To show the significance of the proposed model, we developed a data warehouse and information retrieval system based on a multi-layered, multi-modular software architecture compliant with the model. Results show that our model supports gathering, associating, integrating, persisting, and querying each entity with respect to all of its possible aspects, within or across the various associated omics layers. Conclusion: Formal specifications facilitate addressing data integration issues by providing formal means for understanding omics data based on meaning instead of syntax.
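Warehousing entities as semantic associations, and querying them by aspect, can be sketched with standard RDF tooling. The sketch below uses rdflib; the namespace, the aspect-tagging convention, and all entity names are illustrative assumptions, not the authors' actual model.

```python
# A minimal sketch of warehousing biological entities as semantic
# associations and querying them by aspect. Names are hypothetical.
from rdflib import Graph, Literal, Namespace, RDF   # pip install rdflib

EX = Namespace("http://example.org/omics/")
g = Graph()

# A gene, its protein product, and a pathway membership; each association
# (predicate) is tagged with the omics aspect it belongs to.
g.add((EX.TP53, RDF.type, EX.Gene))
g.add((EX.TP53, EX.encodes, EX.P04637))
g.add((EX.encodes, EX.aspect, Literal("proteomics")))
g.add((EX.P04637, EX.participatesIn, EX.ApoptosisPathway))
g.add((EX.participatesIn, EX.aspect, Literal("pathway")))

# Exploratory query: follow any association whose predicate carries
# the 'pathway' aspect.
q = """
    PREFIX ex: <http://example.org/omics/>
    SELECT ?s ?p ?o WHERE {
        ?s ?p ?o .
        ?p ex:aspect "pathway" .
    }
"""
for row in g.query(q):
    print(row.s, row.p, row.o)
```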


2008 ◽  
Vol 5 (2) ◽  
Author(s):  
Robert Pesch ◽  
Artem Lysenko ◽  
Matthew Hindle ◽  
Keywan Hassani-Pak ◽  
Ralf Thiele ◽  
...  

The automated annotation of data from high-throughput sequencing and genomics experiments is a significant challenge for bioinformatics. Most current approaches rely on sequential pipelines of gene finding and gene function prediction methods that annotate a gene with information from different reference data sources. Each function prediction method contributes evidence supporting a functional assignment. Such approaches generally ignore the links between the information in the reference datasets. These links, however, are valuable for assessing the plausibility of a function assignment and can be used to evaluate the confidence in a prediction. We are working towards a novel annotation system that uses the network of information supporting the function assignment to enrich the annotation process, both for use by expert curators and for predicting the function of previously unannotated genes. In this paper we describe our success in the first stages of this development. We present the data integration steps needed to create the core database of integrated reference databases (UniProt, PFAM, PDB, GO, and the pathway database AraCyc), which has been established in the ONDEX data integration system. We also present a comparison between different methods for integrating GO terms as part of the function assignment pipeline, and discuss the consequences of this analysis for improving the accuracy of gene function annotation. The methods and algorithms presented in this publication are an integral part of the ONDEX system, which is freely available from http://ondex.sf.net/.
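The core idea, scoring a functional assignment by the network of evidence behind it, can be sketched in a few lines: model each (gene, GO term) edge with the reference source that contributed it, then count how many independent sources agree. This is a rough illustration of the concept, not ONDEX's implementation; the gene, terms, and sources are hypothetical.

```python
# Sketch: a GO term supported by several independent reference sources
# earns higher confidence than one supported by a single pipeline step.
import networkx as nx   # pip install networkx

g = nx.MultiDiGraph()
# Evidence edges: (gene, GO term) labelled with the contributing source.
g.add_edge("geneX", "GO:0006915", source="UniProt")   # apoptotic process
g.add_edge("geneX", "GO:0006915", source="PFAM")
g.add_edge("geneX", "GO:0008152", source="PFAM")      # metabolic process

def assignment_confidence(gene, go_term):
    """Count distinct reference sources supporting gene -> GO term."""
    edges = g.get_edge_data(gene, go_term) or {}
    return len({attrs["source"] for attrs in edges.values()})

print(assignment_confidence("geneX", "GO:0006915"))   # 2 independent sources
print(assignment_confidence("geneX", "GO:0008152"))   # 1 source
```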

