The PlaNet Consortium: A Network of European Plant Databases Connecting Plant Genome Data in an Integrated Biological Knowledge Resource

H. Schoof; R. Ernst; K. F. X. Mayer

doi:10.1002/cfg.374

Comparative and Functional Genomics ◽ 10.1002/cfg.374 ◽ 2004 ◽ Vol 5 (2) ◽ pp. 184-189 ◽ Cited By ~ 3

Author(s): H. Schoof ◽ R. Ernst ◽ K. F. X. Mayer

Keyword(s): Data Exchange ◽ Data Representation ◽ Data Models ◽ Biological Data ◽ Plant Genome ◽ Data Sources ◽ Direct Access ◽ Biological Knowledge ◽ Database Integration ◽ Complex Data

The completion of the Arabidopsis genome and the large collections of other plant sequences generated in recent years have sparked extensive functional genomics efforts. However, the utilization of this data is inefficient, as data sources are distributed and heterogeneous and efforts at data integration are lagging behind. PlaNet aims to overcome the limitations of individual efforts as well as the limitations of heterogeneous, independent data collections. PlaNet is a distributed effort among European bioinformatics groups and plant molecular biologists to establish a comprehensive integrated database in a collaborative network. Objectives are the implementation of infrastructure and data sources to capture plant genomic information into a comprehensive, integrated platform. This will facilitate the systematic exploration of Arabidopsis and other plants. New methods for data exchange, database integration and access are being developed to create a highly integrated, federated data resource for research. The connection between the individual resources is realized with BioMOBY. BioMOBY provides an architecture for the discovery and distribution of biological data through web services. While knowledge is centralized, data is maintained at its primary source without a need for warehousing. To standardize nomenclature and data representation, ontologies and generic data models are defined in interaction with the relevant communities. Minimal data models should make it simple to allow broad integration, while inheritance allows detail and depth to be added to more complex data objects without losing integration. To allow expert annotation and keep databases curated, local and remote annotation interfaces are provided. Easy and direct access to all data is key to the project.
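The idea of minimal data models extended through inheritance can be illustrated with a small sketch. This is not PlaNet's or BioMOBY's actual object model: the class names, field names, identifiers and source labels below are all hypothetical placeholders, chosen only to show how a richer object can still be integrated through the fields of its minimal parent.

```python
from dataclasses import dataclass


@dataclass
class BioObject:
    """Hypothetical minimal model: only what is needed to integrate a record."""
    identifier: str
    source: str


@dataclass
class Gene(BioObject):
    """Hypothetical richer model: inherits the minimal fields and adds detail."""
    symbol: str = ""
    chromosome: str = ""


def index_by_identifier(records):
    """Integrate records from any source using only the minimal fields."""
    return {record.identifier: record for record in records}


# Placeholder records; identifiers and sources are invented for illustration.
records = [
    BioObject("id-1", "source_a"),
    Gene("id-2", "source_b", symbol="SYM2", chromosome="1"),
]
index = index_by_identifier(records)
```

Because `Gene` is a subclass of `BioObject`, the indexing function never needs to know about the extra fields, yet a caller that does understand `Gene` can still read them from the same record.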

Enabling semantic queries across federated bioinformatics databases

Database ◽ 10.1093/database/baz106 ◽ 2019 ◽ Vol 2019 ◽ Cited By ~ 9

Author(s): Ana Claudia Sima ◽ Tarcisio Mendes de Farias ◽ Erich Zbinden ◽ Maria Anisimova ◽ Manuel Gil ◽ ...

Keyword(s): Gene Expression ◽ Data Integration ◽ Heterogeneous Data ◽ Biological Data ◽ Data Sources ◽ Biological Knowledge ◽ Biological Databases ◽ Semantic Level ◽ Sparql Endpoint ◽ Description Framework

Abstract Motivation: Data integration promises to be one of the main catalysts in enabling new insights to be drawn from the wealth of biological data available publicly. However, the heterogeneity of the different data sources, both at the syntactic and the semantic level, still poses significant challenges for achieving interoperability among biological databases. Results: We introduce an ontology-based federated approach for data integration. We applied this approach to three heterogeneous data stores that span different areas of biological knowledge: (i) Bgee, a gene expression relational database; (ii) Orthologous Matrix (OMA), a Hierarchical Data Format 5 orthology data store; and (iii) UniProtKB, a Resource Description Framework (RDF) store containing protein sequence and functional information. To enable federated queries across these sources, we first defined a new semantic model for gene expression called GenEx. We then show how the relational data in Bgee can be expressed as a virtual RDF graph, instantiating GenEx, through dedicated relational-to-RDF mappings. By applying these mappings, Bgee data are now accessible through a public SPARQL endpoint. Similarly, the materialized RDF data of OMA, expressed in terms of the Orthology ontology, is made available in a public SPARQL endpoint. We identified and formally described intersection points (i.e. virtual links) among the three data sources. These allow performing joint queries across the data stores. Finally, we lay the groundwork to enable nontechnical users to benefit from the integrated data, by providing a natural language template-based search interface.
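The role of intersection points in a joint query can be sketched with a toy example. This is not the authors' implementation and it does not use SPARQL or the real Bgee, OMA or UniProtKB endpoints: three in-memory Python structures stand in for the three sources, and every record, field name and identifier is invented for illustration. The only point shown is that a query can span independent stores when they share linkable identifiers.

```python
# Toy stand-ins for three independent sources; all records are invented.
expression = [  # plays the role of a gene expression source
    {"gene": "geneA", "anatomical_entity": "leaf"},
    {"gene": "geneB", "anatomical_entity": "root"},
]
orthology = [  # plays the role of an orthology source
    {"gene": "geneA", "ortholog": "geneX"},
]
proteins = {  # plays the role of a protein source, keyed by gene identifier
    "geneX": "proteinX",
}


def joint_query(tissue):
    """Return (gene, ortholog, protein) for genes expressed in `tissue`."""
    results = []
    for expr in expression:
        if expr["anatomical_entity"] != tissue:
            continue
        for orth in orthology:
            # Link 1: expression and orthology records share a gene identifier.
            if orth["gene"] == expr["gene"]:
                # Link 2: the ortholog identifier is also a key in the protein source.
                protein = proteins.get(orth["ortholog"])
                results.append((expr["gene"], orth["ortholog"], protein))
    return results
```

Calling `joint_query("leaf")` walks both links in turn, while a tissue whose genes have no orthology record simply yields no rows. In a real federated setting each loop would correspond to a sub-query sent to a separate endpoint.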

The status of causality in biological databases: data resources and data retrieval possibilities to support logical modeling

Briefings in Bioinformatics ◽ 10.1093/bib/bbaa390 ◽ 2020

Author(s): Vasundra Touré ◽ Åsmund Flobak ◽ Anna Niarakis ◽ Steven Vercruysse ◽ Martin Kuiper

Keyword(s): Molecular Interactions ◽ Regulatory Networks ◽ Data Exchange ◽ Contextual Information ◽ Data Retrieval ◽ Building Blocks ◽ Data Representation ◽ Biological Knowledge ◽ Biological Databases ◽ Knowledge Resources

Abstract Causal molecular interactions represent key building blocks used in computational modeling, where they facilitate the assembly of regulatory networks. Logical regulatory networks can be used to predict biological and cellular behaviors by system perturbations and in silico simulations. Today, broad sets of causal interactions are available in a variety of biological knowledge resources. However, different visions, based on distinct biological interests, have led to the development of multiple ways to describe and annotate causal molecular interactions. It can therefore be challenging to efficiently explore various resources of causal interaction and maintain an overview of recorded contextual information that ensures valid use of the data. This review lists the different types of public resources with causal interactions, the different views on biological processes that they represent, the various data formats they use for data representation and storage, and the data exchange and conversion procedures that are available to extract and download these interactions. This may further raise awareness among the targeted audience, i.e. logical modelers and other scientists interested in molecular causal interactions, but also database managers and curators, about the abundance and variety of causal molecular interaction data, and the variety of tools and approaches to convert them into one interoperable resource.

A Declarative Approach for Designing Web Portals

Encyclopedia of Portal Technologies and Applications ◽ 10.4018/978-1-59140-989-2.ch035 ◽ 2011 ◽ pp. 197-203

Author(s): William Gardner ◽ R. Rajugan

Keyword(s): Data Exchange ◽ Data Representation ◽ Content Management ◽ Heterogeneous Data ◽ Data Sources ◽ Distributed Model ◽ Heterogeneous Data Sources ◽ Extensible Markup ◽ Management Techniques ◽ Exchange Medium

As many enterprise and industrial content management techniques are moving towards a distributed model, the need to exchange data between heterogeneous data sources in a seamless fashion is constantly increasing. These heterogeneous data sources could arise from server groups from different manufacturers or databases at different sites with their own schemas. Since its introduction in 1996, eXtensible Markup Language (XML) (W3C-XML, 2004) has established itself as the open, presentation-independent data representation and exchange medium. XML provides a mechanism for seamless data exchange in many industrial informatics settings. In addition, XML is also emerging as the dominant standard for storing, describing, representing, and interchanging data among various enterprise systems and databases in the context of complex Web enterprise information systems (EIS).

Enabling Semantic Queries Across Federated Bioinformatics Databases

10.1101/686600 ◽ 2019 ◽ Cited By ~ 1

Author(s): Ana Claudia Sima ◽ Tarcisio Mendes de Farias ◽ Erich Zbinden ◽ Maria Anisimova ◽ Manuel Gil ◽ ...

Keyword(s): Gene Expression ◽ Data Integration ◽ Heterogeneous Data ◽ Biological Data ◽ Data Sources ◽ Biological Knowledge ◽ Biological Databases ◽ Semantic Level ◽ Sparql Endpoint ◽ Link Type

Motivation: Data integration promises to be one of the main catalysts in enabling new insights to be drawn from the wealth of biological data available publicly. However, the heterogeneity of the different data sources, both at the syntactic and the semantic level, still poses significant challenges for achieving interoperability among biological databases. Results: We introduce an ontology-based federated approach for data integration. We applied this approach to three heterogeneous data stores that span different areas of biological knowledge: 1) Bgee, a gene expression relational database; 2) OMA, a Hierarchical Data Format 5 (HDF5) orthology data store, and 3) UniProtKB, a Resource Description Framework (RDF) store containing protein sequence and functional information. To enable federated queries across these sources, we first defined a new semantic model for gene expression called GenEx. We then show how the relational data in Bgee can be expressed as a virtual RDF graph, instantiating GenEx, through dedicated relational-to-RDF mappings. By applying these mappings, Bgee data are now accessible through a public SPARQL endpoint. Similarly, the materialised RDF data of OMA, expressed in terms of the Orthology ontology, is made available in a public SPARQL endpoint. We identified and formally described intersection points (i.e. virtual links) among the three data sources. These allow performing joint queries across the data stores. Finally, we lay the groundwork to enable nontechnical users to benefit from the integrated data, by providing a natural language template-based search interface. Project URL: http://biosoda.expasy.org, https://github.com/biosoda/bioquery

Understanding the Semantic Links among Heterogeneous Biological Data Sources

SSRN Electronic Journal ◽ 10.2139/ssrn.883126 ◽ 2006

Author(s): Wei Wei ◽ Sudha Ram

Keyword(s): Biological Data ◽ Data Sources

Integrating Biological Data Sources and Data Analysis Tools through Mediators (available online only)

Proceedings of the 2004 ACM symposium on Applied computing - SAC '04 ◽ 10.1145/967900.980091 ◽ 2004 ◽ Cited By ~ 4

Author(s): J. F. Aldana ◽ M. Roldán ◽ I. Navas ◽ A. J. Pérez ◽ O. Trelles

Keyword(s): Data Analysis ◽ Biological Data ◽ Data Sources ◽ Analysis Tools

Transformation of microbiology data into a standardised data representation using OpenEHR

Scientific Reports ◽ 10.1038/s41598-021-89796-y ◽ 2021 ◽ Vol 11 (1)

Author(s): Antje Wulff ◽ Claas Baier ◽ Sarah Ballout ◽ Erik Tute ◽ ...

Keyword(s): Real Life ◽ Primary Source ◽ Multidrug Resistant ◽ Data Representation ◽ Data Models ◽ Outbreak Detection ◽ Standard Data ◽ Quality Checks ◽ Computer Based ◽ Control Application

Abstract The spread of multidrug resistant organisms (MDRO) is a global healthcare challenge. Nosocomial outbreaks caused by MDRO are an important contributor to this threat. Computer-based applications facilitating outbreak detection can be essential to address this issue. To allow application reusability across institutions, the various heterogeneous microbiology data representations need to be transformed into standardised, unambiguous data models. In this work, we present a multi-centric standardisation approach by using openEHR as modelling standard. Data models have been consented in a multicentre and international approach. Participating sites integrated microbiology reports from primary source systems into an openEHR-based data platform. For evaluation, we implemented a prototypical application, compared the transformed data with original reports and conducted automated data quality checks. We were able to develop standardised and interoperable microbiology data models. The publicly available data models can be used across institutions to transform real-life microbiology reports into standardised representations. The implementation of a proof-of-principle and quality control application demonstrated that the new formats as well as the integration processes are feasible. Holistic transformation of microbiological data into standardised openEHR-based formats is feasible in a real-life multicentre setting and lays the foundation for developing cross-institutional, automated outbreak detection systems.

A Form Verification System for the Conceptual Design of Complex Mechanical Systems

19th Design Automation Conference: Volume 1 — Mechanical System Dynamics; Concurrent and Robust Design; Design for Assembly and Manufacture; Genetic Algorithms in Design and Structural Optimization ◽

10.1115/detc1993-0297 ◽ 1993

Author(s): Jonathan S. Colton ◽ Mark P. Ouellette

Keyword(s): Conceptual Design ◽ Process Model ◽ Mechanical Systems ◽ Data Representation ◽ Design System ◽ Complex Data ◽ Graphical Display ◽ Design Environment ◽ Blackboard System ◽ Complex Mechanical Systems

Abstract This paper presents a summary of research into the development and implementation of a domain independent, computer-based model for the conceptual design of complex mechanical systems (Ouellette, 1992). The creation of such a design model includes the integration of four major concepts: (1) The use of a graphical display for visualizing the conceptual design attributes; (2) The proper representation of the complex data and diverse knowledge required to design the system; (3) The integration of quality design methods into the conceptual design; and (4) The modeling of the conceptual design process as a mapping between functions and forms. Using the design of an automobile as a case study, a design environment was created which consisted of a distributed problem solving paradigm and a parametric graphical display. The requirements of the design problem with respect to data representation and design processing were evaluated and a process model was specified. The resulting vehicle design system consists of a tight integration between a blackboard system and a parametric design system. The completed system allows a designer to view graphical representations of the candidate conceptual designs that the blackboard system generates.

A graph based model for multiple biological data sources integration

Proceedings of the 3rd International Conference on Smart City Applications - SCA '18 ◽ 10.1145/3286606.3286826 ◽ 2018 ◽ Cited By ~ 1

Author(s): Hamza Hanafi ◽ Fadoua Rafii ◽ Badr Dine Rossi Hassani ◽ M'hamed Aït Kbir

Keyword(s): Biological Data ◽ Data Sources ◽ Graph Based Model

Extension of the sasCIF format and its applications for data processing and deposition

Journal of Applied Crystallography ◽ 10.1107/s1600576715024942 ◽ 2016 ◽ Vol 49 (1) ◽ pp. 302-310 ◽ Cited By ~ 8

Author(s): Michael Kachala ◽ John Westbrook ◽ Dmitri Svergun

Keyword(s): Data Analysis ◽ Data Processing ◽ Data Exchange ◽ Hybrid Methods ◽ Data Bank ◽ Relevant Information ◽ Experimental Information ◽ Biological Data ◽ Task Forces ◽ Software Modules

Recent advances in small-angle scattering (SAS) experimental facilities and data analysis methods have prompted a dramatic increase in the number of users and of projects conducted, causing an upsurge in the number of objects studied, experimental data available and structural models generated. To organize the data and models and make them accessible to the community, the Task Forces on SAS and hybrid methods for the International Union of Crystallography and the Worldwide Protein Data Bank envisage developing a federated approach to SAS data and model archiving. Within the framework of this approach, the existing databases may exchange information and provide independent but synchronized entries to users. At present, ways of exchanging information between the various SAS databases are not established, leading to possible duplication and incompatibility of entries, and limiting the opportunities for data-driven research for SAS users. In this work, a solution is developed to resolve these issues and provide a universal exchange format for the community, based on the use of the widely adopted crystallographic information framework (CIF). The previous version of the sasCIF format, implemented as an extension of the core CIF dictionary, has been available since 2000 to facilitate SAS data exchange between laboratories. The sasCIF format has now been extended to describe comprehensively the necessary experimental information, results and models, including relevant metadata for SAS data analysis and for deposition into a database. Processing tools for these files (sasCIFtools) have been developed, and these are available both as standalone open-source programs and integrated into the SAS Biological Data Bank, allowing the export and import of data entries as sasCIF files. Software modules to save the relevant information directly from beamline data-processing pipelines in sasCIF format are also developed. This update of sasCIF and the relevant tools are an important step in the standardization of the way SAS data are presented and exchanged, to make the results easily accessible to users and to promote further the application of SAS in the structural biology community.
