The PlaNet Consortium: A Network of European Plant Databases Connecting Plant Genome Data in an Integrated Biological Knowledge Resource

2004, Vol 5 (2), pp. 184-189
Author(s): H. Schoof, R. Ernst, K. F. X. Mayer

The completion of the Arabidopsis genome and the large collections of other plant sequences generated in recent years have sparked extensive functional genomics efforts. However, these data are used inefficiently, because data sources are distributed and heterogeneous, and efforts at data integration lag behind. PlaNet aims to overcome the limitations of individual efforts as well as those of heterogeneous, independent data collections. PlaNet is a distributed effort among European bioinformatics groups and plant molecular biologists to establish a comprehensive integrated database in a collaborative network. Its objectives are to implement the infrastructure and data sources needed to capture plant genomic information in a comprehensive, integrated platform. This will facilitate the systematic exploration of Arabidopsis and other plants. New methods for data exchange, database integration and access are being developed to create a highly integrated, federated data resource for research. The individual resources are connected through BioMOBY, which provides an architecture for the discovery and distribution of biological data through web services. While knowledge is centralized, data are maintained at their primary source without the need for warehousing. To standardize nomenclature and data representation, ontologies and generic data models are defined in interaction with the relevant communities. Minimal data models are intended to make broad integration simple, while inheritance allows detail and depth to be added to more complex data objects without losing integration. To allow expert annotation and keep databases curated, local and remote annotation interfaces are provided. Easy and direct access to all data is key to the project.
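The principle of minimal data models extended by inheritance can be sketched with Python dataclasses. The class names, fields, and the `ExampleDB` namespace below are hypothetical and do not reproduce the BioMOBY object ontology; the point is only that a service written against the minimal base type still accepts the richer subclasses, so adding detail does not break integration.

```python
from dataclasses import dataclass

# Hypothetical minimal object: every resource can produce and consume
# at least a namespace/identifier pair.
@dataclass
class DataObject:
    namespace: str   # e.g. a source database name (illustrative)
    identifier: str  # record ID within that namespace

# Hypothetical richer objects: detail is added by inheritance, so each
# one is still a DataObject as far as generic services are concerned.
@dataclass
class SequenceObject(DataObject):
    sequence: str

@dataclass
class AnnotatedGene(SequenceObject):
    description: str

def cross_reference(obj: DataObject) -> str:
    """Generic service: relies only on the minimal model."""
    return f"{obj.namespace}:{obj.identifier}"

def gc_content(obj: SequenceObject) -> float:
    """More specific service: needs the inherited sequence field."""
    seq = obj.sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

gene = AnnotatedGene("ExampleDB", "GENE0001", "ATGCGC", "hypothetical gene")
print(cross_reference(gene))
print(gc_content(gene))
```

Here `cross_reference` never needs to know that `AnnotatedGene` exists, which is the property the minimal-model design is meant to preserve.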

Database, 2019, Vol 2019
Author(s): Ana Claudia Sima, Tarcisio Mendes de Farias, Erich Zbinden, Maria Anisimova, Manuel Gil, ...

Abstract Motivation: Data integration promises to be one of the main catalysts for drawing new insights from the wealth of publicly available biological data. However, the heterogeneity of the different data sources, at both the syntactic and the semantic level, still poses significant challenges for achieving interoperability among biological databases. Results: We introduce an ontology-based federated approach for data integration. We applied this approach to three heterogeneous data stores that span different areas of biological knowledge: (i) Bgee, a gene expression relational database; (ii) Orthologous Matrix (OMA), a Hierarchical Data Format 5 orthology data store; and (iii) UniProtKB, a Resource Description Framework (RDF) store containing protein sequence and functional information. To enable federated queries across these sources, we first defined a new semantic model for gene expression called GenEx. We then show how the relational data in Bgee can be expressed as a virtual RDF graph, instantiating GenEx, through dedicated relational-to-RDF mappings. By applying these mappings, Bgee data are now accessible through a public SPARQL endpoint. Similarly, the materialized RDF data of OMA, expressed in terms of the Orthology ontology, are made available in a public SPARQL endpoint. We identified and formally described intersection points (i.e. virtual links) among the three data sources, which allow joint queries to be performed across the data stores. Finally, we lay the groundwork for nontechnical users to benefit from the integrated data by providing a natural language template-based search interface.
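To make the mapping and virtual-link ideas concrete, here is a toy plain-Python sketch. It assumes a hypothetical two-row expression table, a placeholder namespace `http://example.org/`, and made-up predicates (`hasGene`, `inTissue`, `hasOrtholog`). It does not reproduce the GenEx model, an actual relational-to-RDF mapping language, or SPARQL; it only shows the shape of the idea: rows are turned into triples on demand rather than stored, and a second source is joined through a shared identifier.

```python
# Placeholder namespace; the real GenEx/Bgee IRIs are not reproduced here.
EX = "http://example.org/"

# Hypothetical rows from a relational gene-expression table.
EXPRESSION_ROWS = [
    {"gene_id": "G1", "tissue": "liver", "score": 0.9},
    {"gene_id": "G2", "tissue": "brain", "score": 0.4},
]

def row_to_triples(row):
    """Mapping rule: one relational row -> (subject, predicate, object) triples."""
    expr = f"{EX}expression/{row['gene_id']}_{row['tissue']}"
    return [
        (expr, EX + "hasGene", f"{EX}gene/{row['gene_id']}"),
        (expr, EX + "inTissue", f"{EX}tissue/{row['tissue']}"),
        (expr, EX + "score", row["score"]),
    ]

def expression_graph():
    """Virtual graph: triples are generated from the rows on demand, not stored."""
    for row in EXPRESSION_ROWS:
        yield from row_to_triples(row)

# Hypothetical second source whose triples reuse the same gene IRIs.
ORTHOLOGY_TRIPLES = [
    (f"{EX}gene/G1", EX + "hasOrtholog", f"{EX}gene/G7"),
]

def tissues_of_genes_with_orthologs():
    """Join both sources on the shared gene IRI, i.e. the link between them."""
    with_orthologs = {s for s, p, _ in ORTHOLOGY_TRIPLES if p == EX + "hasOrtholog"}
    triples = list(expression_graph())
    gene_of = {s: o for s, p, o in triples if p == EX + "hasGene"}
    tissue_of = {s: o for s, p, o in triples if p == EX + "inTissue"}
    return sorted(
        (gene, tissue_of[expr])
        for expr, gene in gene_of.items()
        if gene in with_orthologs
    )

print(tissues_of_genes_with_orthologs())
```

In a real deployment the join would be expressed as a federated SPARQL query against the public endpoints rather than computed in application code.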


Author(s): Vasundra Touré, Åsmund Flobak, Anna Niarakis, Steven Vercruysse, Martin Kuiper

Abstract Causal molecular interactions represent key building blocks used in computational modeling, where they facilitate the assembly of regulatory networks. Logical regulatory networks can be used to predict biological and cellular behaviors through system perturbations and in silico simulations. Today, broad sets of causal interactions are available in a variety of biological knowledge resources. However, different perspectives, rooted in distinct biological interests, have led to multiple ways of describing and annotating causal molecular interactions. It can therefore be challenging to explore the various resources of causal interactions efficiently while keeping track of the recorded contextual information that ensures valid use of the data. This review lists the different types of public resources containing causal interactions, the different views on biological processes that they represent, the data formats they use for representation and storage, and the data exchange and conversion procedures available to extract and download these interactions. It may further raise awareness among the target audience, i.e. logical modelers and other scientists interested in molecular causal interactions, but also database managers and curators, of the abundance and variety of causal molecular interaction data, and of the tools and approaches available to convert them into one interoperable resource.
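Logical models of the kind this review targets can be illustrated with a small Boolean network. The sketch below assumes a hypothetical three-node loop (A activates B, B activates C, C inhibits A), updates all nodes synchronously, and models a perturbation as a knockout that pins a node to `False`. Real logical models are usually built with dedicated tools and often use asynchronous updating, so treat this only as an illustration of how causal interactions become simulation rules.

```python
# Hypothetical causal network encoded as Boolean update rules:
# A activates B, B activates C, and C inhibits A.
RULES = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["B"],
}

def step(state, knockouts=frozenset()):
    """Synchronous update of all nodes; knocked-out nodes are forced to False."""
    return {
        node: (False if node in knockouts else bool(rule(state)))
        for node, rule in RULES.items()
    }

def simulate(state, steps, knockouts=frozenset()):
    """Return the trajectory of states, starting with the initial state."""
    trajectory = [state]
    for _ in range(steps):
        state = step(state, knockouts)
        trajectory.append(state)
    return trajectory

start = {"A": False, "B": False, "C": False}
wild_type = simulate(start, 6)                       # negative loop oscillates
knockout = simulate(start, 5, knockouts=frozenset({"B"}))
print(wild_type[-1] == start)
print(knockout[-1])
```

Without perturbation the negative feedback loop cycles back to its starting state after six steps; knocking out B breaks the loop and the network settles into a steady state with A active.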


Author(s): William Gardner, R. Rajugan

As many enterprise and industrial content management techniques move towards a distributed model, the need to exchange data seamlessly between heterogeneous data sources keeps increasing. These heterogeneous data sources could be server groups from different manufacturers, or databases at different sites with their own schemas. Since its introduction in 1996, eXtensible Markup Language (XML) (W3C-XML, 2004) has established itself as the open, presentation-independent medium for data representation and exchange. XML provides a mechanism for seamless data exchange in many industrial informatics settings. XML is also emerging as the dominant standard for storing, describing, representing, and interchanging data among enterprise systems and databases in complex Web enterprise information systems (EIS).
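As a minimal illustration of XML as a neutral exchange format, the sketch below uses Python's standard `xml.etree.ElementTree` to serialize a flat record from one source and parse it at a second site whose schema uses different field names. The record fields and the `field_map` renaming table are hypothetical, and the sketch assumes every field name is a valid XML element name.

```python
import xml.etree.ElementTree as ET

def record_to_xml(record: dict, tag: str = "record") -> str:
    """Sending site: serialize a flat record into a neutral XML document."""
    root = ET.Element(tag)
    for key, value in record.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")

def xml_to_record(xml_text: str, field_map: dict) -> dict:
    """Receiving site: parse the XML and rename fields into the local schema."""
    root = ET.fromstring(xml_text)
    return {field_map.get(child.tag, child.tag): child.text for child in root}

# Hypothetical record at site A and renaming rules at site B.
site_a_record = {"cust_id": "42", "name": "Acme"}
message = record_to_xml(site_a_record)
site_b_record = xml_to_record(message, {"cust_id": "customer_number"})
print(message)
print(site_b_record)
```

Neither site needs to know the other's internal schema; they only agree on the XML element names, and each side maps those to its own structures.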


2019
Author(s): Ana Claudia Sima, Tarcisio Mendes de Farias, Erich Zbinden, Maria Anisimova, Manuel Gil, ...

Motivation: Data integration promises to be one of the main catalysts for drawing new insights from the wealth of publicly available biological data. However, the heterogeneity of the different data sources, at both the syntactic and the semantic level, still poses significant challenges for achieving interoperability among biological databases. Results: We introduce an ontology-based federated approach for data integration. We applied this approach to three heterogeneous data stores that span different areas of biological knowledge: 1) Bgee, a gene expression relational database; 2) OMA, a Hierarchical Data Format 5 (HDF5) orthology data store; and 3) UniProtKB, a Resource Description Framework (RDF) store containing protein sequence and functional information. To enable federated queries across these sources, we first defined a new semantic model for gene expression called GenEx. We then show how the relational data in Bgee can be expressed as a virtual RDF graph, instantiating GenEx, through dedicated relational-to-RDF mappings. By applying these mappings, Bgee data are now accessible through a public SPARQL endpoint. Similarly, the materialised RDF data of OMA, expressed in terms of the Orthology ontology, are made available in a public SPARQL endpoint. We identified and formally described intersection points (i.e. virtual links) among the three data sources, which allow joint queries to be performed across the data stores. Finally, we lay the groundwork for nontechnical users to benefit from the integrated data by providing a natural language template-based search interface. Project URL: http://biosoda.expasy.org, https://github.com/biosoda/bioquery
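A template-based natural-language interface can be approximated by pairing question patterns with query skeletons. The Python sketch below assumes a single hypothetical template, a placeholder `ex:` prefix, and made-up predicates; it is not the bioquery implementation. Restricting the slot to `\w+` keeps quotes out of the generated query, but a production system would need more thorough input validation.

```python
import re

# Hypothetical template: a question pattern paired with a SPARQL skeleton.
# The ex: prefix and predicates are placeholders, not the GenEx vocabulary.
# Doubled braces are literal braces for str.format.
TEMPLATES = [
    (
        re.compile(r"^where is the gene (\w+) expressed\??$", re.IGNORECASE),
        """PREFIX ex: <http://example.org/>
SELECT ?tissue WHERE {{
  ?expr ex:hasGene ?gene ;
        ex:inTissue ?tissue .
  ?gene ex:name "{0}" .
}}""",
    ),
]

def question_to_sparql(question: str):
    """Return a filled-in query for the first matching template, else None."""
    for pattern, skeleton in TEMPLATES:
        match = pattern.match(question.strip())
        if match:
            return skeleton.format(*match.groups())
    return None

print(question_to_sparql("Where is the gene GENE1 expressed?"))
```

The user never writes SPARQL; they pick or type a question that matches a template, and only the slot value (here the gene name) is taken from their input.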


2021, Vol 11 (1)
Author(s): Antje Wulff, Claas Baier, Sarah Ballout, Erik Tute, ...

Abstract The spread of multidrug-resistant organisms (MDRO) is a global healthcare challenge, and nosocomial outbreaks caused by MDRO are an important contributor to this threat. Computer-based applications that facilitate outbreak detection can be essential in addressing this issue. To allow such applications to be reused across institutions, the various heterogeneous representations of microbiology data need to be transformed into standardised, unambiguous data models. In this work, we present a multicentric standardisation approach using openEHR as the modelling standard. The data models were agreed upon by consensus in a multicentre, international process. Participating sites integrated microbiology reports from primary source systems into an openEHR-based data platform. For evaluation, we implemented a prototype application, compared the transformed data with the original reports and conducted automated data quality checks. We were able to develop standardised and interoperable microbiology data models. The publicly available data models can be used across institutions to transform real-life microbiology reports into standardised representations. The proof-of-principle and quality control application demonstrated that both the new formats and the integration processes are feasible. Holistic transformation of microbiological data into standardised openEHR-based formats is feasible in a real-life multicentre setting and lays the foundation for developing cross-institutional, automated outbreak detection systems.
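The core idea of mapping heterogeneous site formats onto one agreed model and then running automated quality checks can be sketched as follows. The `MicrobiologyFinding` class and both input layouts (`from_site_a`, `from_site_b`) are hypothetical stand-ins; they are not openEHR archetypes or the data models published by the authors, and the checks are only examples of simple plausibility rules.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class MicrobiologyFinding:
    """Hypothetical common model; NOT an actual openEHR archetype."""
    patient_id: str
    sample_date: date
    organism: str
    resistant_to: tuple  # antibiotic names, lower-cased and sorted

def from_site_a(row: dict) -> MicrobiologyFinding:
    # Hypothetical site A layout: ISO dates, comma-separated resistances.
    return MicrobiologyFinding(
        patient_id=row["pid"],
        sample_date=date.fromisoformat(row["date"]),
        organism=row["org"].strip(),
        resistant_to=tuple(sorted(
            a.strip().lower() for a in row["res"].split(",") if a.strip()
        )),
    )

def from_site_b(row: dict) -> MicrobiologyFinding:
    # Hypothetical site B layout: DD.MM.YYYY dates, resistances as a list.
    day, month, year = (int(x) for x in row["SampleDate"].split("."))
    return MicrobiologyFinding(
        patient_id=row["PatientID"],
        sample_date=date(year, month, day),
        organism=row["Pathogen"].strip(),
        resistant_to=tuple(sorted(a.strip().lower() for a in row["Resistances"])),
    )

def quality_issues(finding: MicrobiologyFinding, today: date) -> list:
    """Example automated plausibility checks on the standardised record."""
    issues = []
    if not finding.patient_id:
        issues.append("missing patient id")
    if finding.sample_date > today:
        issues.append("sample date in the future")
    if not finding.organism:
        issues.append("missing organism")
    return issues

a = from_site_a({"pid": "P1", "date": "2021-03-04", "org": "Escherichia coli",
                 "res": "Ciprofloxacin, meropenem"})
b = from_site_b({"PatientID": "P1", "SampleDate": "04.03.2021",
                 "Pathogen": "Escherichia coli ", "Resistances": ["Meropenem", "Ciprofloxacin"]})
print(a == b)
print(quality_issues(a, date(2021, 12, 31)))
```

Once both sites emit the same model, a downstream application (for example, an outbreak detector) can be written once and reused at every site.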


Author(s): Jonathan S. Colton, Mark P. Ouellette

Abstract This paper summarizes research into the development and implementation of a domain-independent, computer-based model for the conceptual design of complex mechanical systems (Ouellette, 1992). Creating such a design model involves integrating four major concepts: (1) the use of a graphical display for visualizing the conceptual design attributes; (2) the proper representation of the complex data and diverse knowledge required to design the system; (3) the integration of quality design methods into the conceptual design; and (4) the modeling of the conceptual design process as a mapping between functions and forms. Using the design of an automobile as a case study, a design environment was created that consisted of a distributed problem-solving paradigm and a parametric graphical display. The requirements of the design problem with respect to data representation and design processing were evaluated, and a process model was specified. The resulting vehicle design system tightly integrates a blackboard system with a parametric design system. The completed system allows a designer to view graphical representations of the candidate conceptual designs that the blackboard system generates.
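The blackboard architecture mentioned above can be sketched in a few lines of Python: independent knowledge sources watch a shared data store and each contributes an entry once the entries it depends on are present, under a simple control loop. The design rules, thresholds, masses, and names below (`choose_body`, `choose_engine`, `estimate_mass`) are invented for illustration and are not taken from the authors' vehicle design system.

```python
# Hypothetical knowledge sources for a toy vehicle concept.
def choose_body(bb):
    return "sedan" if bb["seats"] <= 5 else "minivan"

def choose_engine(bb):
    return "1.6 L" if bb["body"] == "sedan" else "2.4 L"

def estimate_mass(bb):
    base = {"sedan": 1300, "minivan": 1800}[bb["body"]]
    return base + (100 if bb["engine"] == "2.4 L" else 0)

# (name, entries required on the blackboard, entry produced, function).
# The list order does not matter: the control loop decides when each fires.
KNOWLEDGE_SOURCES = [
    ("mass", {"body", "engine"}, "mass_kg", estimate_mass),
    ("engine", {"body"}, "engine", choose_engine),
    ("body", {"seats"}, "body", choose_body),
]

def run_blackboard(initial):
    """Let any knowledge source fire whose inputs are available, until quiescent."""
    bb = dict(initial)
    log = []
    progress = True
    while progress:
        progress = False
        for name, needs, produces, fn in KNOWLEDGE_SOURCES:
            if produces not in bb and needs.issubset(bb):
                bb[produces] = fn(bb)
                log.append(name)
                progress = True
    return bb, log

design, firing_order = run_blackboard({"seats": 7})
print(design)
print(firing_order)
```

The knowledge sources never call each other; they communicate only through the blackboard, which is what lets a blackboard system combine loosely coupled experts in a distributed problem-solving style.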


2016, Vol 49 (1), pp. 302-310
Author(s): Michael Kachala, John Westbrook, Dmitri Svergun

Recent advances in small-angle scattering (SAS) experimental facilities and data analysis methods have prompted a dramatic increase in the number of users and of projects conducted, causing an upsurge in the number of objects studied, experimental data available and structural models generated. To organize the data and models and make them accessible to the community, the Task Forces on SAS and hybrid methods for the International Union of Crystallography and the Worldwide Protein Data Bank envisage developing a federated approach to SAS data and model archiving. Within the framework of this approach, the existing databases may exchange information and provide independent but synchronized entries to users. At present, no established mechanism exists for exchanging information between the various SAS databases, which can lead to duplicated and incompatible entries and limits the opportunities for data-driven research for SAS users. In this work, a solution is developed to resolve these issues and provide a universal exchange format for the community, based on the widely adopted crystallographic information framework (CIF). The previous version of the sasCIF format, implemented as an extension of the core CIF dictionary, has been available since 2000 to facilitate SAS data exchange between laboratories. The sasCIF format has now been extended to comprehensively describe the necessary experimental information, results and models, including relevant metadata for SAS data analysis and for deposition into a database. Processing tools for these files (sasCIFtools) have been developed, and these are available both as standalone open-source programs and integrated into the SAS Biological Data Bank, allowing the export and import of data entries as sasCIF files. Software modules that save the relevant information in sasCIF format directly from beamline data-processing pipelines have also been developed.
This update of sasCIF and the associated tools is an important step towards standardizing the way SAS data are presented and exchanged, making the results easily accessible to users and further promoting the application of SAS in the structural biology community.
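CIF files store data as `_tag value` pairs grouped under `data_` block headers. The minimal Python reader below handles only that subset, plus `#` comment lines and simply quoted values; it deliberately ignores `loop_` tables, semicolon-delimited text fields, and inline comments. The tag names in the example are invented placeholders rather than real sasCIF dictionary items, and this is not sasCIFtools.

```python
def parse_simple_cif(text: str) -> dict:
    """Parse data_ block names and single-line `_tag value` pairs.

    Minimal by design: loop_ tables, semicolon text fields and
    inline comments are NOT handled.
    """
    blocks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.lower().startswith("data_"):
            current = blocks.setdefault(line[5:], {})
            continue
        if line.startswith("_") and current is not None:
            parts = line.split(None, 1)
            value = parts[1].strip() if len(parts) > 1 else ""
            # Strip matching single or double quotes around the value.
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
                value = value[1:-1]
            current[parts[0]] = value
    return blocks

example = """data_example
# hypothetical entry; tag names are placeholders, not sasCIF items
_example_sample.name        'lysozyme in buffer'
_example_scan.wavelength    1.54
"""
print(parse_simple_cif(example))
```

Values are kept as strings, because CIF itself is untyped at the syntax level; the dictionary that accompanies a format such as sasCIF is what defines each item's type and units.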

