The status of causality in biological databases: data resources and data retrieval possibilities to support logical modeling

Briefings in Bioinformatics ◽

10.1093/bib/bbaa390 ◽

2020 ◽

Author(s):

Vasundra Touré ◽

Åsmund Flobak ◽

Anna Niarakis ◽

Steven Vercruysse ◽

Martin Kuiper

Keyword(s):

Molecular Interactions ◽

Regulatory Networks ◽

Data Exchange ◽

Contextual Information ◽

Data Retrieval ◽

Building Blocks ◽

Data Representation ◽

Biological Knowledge ◽

Biological Databases ◽

Knowledge Resources

Abstract: Causal molecular interactions represent key building blocks used in computational modeling, where they facilitate the assembly of regulatory networks. Logical regulatory networks can be used to predict biological and cellular behaviors by system perturbations and in silico simulations. Today, broad sets of causal interactions are available in a variety of biological knowledge resources. However, different visions, based on distinct biological interests, have led to the development of multiple ways to describe and annotate causal molecular interactions. It can therefore be challenging to efficiently explore various resources of causal interaction and maintain an overview of recorded contextual information that ensures valid use of the data. This review lists the different types of public resources with causal interactions, the different views on biological processes that they represent, the various data formats they use for data representation and storage, and the data exchange and conversion procedures that are available to extract and download these interactions. This may further raise awareness among the targeted audience, i.e. logical modelers and other scientists interested in molecular causal interactions, but also database managers and curators, about the abundance and variety of causal molecular interaction data, and the variety of tools and approaches to convert them into one interoperable resource.
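As an illustration of the idea summarized above (signed causal interactions assembled into a logical network that is then perturbed and simulated in silico), the following is a minimal sketch only. The three-node network, its rules, the synchronous update scheme, and the names `RULES`, `step` and `simulate` are all hypothetical and are not taken from the reviewed resources or from the paper.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, bool]
Rules = Dict[str, Callable[[State], bool]]

# Hypothetical network built from three signed causal interactions:
#   A activates B, B activates C, C inhibits A
RULES: Rules = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["B"],
}


def step(state: State, rules: Rules, fixed: Optional[State] = None) -> State:
    """Apply one synchronous update; nodes in `fixed` keep their forced value."""
    nxt = {node: rule(state) for node, rule in rules.items()}
    nxt.update(fixed or {})
    return nxt


def simulate(
    initial: State,
    rules: Rules,
    fixed: Optional[State] = None,
    max_steps: int = 50,
) -> Tuple[List[State], State]:
    """Update until a state repeats (or max_steps is hit).

    Returns the visited states and the first state that was revisited.
    """
    state = dict(initial)
    state.update(fixed or {})  # a perturbation also applies to the start state
    seen: List[State] = []
    while state not in seen and len(seen) < max_steps:
        seen.append(state)
        state = step(state, rules, fixed)
    return seen, state


if __name__ == "__main__":
    start = {"A": True, "B": False, "C": False}
    # Unperturbed network: the negative feedback loop produces an oscillation.
    trajectory, revisited = simulate(start, RULES)
    print(len(trajectory), revisited)
    # In silico knockout of A: the network settles into a steady state.
    trajectory_ko, steady = simulate(start, RULES, fixed={"A": False})
    print(len(trajectory_ko), steady)
```

In this toy setting a perturbation is modeled simply by pinning a node to a fixed value. Real logical models typically rely on dedicated tools and exchange formats rather than hand-written rules.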

The Status of Causality in Biological Databases for Logical Modeling: Data Resources and Data Retrieval Possibilities

10.20944/preprints202007.0123.v1 ◽

2020 ◽

Author(s):

Vasundra Touré ◽

Åsmund Flobak ◽

Anna Niarakis ◽

Steven Vercruysse ◽

Martin Kuiper

Keyword(s):

Molecular Interactions ◽

Regulatory Networks ◽

Contextual Information ◽

Data Retrieval ◽

Building Blocks ◽

Biological Knowledge ◽

Biological Databases ◽

Knowledge Resources ◽

The Status ◽

Modeling Data

Causal molecular interactions represent key building blocks used in computational modeling, where they facilitate the assembly of regulatory networks. These regulatory networks can then be used to predict biological and cellular behavior by system perturbations and in silico simulations. Today, broad sets of these interactions are being made available in a variety of biological knowledge resources. Moreover, different visions, based on distinct biological interests, have led to the development of multiple ways to describe and annotate causal molecular interactions. Therefore, data users can find it challenging to efficiently explore resources of causal interaction and to be aware of recorded contextual information that ensures valid use of the data. This manuscript presents a review of public resources collecting causal interactions and the different views they convey, together with a thorough description of the export formats established to store and retrieve these interactions. Our goal is to raise awareness amongst the targeted audience, i.e., logical modelers, but also any scientist interested in molecular causal interactions, about existing data resources and how to get familiar with them.

The PlaNet Consortium: A Network of European Plant Databases Connecting Plant Genome Data in an Integrated Biological Knowledge Resource

Comparative and Functional Genomics ◽

10.1002/cfg.374 ◽

2004 ◽

Vol 5 (2) ◽

pp. 184-189 ◽

Cited By ~ 3

Author(s):

H. Schoof ◽

R. Ernst ◽

K. F. X. Mayer

Keyword(s):

Data Exchange ◽

Data Representation ◽

Data Models ◽

Biological Data ◽

Plant Genome ◽

Data Sources ◽

Direct Access ◽

Biological Knowledge ◽

Database Integration ◽

Complex Data

The completion of the Arabidopsis genome and the large collections of other plant sequences generated in recent years have sparked extensive functional genomics efforts. However, the utilization of this data is inefficient, as data sources are distributed and heterogeneous and efforts at data integration are lagging behind. PlaNet aims to overcome the limitations of individual efforts as well as the limitations of heterogeneous, independent data collections. PlaNet is a distributed effort among European bioinformatics groups and plant molecular biologists to establish a comprehensive integrated database in a collaborative network. Objectives are the implementation of infrastructure and data sources to capture plant genomic information into a comprehensive, integrated platform. This will facilitate the systematic exploration of Arabidopsis and other plants. New methods for data exchange, database integration and access are being developed to create a highly integrated, federated data resource for research. The connection between the individual resources is realized with BioMOBY. BioMOBY provides an architecture for the discovery and distribution of biological data through web services. While knowledge is centralized, data is maintained at its primary source without a need for warehousing. To standardize nomenclature and data representation, ontologies and generic data models are defined in interaction with the relevant communities. Minimal data models should make it simple to allow broad integration, while inheritance allows detail and depth to be added to more complex data objects without losing integration. To allow expert annotation and keep databases curated, local and remote annotation interfaces are provided. Easy and direct access to all data is key to the project.

The Minimum Information about a Molecular Interaction Causal Statement (MI2CAST)

10.20944/preprints202004.0480.v1 ◽

2020 ◽

Author(s):

Vasundra Touré ◽

Steven Vercruysse ◽

Marcio Luis Acencio ◽

Ruth Lovering ◽

Sandra Orchard ◽

...

Keyword(s):

Molecular Interactions ◽

Molecular Interaction ◽

Regulatory Networks ◽

Building Blocks ◽

Biological Processes ◽

Causal Interaction ◽

End User ◽

Minimum Information ◽

Causal Statement ◽

In Cells

A large variety of molecular interactions occurs between biomolecular components in cells. When one or a cascade of molecular interactions results in a regulatory effect, by one component onto a downstream component, a so-called ‘causal interaction’ takes place. Causal interactions constitute the building blocks in our understanding of larger regulatory networks in cells. These causal interactions and the biological processes they enable (e.g., gene regulation) need to be described with a careful appreciation of molecular interactions that occur between entities. A proper description of this information enables archiving, sharing, and reuse by humans and for computational science. Various representations of causal relationships between biological components are currently used in a variety of resources. Here, we propose a checklist that accommodates current representations, and call it the Minimum Information about a Molecular Interaction CAusal STatement (MI2CAST). This checklist defines both the required core information, as well as a comprehensive set of other contextual details valuable to the end user and relevant for reusing and reproducing causal molecular interaction information. The MI2CAST checklist can be used as reporting guidelines when annotating and curating causal statements, while assuring uniformity and interoperability of the data across resources.

National Water Data Exchange (NAWDEX) System 2000 data retrieval manual

Open-File Report ◽

10.3133/ofr81419 ◽

1981 ◽

Author(s):

Owen O. Williams ◽

William A. Knecht

Keyword(s):

Data Exchange ◽

Data Retrieval ◽

National Water

SPARQL Query Generator (SQG)

Journal on Data Semantics ◽

10.1007/s13740-021-00133-y ◽

2021 ◽

Author(s):

Yanji Chen ◽

Mieczyslaw M. Kokar ◽

Jakub J. Moskal

Keyword(s):

Data Retrieval ◽

Data Representation ◽

Sparql Query ◽

Experimental Results ◽

Generation Process ◽

Manual Work ◽

Retrieval Systems ◽

Large Numbers ◽

Query Generation ◽

Good Coverage

Abstract: This paper describes a program, SPARQL Query Generator (SQG), which takes as input an OWL ontology, a set of object descriptions in terms of this ontology and an OWL class as the context, and generates relatively large numbers of queries about various types of descriptions of objects expressed in RDF/OWL. The intent is to use SQG in evaluating data representation and retrieval systems from the perspective of OWL semantics coverage. While there are many benchmarks for assessing the efficiency of data retrieval systems, none of the existing solutions for SPARQL query generation focus on the coverage of the OWL semantics. Some are not scalable since manual work is needed for the generation process; some do not consider (or totally ignore) the OWL semantics in the ontology/instance data or rely on large numbers of real queries/datasets that are not readily available in our domain of interest. Our experimental results show that SQG performs reasonably well with generating large numbers of queries and guarantees a good coverage of OWL axioms included in the generated queries.

Regulatory component analysis: A semi-blind extraction approach to infer gene regulatory networks with imperfect biological knowledge

Signal Processing ◽

10.1016/j.sigpro.2011.11.028 ◽

2012 ◽

Vol 92 (8) ◽

pp. 1902-1915 ◽

Cited By ~ 4

Author(s):

Chen Wang ◽

Jianhua Xuan ◽

Ie-Ming Shih ◽

Robert Clarke ◽

Yue Wang

Keyword(s):

Gene Regulatory Networks ◽

Regulatory Networks ◽

Component Analysis ◽

Biological Knowledge ◽

Gene Regulatory ◽

Blind Extraction

Composite Materials Design Database and Data Retrieval System Requirements

Journal of Mechanical Design ◽

10.1115/1.2919411 ◽

1994 ◽

Vol 116 (2) ◽

pp. 531-538

Author(s):

W. J. Rasdorf

Keyword(s):

Composite Materials ◽

Computer Technology ◽

Retrieval System ◽

Data Retrieval ◽

Data Representation ◽

Clear Understanding ◽

Analysis And Design ◽

Problems And Solutions ◽

System Requirements ◽

Design Software

Researchers and materials engineers require a greater understanding of the problems and solutions that emerge when integrating composite materials data with computer technology so that utilitarian composite materials databases can be developed to effectively and efficiently support analysis and design software. This paper primarily serves to analyze several of the problems facing developers of composite materials databases, evolving from the complexity of the materials themselves and from the current lack of testing and data representation standards. Without a clear understanding of the scope and nature of these problems, there is no possibility of designing concise yet comprehensive composites data models, yet we feel that such an understanding is presently lacking.

Consensus transcriptional regulatory networks of coronavirus-infected human cells

Scientific Data ◽

10.1038/s41597-020-00628-6 ◽

2020 ◽

Vol 7 (1) ◽

Cited By ~ 1

Author(s):

Scott A. Ochsner ◽

Rudolf T. Pillich ◽

Neil J. McKenna

Keyword(s):

Signaling Pathways ◽

Regulatory Networks ◽

Data Exchange ◽

Epithelial To Mesenchymal Transition ◽

Transcriptional Regulatory Networks ◽

Mesenchymal Transition ◽

Human Genes ◽

Transcriptional Regulatory ◽

Infected Cells ◽

Key Drivers

Abstract: Establishing consensus around the transcriptional interface between coronavirus (CoV) infection and human cellular signaling pathways can catalyze the development of novel anti-CoV therapeutics. Here, we used publicly archived transcriptomic datasets to compute consensus regulatory signatures, or consensomes, that rank human genes based on their rates of differential expression in MERS-CoV (MERS), SARS-CoV-1 (SARS1) and SARS-CoV-2 (SARS2)-infected cells. Validating the CoV consensomes, we show that high confidence transcriptional targets (HCTs) of MERS, SARS1 and SARS2 infection intersect with HCTs of signaling pathway nodes with known roles in CoV infection. Among a series of novel use cases, we gather evidence for hypotheses that SARS2 infection efficiently represses E2F family HCTs encoding key drivers of DNA replication and the cell cycle; that progesterone receptor signaling antagonizes SARS2-induced inflammatory signaling in the airway epithelium; and that SARS2 HCTs are enriched for genes involved in epithelial to mesenchymal transition. The CoV infection consensomes and HCT intersection analyses are freely accessible through the Signaling Pathways Project knowledgebase, and as Cytoscape-style networks in the Network Data Exchange repository.

Development of an XML-Based Specification for Traffic Model Data Exchange

Transportation Research Record Journal of the Transportation Research Board ◽

10.3141/1804-19 ◽

2002 ◽

Vol 1804 (1) ◽

pp. 144-150

Author(s):

Kenneth G. Courage ◽

Scott S. Washburn ◽

Jin-Tae Kim

Keyword(s):

Traffic Control ◽

Data Exchange ◽

Transportation Network ◽

Data Entry ◽

Data Representation ◽

Traffic Model ◽

Model Data ◽

Traffic Demand ◽

Data Formats ◽

Data Files

The proliferation of traffic software programs on the market has resulted in many very specialized programs, intended to analyze one or two specific items within a transportation network. Consequently, traffic engineers use multiple programs on a single project, which ironically has resulted in new inefficiency for the traffic engineer. Most of these programs deal with the same core set of data, for example, physical roadway characteristics, traffic demand levels, and traffic control variables. However, most of these programs have their own formats for saving data files. Therefore, these programs cannot share information directly or communicate with each other because of incompatible data formats. Thus, the traffic engineer is faced with manually reentering common data from one program into another. In addition to inefficiency, this also creates additional opportunities for data entry errors. XML is catching on rapidly as a means for exchanging data between two systems or users who deal with the same data but in different formats. Specific vocabularies have been developed for statistics, mathematics, chemistry, and many other disciplines. The traffic model markup language (TMML) is introduced as a resource for traffic model data representation, storage, rendering, and exchange. TMML structure and vocabulary are described, and examples of their use are presented.

Mutations in Transcriptional Regulators Allow Selective Engineering of Signal Integration Logic

mBio ◽

10.1128/mbio.01171-14 ◽

2014 ◽

Vol 5 (3) ◽

Cited By ~ 4

Author(s):

Szabolcs Semsey

Keyword(s):

Gene Regulatory Networks ◽

Regulatory Networks ◽

Regulatory Protein ◽

Point Mutations ◽

Building Blocks ◽

Regulatory Elements ◽

Regulatory Proteins ◽

Single Amino Acid ◽

Signal Integration ◽

Gene Regulatory

Abstract: Bacterial cells monitor their environment by sensing a set of signals. Typically, these environmental signals affect promoter activities by altering the activity of transcription regulatory proteins. Promoters are often regulated by more than one regulatory protein, and in these cases the relevant signals are integrated by certain logic. In this work, we study how single amino acid substitutions in a regulatory protein (GalR) affect transcriptional regulation and signal integration logic at a set of engineered promoters. Our results suggest that point mutations in regulatory genes allow independent evolution of regulatory logic at different promoters.

Importance: Gene regulatory networks are built from simple building blocks, such as promoters, transcription regulatory proteins, and their binding sites on DNA. Many promoters are regulated by more than one regulatory input. In these cases, the inputs are integrated and allow transcription only in certain combinations of input signals. Gene regulatory networks can be easily rewired, because the function of cis-regulatory elements and promoters can be altered by point mutations. In this work, we tested how point mutations in transcription regulatory proteins can affect signal integration logic. We found that such mutations allow context-dependent engineering of signal integration logic at promoters, further contributing to the plasticity of gene regulatory networks.
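The statement that a promoter with several regulatory inputs allows transcription only in certain combinations of input signals can be pictured as a truth table. The sketch below is purely illustrative: the two-input gates (`AND`, `OR`, `NOR`) and the helper `permissive_inputs` are hypothetical and do not represent the engineered promoters or the GalR variants studied in the paper.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Gate = Callable[[bool, bool], bool]

# Hypothetical two-input integration logics for a promoter.
GATES: Dict[str, Gate] = {
    "AND": lambda x, y: x and y,        # both signals required
    "OR": lambda x, y: x or y,          # either signal suffices
    "NOR": lambda x, y: not (x or y),   # active only when both signals are absent
}


def permissive_inputs(gate: Gate) -> List[Tuple[bool, bool]]:
    """List the input combinations under which the promoter is transcribed."""
    return [(x, y) for x, y in product([False, True], repeat=2) if gate(x, y)]


if __name__ == "__main__":
    for name, gate in GATES.items():
        print(name, permissive_inputs(gate))
```

Under this simplified view, a change in integration logic corresponds to a change in which rows of the truth table permit transcription.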
