‘One DB to rule them all’—the RING: a Regulatory INteraction Graph combining TFs, genes/proteins, SNPs, diseases and drugs

Database ◽

10.1093/database/baz108 ◽

2019 ◽

Vol 2019 ◽

Cited By ~ 1

Author(s):

Gianfranco Politano ◽

Stefano Di Carlo ◽

Alfredo Benso

Keyword(s):

Regulatory Networks ◽

Heterogeneous Data ◽

Biological Interactions ◽

Interaction Graph ◽

Sources Of Information ◽

Regulatory Interaction ◽

Data Repositories ◽

Regulatory Cascade ◽

Sketch Study ◽

High Level

Abstract In the last decade, genomics data have been largely adopted to sketch, study and better understand the complex mechanisms that underlie biological processes. The amount of publicly available data sources has grown accordingly, and several types of regulatory interactions have been collected and documented in literature. Unfortunately, often these efforts do not follow any data naming/interoperability/formatting standards, resulting in high-quality but often uninteroperable heterogeneous data repositories. To efficiently take advantage of the large amount of available data and integrate these heterogeneous sources of information, we built the RING (Regulatory Interaction Graph), an integrative standardized multilevel database of biological interactions able to provide a comprehensive and unmatched high-level perspective on several phenomena that take place in the regulatory cascade and that researchers can use to easily build regulatory networks around entities of interest.
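As an illustration of what "building a regulatory network around an entity of interest" can look like, the following is a minimal sketch only. It assumes interactions are stored as typed (source, relation, target) triples; the entity names, relation labels, and the `neighborhood` helper are hypothetical placeholders and are not taken from the RING schema or its data.

```python
from collections import defaultdict, deque

# Hypothetical typed edges (source, relation, target).
# Placeholder names for illustration only; these are not RING records.
EDGES = [
    ("TF1", "regulates", "geneA"),
    ("geneA", "encodes", "proteinA"),
    ("snp1", "maps_to", "geneA"),
    ("snp1", "associated_with", "diseaseX"),
    ("drug1", "targets", "proteinA"),
    ("TF2", "regulates", "geneB"),
]


def build_index(edges):
    """Index every edge under both of its endpoints (direction is ignored)."""
    adj = defaultdict(list)
    for src, rel, dst in edges:
        adj[src].append((src, rel, dst))
        adj[dst].append((src, rel, dst))
    return adj


def neighborhood(edges, seed, max_hops):
    """Return all edges reachable within max_hops of seed (breadth-first)."""
    adj = build_index(edges)
    dist = {seed: 0}
    queue = deque([seed])
    found = set()
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for edge in adj[node]:
            found.add(edge)
            other = edge[2] if edge[0] == node else edge[0]
            if other not in dist:
                dist[other] = dist[node] + 1
                queue.append(other)
    return found


if __name__ == "__main__":
    for edge in sorted(neighborhood(EDGES, "geneA", 2)):
        print(edge)
```

Keeping the relation label on each edge is what lets a single graph mix TF, gene, SNP, disease, and drug entities while still allowing a query to filter by interaction type afterwards.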

metaXplor: an interactive viral and microbial metagenomic data manager

GigaScience ◽

10.1093/gigascience/giab001 ◽

2021 ◽

Vol 10 (2) ◽

Author(s):

Guilhem Sempéré ◽

Adrien Pétel ◽

Magsen Abbé ◽

Pierre Lefeuvre ◽

Philippe Roumagnac ◽

...

Keyword(s):

Heterogeneous Data ◽

Metagenomic Data ◽

Online Data ◽

Data Repositories ◽

Ongoing Research ◽

Efficient Management ◽

Public Data ◽

Reference Databases ◽

Interactive Data ◽

User Friendly

Abstract Background Efficiently managing large, heterogeneous data in a structured yet flexible way is a challenge to research laboratories working with genomic data. Specifically regarding both shotgun- and metabarcoding-based metagenomics, while online reference databases and user-friendly tools exist for running various types of analyses (e.g., Qiime, Mothur, Megan, IMG/VR, Anvi'o, Qiita, MetaVir), scientists lack comprehensive software for easily building scalable, searchable, online data repositories on which they can rely during their ongoing research. Results metaXplor is a scalable, distributable, fully web-interfaced application for managing, sharing, and exploring metagenomic data. Being based on a flexible NoSQL data model, it has few constraints regarding dataset contents and thus proves useful for handling outputs from both shotgun and metabarcoding techniques. By supporting incremental data feeding and providing means to combine filters on all imported fields, it allows for exhaustive content browsing, as well as rapid narrowing to find specific records. The application also features various interactive data visualization tools, ways to query contents by BLASTing external sequences, and an integrated pipeline to enrich assignments with phylogenetic placements. The project home page provides the URL of a live instance allowing users to test the system on public data. Conclusion metaXplor allows efficient management and exploration of metagenomic data. Its availability as a set of Docker containers, making it easy to deploy on academic servers, on the cloud, or even on personal computers, will facilitate its adoption.

KBoost: a new method to infer gene regulatory networks from gene expression data

Scientific Reports ◽

10.1038/s41598-021-94919-6 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Luis F. Iglesias-Martinez ◽

Barbara De Kegel ◽

Walter Kolch

Keyword(s):

Breast Cancer ◽

Gene Regulatory Networks ◽

Regulatory Networks ◽

Bayesian Model Averaging ◽

Model Averaging ◽

R Package ◽

Breast Cancer Patients ◽

Sources Of Information ◽

Cancer Subtypes ◽

Gene Regulatory

Abstract Reconstructing gene regulatory networks is crucial to understand biological processes and holds potential for developing personalized treatment. Yet, it is still an open problem as state-of-the-art algorithms are often not able to process large amounts of data within reasonable time. Furthermore, many of the existing methods predict numerous false positives and have limited capabilities to integrate other sources of information, such as previously known interactions. Here we introduce KBoost, an algorithm that uses kernel PCA regression, boosting and Bayesian model averaging for fast and accurate reconstruction of gene regulatory networks. We have benchmarked KBoost against other high performing algorithms using three different datasets. The results show that our method compares favorably to other methods across datasets. We have also applied KBoost to a large cohort of close to 2000 breast cancer patients and 24,000 genes in less than 2 h on standard hardware. Our results show that molecularly defined breast cancer subtypes also feature differences in their GRNs. An implementation of KBoost in the form of an R package is available at: https://github.com/Luisiglm/KBoost and as a Bioconductor software package.

The Impact of the Privacy Rule on Cancer Research: Variations in Attitudes and Application of Regulatory Standards

Journal of Clinical Oncology ◽

10.1200/jco.2009.22.3289 ◽

2009 ◽

Vol 27 (24) ◽

pp. 4014-4020 ◽

Cited By ~ 6

Author(s):

Elizabeth Goss ◽

Michael P. Link ◽

Suanna S. Bruinooge ◽

Theodore S. Lawrence ◽

Joel E. Tepper ◽

...

Keyword(s):

Cancer Research ◽

Future Research ◽

Research Committee ◽

Data Repositories ◽

Privacy Rule ◽

Regulatory Standards ◽

American Society ◽

Definition Of ◽

High Level ◽

The Impact

Purpose The American Society of Clinical Oncology (ASCO) Cancer Research Committee designed a qualitative research project to assess the attitudes of cancer researchers and compliance officials regarding compliance with the US Privacy Rule and to identify potential strategies for eliminating perceived or real barriers to achieving compliance. Methods A team of three interviewers asked 27 individuals (13 investigators and 14 compliance officials) from 13 institutions to describe the anticipated approach of their institutions to Privacy Rule compliance in three hypothetical research studies. Results The interviews revealed that although researchers and compliance officials share the view that patients' cancer diagnoses should enjoy a high level of privacy protection, there are significant tensions between the two groups related to the proper standards for compliance necessary to protect patients. The disagreements are seen most clearly with regard to the appropriate definition of a “future research use” of protected health information in biospecimen and data repositories and the standards for a waiver of authorization for disclosure and use of such data. Conclusion ASCO believes that disagreements related to compliance and the resulting delays in certain projects and abandonment of others might be eased by additional institutional training programs and consultation on Privacy Rule issues during study design. ASCO also proposes the development of best practices documents to guide 1) creation of data repositories, 2) disclosure and use of data from such repositories, and 3) the design of survivorship and genetics studies.

A Fast and Effective Method to Identify Relevant Sets of Variables in Complex Systems

Mathematics ◽

10.3390/math9091022 ◽

2021 ◽

Vol 9 (9) ◽

pp. 1022

Author(s):

Gianluca D’Addese ◽

Martina Casari ◽

Roberto Serra ◽

Marco Villani

Keyword(s):

Complex Systems ◽

Gene Regulatory Networks ◽

Regulatory Networks ◽

Computational Cost ◽

Graph Analysis ◽

The Past ◽

Medium Level ◽

Micro Level ◽

Gene Regulatory ◽

High Level

In many complex systems one observes the formation of medium-level structures, whose detection could allow a high-level description of the dynamical organization of the system itself, and thus a better understanding of it. We have developed in the past a powerful method to achieve this goal, which, however, incurs a heavy computational cost in several real-world cases. In this work we introduce a modified version of our approach, which reduces the computational burden. The design of the new algorithm allowed the realization of an original suite of methods able to work simultaneously at the micro level (that of the binary relationships of the single variables) and at the meso level (the identification of dynamically relevant groups). We apply this suite to a particularly relevant case, in which we look for the dynamic organization of a gene regulatory network when it is subject to knock-outs. The approach combines information theory, graph analysis, and an iterated sieving algorithm in order to describe rather complex situations. Its application allowed us to derive some general observations on the dynamical organization of gene regulatory networks, and to observe interesting characteristics in an experimental case.

Entity Type Recognition for Heterogeneous Semantic Graphs

AI Magazine ◽

10.1609/aimag.v36i1.2569 ◽

2015 ◽

Vol 36 (1) ◽

pp. 75-86 ◽

Cited By ~ 4

Author(s):

Jennifer Sleeman ◽

Tim Finin ◽

Anupam Joshi

Keyword(s):

Machine Learning ◽

Background Knowledge ◽

Knowledge Bases ◽

Heterogeneous Data ◽

Unstructured Data ◽

Supervised Machine Learning ◽

Coreference Resolution ◽

Multiple Sources ◽

Fine Grained ◽

High Level

We describe an approach for identifying fine-grained entity types in heterogeneous data graphs that is effective for unstructured data or when the underlying ontologies or semantic schemas are unknown. Identifying fine-grained entity types, rather than a few high-level types, supports coreference resolution in heterogeneous graphs by reducing the number of possible coreference relations that must be considered. Big data problems that involve integrating data from multiple sources can benefit from our approach when the data's ontologies are unknown, inaccessible or semantically trivial. For such cases, we use supervised machine learning to map entity attributes and relations to a known set of attributes and relations from appropriate background knowledge bases to predict instance entity types. We evaluated this approach in experiments on data from DBpedia, Freebase, and Arnetminer using DBpedia as the background knowledge base.

AN ARCHITECTURE FOR DATA WAREHOUSING SUPPORTING DATA INDEPENDENCE AND INTEROPERABILITY

International Journal of Cooperative Information Systems ◽

10.1142/s0218843001000394 ◽

2001 ◽

Vol 10 (03) ◽

pp. 377-397 ◽

Cited By ~ 8

Author(s):

LUCA CABIBBO ◽

RICCARDO TORLONE

Keyword(s):

Data Warehouse ◽

Data Model ◽

Data Warehousing ◽

Heterogeneous Data ◽

Multidimensional Data ◽

Multidimensional Databases ◽

Level Of Aggregation ◽

High Level ◽

Data Independence ◽

Logical Architecture

We report on the design of a novel architecture for data warehousing based on the introduction of an explicit "logical" layer to the traditional data warehousing framework. This layer serves to guarantee a complete independence of OLAP applications from the physical storage structure of the data warehouse and thus allows users and applications to manipulate multidimensional data ignoring implementation details. For example, it makes possible the modification of the data warehouse organization (e.g. MOLAP or ROLAP implementation, star scheme or snowflake scheme structure) without influencing the high level description of multidimensional data and programs that use the data. Also, it supports the integration of multidimensional data stored in heterogeneous OLAP servers. We propose [Formula: see text], a simple data model for multidimensional databases, as the reference for the logical layer. [Formula: see text] provides an abstract formalism to describe the basic concepts that can be found in any OLAP system (fact, dimension, level of aggregation, and measure). We show that [Formula: see text] databases can be implemented in both relational and multidimensional storage systems. We also show that [Formula: see text] can be profitably used in OLAP applications as front-end. We finally describe the design of a practical system that supports the above logical architecture; this system is used to show in practice how the architecture we propose can hide implementation details and provides a support for interoperability between different and possibly heterogeneous data warehouse applications.

An Open-Source Cloud-FPGA Gene Regulatory Accelerator

10.5753/wscad.2021.18527 ◽

2021 ◽

Author(s):

Lucas Bragança ◽

Jeronimo Penha ◽

Michael Canesche ◽

Dener Ribeiro ◽

José Augusto M. Nacif ◽

...

Keyword(s):

Open Source ◽

Gene Regulatory Networks ◽

High Performance ◽

Regulatory Networks ◽

Cloud Services ◽

On Demand ◽

Speed Up ◽

Gene Regulatory ◽

High Level ◽

Better Than

FPGAs are suitable to speed up gene regulatory network (GRN) algorithms with high throughput and energy efficiency. In addition, virtualizing FPGAs using hardware generators and cloud resources increases the computing ability to achieve on-demand acceleration across multiple users. Amazon AWS now provides high-performance cloud FPGAs. This work proposes an open source accelerator generator for Boolean gene regulatory networks. The generator automatically creates all hardware and software pieces from a high-level GRN description. We evaluate the accelerator performance and cost for CPU, GPU, and Cloud FPGA implementations by considering six GRN models proposed in the literature. As a result, the FPGA accelerator is at least 12x faster than the best GPU accelerator. Furthermore, the FPGA reaches the best performance per dollar in cloud services, at least 5x better than the best GPU accelerator.
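For readers unfamiliar with Boolean gene regulatory networks, the computation such an accelerator speeds up can be sketched in plain software. This is a minimal sketch under stated assumptions: the three-gene network and its update rules are hypothetical, the update scheme is assumed to be synchronous, and the function names are illustrative; none of it is taken from the paper or its generator.

```python
from itertools import product

# Hypothetical 3-gene network; the rules are illustrative, not from the paper.
RULES = {
    "A": lambda s: not s["C"],         # C represses A
    "B": lambda s: s["A"],             # A activates B
    "C": lambda s: s["A"] and s["B"],  # A and B jointly activate C
}


def step(state):
    """Synchronous update: every gene reads the same previous state."""
    return {gene: bool(rule(state)) for gene, rule in RULES.items()}


def find_attractor(state):
    """Iterate from state until a state repeats; return the repeating cycle."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]


def all_attractors():
    """Enumerate all 2^n initial states and collect the distinct attractors."""
    genes = sorted(RULES)
    attractors = set()
    for bits in product([False, True], repeat=len(genes)):
        cycle = find_attractor(dict(zip(genes, bits)))
        # Canonical form: the set of states in the cycle, each as a tuple.
        attractors.add(frozenset(tuple(s[g] for g in genes) for s in cycle))
    return attractors


if __name__ == "__main__":
    for attractor in all_attractors():
        print(sorted(attractor))
```

The exhaustive enumeration grows as 2^n in the number of genes, which is the kind of workload that motivates hardware acceleration.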

Discriminating the Single-cell Gene Regulatory Networks of Human Pancreatic Islets: A Novel Deep Learning Application

10.1101/2020.08.30.273839 ◽

2020 ◽

Author(s):

Turki Turki ◽

Y-h. Taguchi

Keyword(s):

Deep Learning ◽

Single Cell ◽

Gene Regulatory Networks ◽

Regulatory Networks ◽

Metabolic Diseases ◽

Large Data ◽

Data Repositories ◽

Cell Gene Expression ◽

Gene Regulatory ◽

Cell Gene

Abstract Analyzing single-cell pancreatic data would play an important role in understanding various metabolic diseases and health conditions. Due to the sparsity and noise present in such single-cell gene expression data, analyzing various functions related to the inference of gene regulatory networks, derived from single-cell data, remains difficult, thereby posing a barrier to the deepening of understanding of cellular metabolism. Since recent studies have led to the reliable inference of single-cell gene regulatory networks (SCGRNs), the challenge of discriminating between SCGRNs has now arisen. By accurately discriminating between SCGRNs (e.g., distinguishing SCGRNs of healthy pancreas from those of T2D pancreas), biologists would be able to annotate, organize, visualize, and identify common patterns of SCGRNs for metabolic diseases. Such annotated SCGRNs could play an important role in speeding up the process of building large data repositories. In this study, we aimed to contribute to the development of a novel deep learning (DL) application. First, we generated a dataset consisting of 224 SCGRNs belonging to both T2D and healthy pancreas and made it freely available. Next, we chose seven DL architectures, including VGG16, VGG19, Xception, ResNet50, ResNet101, DenseNet121, and DenseNet169, trained each of them on the dataset, and checked prediction based on a test set. We evaluated the DL architectures on an HP workstation platform with a single NVIDIA GeForce RTX 2080Ti GPU. Experimental results on the whole dataset, using several performance measures, demonstrated the superiority of VGG19 DL model in the automatic classification of SCGRNs, derived from the single-cell pancreatic data.

Infants in Control: Prospective Motor Control and Executive Functions in Action Development

10.31237/osf.io/w8suh ◽

2018 ◽

Author(s):

Janna M. Gottwald

Keyword(s):

Motor Control ◽

Cognitive Control ◽

Executive Functions ◽

Visual Information ◽

Motion Tracking ◽

Planning Process ◽

Action Planning ◽

Sources Of Information ◽

Action Sequences ◽

High Level

This thesis assesses the link between action and cognition early in development. Thus the notion of an embodied cognition is investigated by tying together two levels of action control in the context of reaching in infancy: prospective motor control and executive functions. The ability to plan our actions is the inevitable foundation of reaching our goals. Thus actions can be stratified on different levels of control. There is the relatively low level of prospective motor control and the comparatively high level of cognitive control. Prospective motor control is concerned with goal-directed actions on the level of single movements and movement combinations of our body and ensures purposeful, coordinated movements, such as reaching for a cup of coffee. Cognitive control, in the context of this thesis more precisely referred to as executive functions, deals with goal-directed actions on the level of whole actions and action combinations and facilitates directedness towards mid- and long-term goals, such as finishing a doctoral thesis. Whereas prospective motor control and executive functions are well studied in adulthood, the early development of both is not sufficiently understood. This thesis comprises three empirical motion-tracking studies that shed light on prospective motor control and executive functions in infancy. Study I investigated the prospective motor control of current actions by having 14-month-olds lift objects of varying weights. In doing so, multi-cue integration was addressed by comparing the use of visual and non-visual information to non-visual information only. Study II examined the prospective motor control of future actions in action sequences by investigating reach-to-place actions in 14-month-olds. Thus the extent to which Fitts’ law can explain movement duration in infancy was addressed. Study III lifted prospective motor control to a higher, that is, cognitive level, by investigating it relative to executive functions in 18-month-olds. Main results were that 14-month-olds are able to prospectively control their manual actions based on object weight. In this action planning process, infants use different sources of information. Beyond this ability to prospectively control their current action, 14-month-olds also take future actions into account and plan their actions based on the difficulty of the subsequent action in action sequences. In 18-month-olds, prospective motor control in manual actions, such as reaching, is related to early executive functions, as demonstrated for behavioral prohibition and working memory. These findings are consistent with the idea that executive functions derive from prospective motor control. I suggest that executive functions could be grounded in the development of motor control. In other words, early executive functions should be seen as embodied.

Inference of Gene Regulatory Networks by Topological Prior Information and Data Integration

Biotechnology ◽

10.4018/978-1-5225-8903-7.ch010 ◽

2019 ◽

pp. 265-304

Author(s):

David Correa Martins Jr. ◽

Fabricio Martins Lopes ◽

Shubhra Sankar Ray

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Gene Regulatory Networks ◽

Regulatory Networks ◽

Prior Information ◽

Heterogeneous Data ◽

Data Sources ◽

Expression Data ◽

Heterogeneous Data Sources ◽

Gene Regulatory

The inference of Gene Regulatory Networks (GRNs) is a very challenging problem which has attracted increasing attention since the development of high-throughput sequencing and gene expression measurement technologies. Many models and algorithms have been developed to identify GRNs, mainly using gene expression profiles as the data source. As gene expression data usually have a limited number of samples and inherent noise, the integration of gene expression with several other sources of information can be vital for accurately inferring GRNs. For instance, some prior information about the overall topological structure of the GRN can guide inference techniques toward better results. In addition to gene expression data, biological information from heterogeneous data sources has recently been integrated by GRN inference methods as well. The objective of this chapter is to present an overview of GRN inference models and techniques, with a focus on the incorporation of prior information, such as global and local topological features, and the integration of several heterogeneous data sources.
