Controlling false discoveries in Bayesian gene networks with lasso regression p-values

Mapping Intimacies · 10.1101/288217 · 2018 · Cited By ~ 1

Author(s): Lingfei Wang, Tom Michoel

Keyword(s): Bayesian Networks, Gene Networks, Network Inference, Supplementary Information, Lasso Regression, False Discovery, Systematic Biases, Empirical Tests, False Discoveries

Abstract Motivation: Bayesian networks can represent directed gene regulations and are therefore favored over co-expression networks. However, hardly any Bayesian network study addresses the false discovery control (FDC) of network edges, which leads to low accuracy because inconsistent false discovery levels within the same study introduce systematic biases. Results: We design four empirical tests to examine the FDC of Bayesian networks built from three p-value-based lasso regression variable selection methods: two existing ones and one we originate. Our method, lassopv, computes p-values for the critical regularization strength at which a predictor starts to contribute to the lasso regression. Using null and Geuvadis datasets, we find that lassopv obtains optimal FDC in Bayesian gene networks, whilst existing methods have defective p-values. The FDC concept and tests extend to most network inference scenarios and will guide the design and improvement of new and existing methods. Our novel variable selection method with lasso regression also allows FDC on other datasets and questions, even beyond network inference and computational biology. Availability: Lassopv is implemented in R and freely available at https://github.com/lingfeiwang/lassopv and https://cran.r-project.org/. Supplementary information: Supplementary data are available at Bioinformatics online.
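
The Results above hinge on one quantity: the critical regularization strength at which a predictor first enters the lasso. The Python sketch below is an illustration under stated assumptions, not the lassopv R package: it assumes NumPy, the common (1/2n) least-squares lasso objective, standardized predictors and a discrete grid of strengths, and it finds each predictor's entry point by warm-started coordinate descent. Its p-value comes from a simple permutation test, a swapped-in technique rather than lassopv's own p-value calculation, and all function names are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Lasso proximal step: shrink the scalar z towards zero by t."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, beta, max_iter=100, tol=1e-8):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1, warm-started at beta."""
    n, p = X.shape
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ beta  # current residual
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # correlation of predictor j with the residual that excludes predictor j
            rho = X[:, j] @ r / n + col_sq[j] * old
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            if beta[j] != old:
                r -= X[:, j] * (beta[j] - old)
                max_change = max(max_change, abs(beta[j] - old))
        if max_change < tol:
            break
    return beta


def critical_strengths(X, y, n_grid=50, min_ratio=1e-3):
    """Largest grid value of lam at which each predictor first becomes non-zero (0 = never)."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize predictors
    y = y - y.mean()
    n, p = X.shape
    lam_max = np.abs(X.T @ y).max() / n  # all coefficients are zero for lam >= lam_max
    grid = lam_max * np.logspace(0, np.log10(min_ratio), n_grid)
    entry = np.zeros(p)
    beta = np.zeros(p)
    for lam in grid:  # decreasing lam, warm-starting from the previous solution
        beta = lasso_cd(X, y, lam, beta)
        newly_active = (entry == 0) & (beta != 0)
        entry[newly_active] = lam
    return entry


def permutation_pvalue(X, y, j, n_perm=19, seed=0):
    """Empirical p-value: how often a permuted copy of predictor j enters at least as early."""
    rng = np.random.default_rng(seed)
    observed = critical_strengths(X, y)[j]
    hits = 0
    for _ in range(n_perm):
        X_perm = X.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])  # break any link between predictor j and y
        if critical_strengths(X_perm, y)[j] >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = 2.0 * X[:, 0] + rng.normal(size=100)  # only predictor 0 drives the target
entry = critical_strengths(X, y)
p_value = permutation_pvalue(X, y, j=0)
```

A higher entry strength means the predictor joins the lasso path earlier; the permutation p-value asks how often a shuffled copy of the same predictor would join at least as early.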

corto: a lightweight R package for Gene Network Inference and Master Regulator Analysis

10.1101/2020.02.10.942623 · 2020 · Cited By ~ 1

Author(s): Daniele Mercatelli, Gonzalo Lopez-Garcia, Federico M. Giorgi

Keyword(s): Gene Expression, Gene Networks, Gene Network, Network Inference, Human Tumor, R Package, Specific Gene, Master Regulator, Gene Network Inference

Abstract Motivation: Gene Network Inference and Master Regulator Analysis (MRA) have been widely adopted to define specific transcriptional perturbations from gene expression signatures. Several tools exist to perform such analyses, but most require a computer cluster or large amounts of RAM to be executed. Results: We developed corto, a fast and lightweight R package to infer gene networks and perform MRA from gene expression data, with optional corrections for Copy Number Variations (CNVs), and able to run on signatures generated from RNA-Seq or ATAC-Seq data. We extensively benchmarked it to infer context-specific gene networks in 39 human tumor and 27 normal tissue datasets. Availability: Cross-platform and multi-threaded R package on CRAN (stable version) https://cran.r-project.org/package=corto and GitHub (development release) https://github.com/federicogiorgi/

shinyBN: an online application for interactive Bayesian network inference and visualization

BMC Bioinformatics · 10.1186/s12859-019-3309-0 · 2019 · Vol 20 (1) · Cited By ~ 1

Author(s): Jiajin Chen, Ruyang Zhang, Xuesi Dong, Lijuan Lin, Ying Zhu, ...

Keyword(s): Bayesian Networks, Bayesian Network, Network Inference, Graphical Model, Disease Risk, External Validation, Easy Method, Online Application, Research Fields

Abstract Background: High-throughput technologies have brought tremendous changes to biological domains, and the resulting high-dimensional data have also posed enormous challenges to computational science. A Bayesian network is a probabilistic graphical model represented by a directed acyclic graph; it provides concise semantics for describing the relationships between entities, and its independence assumptions suit sparse omics data. Bayesian networks have been broadly used in biomedical research fields, including disease risk assessment and prognostic prediction. However, Bayesian network inference and visualization are unfriendly to users who lack programming skills. Results: We developed an R/Shiny application, shinyBN, an online graphical user interface that facilitates the inference and visualization of Bayesian networks. shinyBN supports multiple types of input and provides flexible settings for network rendering and inference. For output, users can download network plots, prediction results and external validation results as publication-ready high-resolution figures. Conclusion: Our user-friendly application, shinyBN, provides an easy method for Bayesian network modeling, inference and visualization via mouse clicks. shinyBN can be used in the R environment or online and is compatible with the three major operating systems: Windows, Linux and Mac OS. shinyBN is deployed at https://jiajin.shinyapps.io/shinyBN/. Source code and the manual are freely available at https://github.com/JiajinChen/shinyBN.
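
The directed acyclic graph described above encodes a factorization: the joint distribution is the product, over nodes, of each node's probability given its parents, P(X1, ..., Xn) = ∏ P(Xi | parents(Xi)). The Python sketch below makes this concrete with a hypothetical three-node network and made-up conditional probability tables; it is not shinyBN code, and the numbers are illustrative only.

```python
from itertools import product

# Hypothetical DAG: risk_factor -> disease -> biomarker (all binary: 0 or 1).
PARENTS = {"risk_factor": (), "disease": ("risk_factor",), "biomarker": ("disease",)}

# Illustrative conditional probability tables: parent values -> P(node = 1 | parents).
P_ONE = {
    "risk_factor": {(): 0.3},
    "disease": {(0,): 0.05, (1,): 0.20},
    "biomarker": {(0,): 0.10, (1,): 0.80},
}


def joint_probability(assignment):
    """P(all nodes) = product over nodes of P(node | its parents)."""
    prob = 1.0
    for node, parents in PARENTS.items():
        parent_values = tuple(assignment[p] for p in parents)
        p_one = P_ONE[node][parent_values]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob


def marginal(node, value):
    """Sum the factorized joint over every full assignment consistent with node=value."""
    nodes = list(PARENTS)
    total = 0.0
    for values in product((0, 1), repeat=len(nodes)):
        assignment = dict(zip(nodes, values))
        if assignment[node] == value:
            total += joint_probability(assignment)
    return total


all_present = {"risk_factor": 1, "disease": 1, "biomarker": 1}
print(joint_probability(all_present))  # 0.3 * 0.2 * 0.8
print(marginal("disease", 1))  # 0.7 * 0.05 + 0.3 * 0.2
```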

Phoenix Enhancer: proteomics data mining using clustered spectra

10.1101/846303 · 2019

Author(s): Mingze Bai, Chunyuan Qin, Kunxian Shu, Johannes Griss, Yasset Perez-Riverol, ...

Keyword(s): Supplementary Information, Proteomics Data, Web Based, Tandem Mass Spectra, False Discovery, Cluster Data, Public Data, Visualization Tools, Individual Dataset

Abstract Motivation: Spectrum clustering has been used to enhance proteomics data analysis: by using clusters of identified spectra, some originally unidentified spectra can potentially be identified, and individual peptides can be evaluated to find potential mis-identifications. The Phoenix Enhancer provides an infrastructure to analyze tandem mass spectra and the corresponding peptides in the context of previously identified public data. Based on PRIDE Cluster data and a newly developed pipeline, it provides four functionalities: i) evaluate the original peptide identifications in an individual dataset, to find low-confidence peptide spectrum matches (PSMs) that could correspond to mis-identifications; ii) provide confidence scores for all originally identified PSMs, to help users evaluate their quality (complementary to a global false discovery rate); iii) identify potential new PSMs for originally unidentified spectra; and iv) provide a collection of browsing and visualization tools to analyze and export the results. In addition to the web-based service, the code is open source and easy to re-deploy on local computers using Docker containers. Availability: The Phoenix Enhancer service is available at http://enhancer.ncpsb.org. All source code is freely available on GitHub (https://github.com/phoenix-cluster/) and can be deployed in the Cloud and HPC. Supplementary information: Supplementary data are available online.

corto: a lightweight R package for gene network inference and master regulator analysis

Bioinformatics · 10.1093/bioinformatics/btaa223 · 2020 · Vol 36 (12) · pp. 3916-3917 · Cited By ~ 6

Author(s): Daniele Mercatelli, Gonzalo Lopez-Garcia, Federico M Giorgi

Keyword(s): Gene Expression, Gene Networks, Gene Network, Network Inference, Human Tumor, R Package, Supplementary Information, Specific Gene, Master Regulator, Gene Network Inference

Abstract Motivation: Gene network inference and master regulator analysis (MRA) have been widely adopted to define specific transcriptional perturbations from gene expression signatures. Several tools exist to perform such analyses, but most require a computer cluster or large amounts of RAM to be executed. Results: We developed corto, a fast and lightweight R package to infer gene networks and perform MRA from gene expression data, with optional corrections for copy-number variations and able to run on signatures generated from RNA-Seq or ATAC-Seq data. We extensively benchmarked it to infer context-specific gene networks in 39 human tumor and 27 normal tissue datasets. Availability and implementation: Cross-platform and multi-threaded R package on CRAN (stable version) https://cran.r-project.org/package=corto and GitHub (development release) https://github.com/federicogiorgi/corto. Supplementary information: Supplementary data are available at Bioinformatics online.

Proxi: a Python package for proximity network inference from metagenomic data

10.1101/357764 · 2018 · Cited By ~ 1

Author(s): Yasser EL-Manzalawy

Keyword(s): Network Inference, Nearest Neighbor, Microbial Interactions, Supplementary Information, Metagenomic Data, Metagenomic Sequencing, Proximity Graphs, Technological Advances, Python Package

Abstract Summary: Recent technological advances in high-throughput metagenomic sequencing have provided unique opportunities for studying the diversity and dynamics of microbial communities under different health or environmental conditions. Graph-based representation of metagenomic data is a promising direction not only for analyzing microbial interactions but also for a broad range of machine learning tasks, including feature selection, classification, clustering, anomaly detection, and dimensionality reduction. We present Proxi, an open-source Python package for learning different types of proximity graphs from metagenomic data. Currently, three types of proximity graphs are supported: k-nearest neighbor (k-NN) graphs, radius-nearest neighbor (r-NN) graphs, and perturbed k-nearest neighbor (pk-NN) graphs. Availability: Proxi Python source code is freely available at https://bitbucket.org/idsrlab/proxi/. Supplementary information: Tutorials and online documentation are available at https://proxi.readthedocs.io
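
The first two graph types listed above can be sketched in a few lines of NumPy. This is not Proxi's API: it assumes samples are rows of an abundance matrix and that proximity is Euclidean distance between them, the function names are hypothetical, and the perturbed k-NN (pk-NN) variant is not reproduced.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance between every pair of rows (samples) of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def knn_graph(X, k):
    """Directed k-NN graph: A[i, j] = 1 if j is among the k nearest neighbours of i."""
    D = pairwise_distances(X)
    np.fill_diagonal(D, np.inf)  # a sample is not its own neighbour
    A = np.zeros_like(D, dtype=int)
    nearest = np.argsort(D, axis=1, kind="stable")[:, :k]
    rows = np.arange(X.shape[0])[:, None]
    A[rows, nearest] = 1
    return A


def radius_graph(X, r):
    """Undirected r-NN graph: connect i and j whenever their distance is at most r."""
    D = pairwise_distances(X)
    A = (D <= r).astype(int)
    np.fill_diagonal(A, 0)  # no self-loops
    return A


positions = np.array([[0.0], [1.0], [3.0], [10.0]])  # toy one-feature "profiles"
print(knn_graph(positions, k=1))
print(radius_graph(positions, r=2.0))
```

Note that a k-NN graph is generally asymmetric (sample 2 points to sample 1, but not the reverse), whereas the radius graph is symmetric by construction.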

LipidFinder 2.0: advanced informatics pipeline for lipidomics discovery applications

10.1101/2020.08.16.250878 · 2020

Author(s): Jorge Alvarez-Jarreta, Patricia R.S. Rodrigues, Eoin Fahy, Anne O’Connor, Anna Price, ...

Keyword(s): Open Access, Real Data, Supplementary Information, Supplementary Data, Scatter Plot, Lipid Profiling, False Discovery, Assess Data Quality, Lipid Structures

Abstract: We present LipidFinder 2.0, incorporating four new modules that apply artefact filters, remove lipid and contaminant stacks, in-source fragments and salt clusters, and a new isotope deletion method that is significantly more sensitive than available open-access alternatives. We also incorporate a novel false discovery rate (FDR) method, utilizing a target-decoy strategy, which allows users to assess data quality. A renewed lipid profiling method is introduced, which searches three different databases from LIPID MAPS and returns bulk lipid structures only, together with a lipid category scatter plot with a color-blind-friendly palette. An API with XCMS Online is made available on LipidFinder’s online version. Using real data, we show that LipidFinder 2.0 provides a significant improvement over available tools in non-lipid metabolite filtering and lipid profiling. Availability: LipidFinder 2.0 is freely available at https://github.com/ODonnell-Lipidomics/LipidFinder and http://lipidmaps.org/resources/tools/. Supplementary information: Supplementary data are available at Bioinformatics online.
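
The target-decoy idea mentioned above is commonly implemented as follows: at a score threshold t, the number of decoy hits scoring at or above t estimates the number of false target hits, so FDR(t) is approximately decoys(t) / targets(t). The Python sketch below shows this generic estimator under that assumption; LipidFinder 2.0's own decoy construction and scoring may differ, and the scores are made up.

```python
import numpy as np


def target_decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimated FDR among target hits scoring >= threshold (higher score = better match)."""
    n_target = int(np.sum(np.asarray(target_scores) >= threshold))
    n_decoy = int(np.sum(np.asarray(decoy_scores) >= threshold))
    if n_target == 0:
        return 0.0
    return min(1.0, n_decoy / n_target)  # each decoy hit stands in for one false target hit


def lowest_threshold_for_fdr(target_scores, decoy_scores, max_fdr):
    """Most permissive target-score threshold whose estimated FDR is at most max_fdr."""
    for t in np.unique(target_scores):  # ascending, so the first pass keeps the most hits
        if target_decoy_fdr(target_scores, decoy_scores, t) <= max_fdr:
            return float(t)
    return None


targets = [9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0]  # made-up match scores
decoys = [6.5, 3.5, 2.5, 1.0]
print(target_decoy_fdr(targets, decoys, 5.0))
print(lowest_threshold_for_fdr(targets, decoys, 0.2))
```

The raw estimate is not monotone in the threshold (here t = 4 passes a 0.2 cutoff while t = 6 does not), which is one reason q-values are often reported instead.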

EGAD: Ultra-fast functional analysis of gene networks

10.1101/053868 · 2016 · Cited By ~ 2

Author(s): Sara Ballouz, Melanie Weber, Paul Pavlidis, Jesse Gillis

Keyword(s): Gene Networks, Gene Network, Random Sets, Supplementary Information, High Throughput Analysis, Gene Sets, Common Task, Supplementary Material, Guilt By Association

Abstract Summary: Evaluating gene networks with respect to known biology is a common but often computationally costly task. Many computational experiments are difficult to apply exhaustively in network analysis because of their run-times. To permit high-throughput analysis of gene networks, we have implemented a set of very efficient tools to calculate functional properties in networks based on guilt-by-association methods. EGAD (Extending ‘Guilt-by-Association’ by Degree) allows gene networks to be evaluated with respect to hundreds or thousands of gene sets. The methods predict novel members of gene groups, assess how well a gene network groups known sets of genes, and determine the degree to which generic predictions drive performance. By allowing fast evaluations, whether of random sets or real functional ones, EGAD provides the user with an assessment of performance that can easily be used in controlled evaluations across many parameters. Availability and Implementation: The software package is freely available at https://github.com/sarbal/EGAD and implemented for use in R and Matlab. The package is also freely available under the LGPL license from the Bioconductor web site (http://bioconductor.org). Supplementary information: Supplementary data are available at Bioinformatics online and the full manual at http://gillislab.labsites.cshl.edu/software/egad-extending-guilt-by-association-by-degree/.
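
Guilt-by-association evaluation of the kind described above is often done by neighbor voting. The Python sketch below assumes that formulation: a gene's score is its summed edge weight to known gene-set members divided by its degree, performance is the held-out AUROC under n-fold cross-validation, and ranking genes by node degree alone serves as the generic-prediction baseline. It is an illustration, not EGAD's R implementation; the function names and toy network are hypothetical.

```python
import numpy as np


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)


def neighbor_voting_auroc(A, members, n_folds=3, seed=0):
    """Mean held-out AUROC of degree-normalized neighbor voting under n-fold cross-validation."""
    n = A.shape[0]
    labels = np.zeros(n, dtype=int)
    labels[members] = 1
    degree = A.sum(axis=1).astype(float)
    degree[degree == 0] = 1.0  # isolated genes simply score 0
    folds = np.array_split(np.random.default_rng(seed).permutation(members), n_folds)
    results = []
    for held_out in folds:
        train = labels.copy()
        train[held_out] = 0  # hide this fold's memberships
        scores = A @ train / degree  # weight to known members, relative to degree
        evaluate = train == 0  # rank every gene not used as a training positive
        results.append(auroc(scores[evaluate], labels[evaluate]))
    return float(np.mean(results))


def degree_auroc(A, members):
    """Baseline: rank genes by node degree alone, ignoring the gene-set labels."""
    labels = np.zeros(A.shape[0], dtype=int)
    labels[members] = 1
    return auroc(A.sum(axis=1).astype(float), labels)


# Toy network: genes 0-3 form a clique (the gene set), genes 4-7 form a chain.
A = np.zeros((8, 8), dtype=int)
for i, j in [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]:
    A[i, j] = A[j, i] = 1
members = [0, 1, 2, 3]
print(neighbor_voting_auroc(A, members), degree_auroc(A, members))
```

In this toy network the degree baseline also reaches an AUROC of 1.0 because the gene-set members are the best-connected genes; comparing the two scores is how one would notice that degree alone could explain the performance.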

Directed Conservative Causal Core Gene Networks

10.1101/271031 · 2018 · Cited By ~ 1

Author(s): Gokmen Altay

Keyword(s): Gene Networks, Large Scale, Network Inference, R Package, Core Gene, Supplementary Information, Supplementary File, Core Network, Large Scale Networks, Direction Information

Abstract Motivation: Inferring large-scale directional networks with higher accuracy has important applications, such as gene regulatory networks or finance. Results: We modified a well-established conservative causal core network inference algorithm, C3NET, so that it can infer very large-scale networks with direction information. This advanced version is called Ac3net. We demonstrate that Ac3net outperforms C3NET and many other popular algorithms when considering the directional interaction information of gene/protein networks. We provide an R package and present performance results that are reproducible via the Supplementary file. Availability: Ac3net is available on CRAN and at github.com/altayg/Ac3net. Supplementary information: The Supplementary file is available online.
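
C3NET, which Ac3net extends, keeps a sparse core network. The Python sketch below assumes its two basic steps as commonly described: discard non-significant pairwise associations, then keep for each gene only the edge to its strongest remaining partner. It uses a hypothetical fixed cutoff in place of a significance test and absolute Pearson correlation as a crude stand-in for mutual information, and it does not reproduce how Ac3net assigns edge directions; the function names are hypothetical.

```python
import numpy as np


def c3net_core_edges(scores, cutoff):
    """Keep, for each gene, the edge to its top-scoring partner if that score passes cutoff.

    scores: symmetric genes-x-genes association matrix (e.g. mutual information).
    Returns undirected edges as sorted (i, j) tuples.
    """
    S = np.array(scores, dtype=float)  # copy so the caller's matrix is untouched
    np.fill_diagonal(S, -np.inf)  # ignore self-association
    S[S < cutoff] = -np.inf  # step 1: discard non-significant associations
    edges = set()
    for i in range(S.shape[0]):
        j = int(np.argmax(S[i]))
        if np.isfinite(S[i, j]):  # step 2: keep only gene i's strongest surviving partner
            edges.add((min(i, j), max(i, j)))
    return edges


def abs_correlation(expr):
    """|Pearson correlation| between genes (rows of expr), used here as a crude MI stand-in."""
    return np.abs(np.corrcoef(expr))


scores = np.array([
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.4],
    [0.1, 0.2, 0.4, 1.0],
])
print(c3net_core_edges(scores, cutoff=0.25))
```

Because each gene contributes at most one edge, the core network has at most as many edges as genes, which is what keeps this kind of method cheap on very large networks.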

Addressing false discoveries in network inference

Bioinformatics · 10.1093/bioinformatics/btv215 · 2015 · Vol 31 (17) · pp. 2836-2843 · Cited By ~ 8

Author(s): Tobias Petri, Stefan Altmann, Ludwig Geistlinger, Ralf Zimmer, Robert Küffner

Keyword(s): Network Inference, False Discoveries

BRANE Cut: Biologically-Related A priori Network Enhancement with Graph cuts for Gene Regulatory Network Inference

10.1101/032383 · 2015

Author(s): Aurélie Pirayre, Camille Couprie, Frédérique Bidard, Laurent Duval, Jean-Christophe Pesquet

Keyword(s): Gene Regulatory Network, Regulatory Network, Gene Networks, Network Inference, State Of The Art, A Priori, Graph Cuts, Gene Regulatory Network Inference, Gene Regulatory, Inference Methods

Background: Inferring gene networks from high-throughput data is an important step in discovering relevant regulatory relationships in cells. Despite the large number of available gene regulatory network (GRN) inference methods, the problem remains challenging: the underdetermination in the space of possible solutions requires additional constraints that incorporate a priori information on gene interactions. Methods: Weighting all possible pairwise gene relationships by a probability of edge presence, we formulate regulatory network inference as a discrete variational problem on graphs. We enforce biologically plausible coupling between groups and types of genes by minimizing an edge labeling functional that encodes a priori structures. The optimization is carried out with graph cuts, an approach popular in image processing and computer vision. We compare the inferred regulatory networks to those obtained by the mutual-information-based Context Likelihood of Relatedness (CLR) method and by the state-of-the-art GENIE3, winner of the DREAM4 multifactorial challenge. Results: Our BRANE Cut approach infers the five DREAM4 in silico networks more accurately (with improvements from 6% to 11%). On a real Escherichia coli compendium, it improves the area under the precision-recall curve by 11.8% compared to CLR and by 3% compared to GENIE3. Up to 48 additional verified interactions are obtained over GENIE3 for a given precision. On this dataset involving 4345 genes, our method achieves a performance similar to that of GENIE3 while being more than seven times faster. The BRANE Cut code is available at: http://www-syscom.univ-mlv.fr/~pirayre/Codes-GRN-BRANE-cut.html. Conclusions: BRANE Cut is a weighted graph thresholding method. Using biologically sound penalties and data-driven parameters, it improves three state-of-the-art GRN inference methods. Because of its computational efficiency, it is applicable as a generic network inference post-processing step.
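
The Results above are reported as the area under the precision-recall curve (AUPR) of a ranked edge list against a gold standard, and the Conclusions describe BRANE Cut as a thresholding method. The Python sketch below shows one common way to compute both views: average precision as a step-wise AUPR estimate, and precision and recall after keeping only the top k edges. It is an assumed evaluation recipe for illustration, not necessarily the DREAM4 scoring code or the code used in the paper, and the gene names are hypothetical.

```python
def average_precision(ranked_edges, gold_edges):
    """Step-wise AUPR estimate for unique edges ranked from most to least confident."""
    gold = set(gold_edges)
    if not gold:
        raise ValueError("the gold standard must contain at least one edge")
    hits = 0
    precision_sum = 0.0
    for rank, edge in enumerate(ranked_edges, start=1):
        if edge in gold:
            hits += 1
            precision_sum += hits / rank  # precision at the moment this true edge is recalled
    return precision_sum / len(gold)  # gold edges that are never ranked contribute zero


def precision_recall_at_k(ranked_edges, gold_edges, k):
    """Precision and recall after thresholding the ranked list to its top k edges."""
    gold = set(gold_edges)
    true_positives = sum(edge in gold for edge in ranked_edges[:k])
    return true_positives / k, true_positives / len(gold)


# Hypothetical (regulator, target) edges, most confident first.
ranked = [("g1", "g2"), ("g1", "g3"), ("g2", "g4"), ("g3", "g4")]
gold = {("g1", "g2"), ("g2", "g4"), ("g5", "g6")}
print(average_precision(ranked, gold))
print(precision_recall_at_k(ranked, gold, k=2))
```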