Controlling false discoveries in Bayesian gene networks with lasso regression p-values

2018 ◽  
Author(s):  
Lingfei Wang ◽  
Tom Michoel

Abstract Motivation: Bayesian networks can represent directed gene regulations and therefore are favored over co-expression networks. However, hardly any Bayesian network study concerns the false discovery control (FDC) of network edges, leading to low accuracies due to systematic biases from inconsistent false discovery levels in the same study. Results: We design four empirical tests to examine the FDC of Bayesian networks from three p-value based lasso regression variable selections, two existing and one we originate. Our method, lassopv, computes p-values for the critical regularization strength at which a predictor starts to contribute to lasso regression. Using null and Geuvadis datasets, we find that lassopv obtains optimal FDC in Bayesian gene networks, whilst existing methods have defective p-values. The FDC concept and tests extend to most network inference scenarios and will guide the design and improvement of new and existing methods. Our novel variable selection method with lasso regression also allows FDC on other datasets and questions, even beyond network inference and computational biology. Availability: Lassopv is implemented in R and freely available at https://github.com/lingfeiwang/lassopv and https://cran.r-project.org/. Contact: [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.

Author(s):  
Daniele Mercatelli ◽  
Gonzalo Lopez-Garcia ◽  
Federico M. Giorgi

Abstract Motivation: Gene Network Inference and Master Regulator Analysis (MRA) have been widely adopted to define specific transcriptional perturbations from gene expression signatures. Several tools exist to perform such analyses, but most require a computer cluster or large amounts of RAM to be executed. Results: We developed corto, a fast and lightweight R package to infer gene networks and perform MRA from gene expression data, with optional corrections for Copy Number Variations (CNVs) and able to run on signatures generated from RNA-Seq or ATAC-Seq data. We extensively benchmarked it to infer context-specific gene networks in 39 human tumor and 27 normal tissue datasets. Availability: Cross-platform and multi-threaded R package on CRAN (stable version) https://cran.r-project.org/package=corto and Github (development release) https://github.com/federicogiorgi/. Contact: [email protected]


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Jiajin Chen ◽  
Ruyang Zhang ◽  
Xuesi Dong ◽  
Lijuan Lin ◽  
Ying Zhu ◽  
...  

Abstract Background High-throughput technologies have brought tremendous changes to biological domains, and the resulting high-dimensional data has also posed enormous challenges to computational science. A Bayesian network is a probabilistic graphical model represented by a directed acyclic graph, which provides concise semantics to describe the relationship between entities and has an independence assumption that is suitable for sparse omics data. Bayesian networks have been broadly used in biomedical research fields, including disease risk assessment and prognostic prediction. However, the inference and visualization of Bayesian networks are unfriendly to the users lacking programming skills. Results We developed an R/Shiny application, shinyBN, which is an online graphical user interface to facilitate the inference and visualization of Bayesian networks. shinyBN supports multiple types of input and provides flexible settings for network rendering and inference. For output, users can download network plots, prediction results and external validation results in publication-ready high-resolution figures. Conclusion Our user-friendly application (shinyBN) provides users with an easy method for Bayesian network modeling, inference and visualization via mouse clicks. shinyBN can be used in the R environment or online and is compatible with three major operating systems, including Windows, Linux and Mac OS. shinyBN is deployed at https://jiajin.shinyapps.io/shinyBN/. Source codes and the manual are freely available at https://github.com/JiajinChen/shinyBN.


2019 ◽  
Author(s):  
Mingze Bai ◽  
Chunyuan Qin ◽  
Kunxian Shu ◽  
Johannes Griss ◽  
Yasset Perez-Riverol ◽  
...  

Abstract Motivation: Spectrum clustering has been used to enhance proteomics data analysis: some originally unidentified spectra can potentially be identified and individual peptides can be evaluated to find potential mis-identifications by using clusters of identified spectra. The Phoenix Enhancer provides an infrastructure to analyze tandem mass spectra and the corresponding peptides in the context of previously identified public data. Based on PRIDE Cluster data and a newly developed pipeline, four functionalities are provided: i) evaluate the original peptide identifications in an individual dataset, to find low confidence peptide spectrum matches (PSMs) which could correspond to mis-identifications; ii) provide confidence scores for all originally identified PSMs, to help users evaluate their quality (complementary to getting a global false discovery rate); iii) identify potential new PSMs for originally unidentified spectra; and iv) provide a collection of browsing and visualization tools to analyze and export the results. In addition to the web based service, the code is open-source and easy to re-deploy on local computers using Docker containers. Availability: The service of Phoenix Enhancer is available at http://enhancer.ncpsb.org. All source code is freely available in GitHub (https://github.com/phoenix-cluster/) and can be deployed in the Cloud and HPC ... Contact: [email protected]. Supplementary information: Supplementary data are available online.


2020 ◽  
Vol 36 (12) ◽  
pp. 3916-3917 ◽  
Author(s):  
Daniele Mercatelli ◽  
Gonzalo Lopez-Garcia ◽  
Federico M Giorgi

Abstract Motivation Gene network inference and master regulator analysis (MRA) have been widely adopted to define specific transcriptional perturbations from gene expression signatures. Several tools exist to perform such analyses but most require a computer cluster or large amounts of RAM to be executed. Results We developed corto, a fast and lightweight R package to infer gene networks and perform MRA from gene expression data, with optional corrections for copy-number variations and able to run on signatures generated from RNA-Seq or ATAC-Seq data. We extensively benchmarked it to infer context-specific gene networks in 39 human tumor and 27 normal tissue datasets. Availability and implementation Cross-platform and multi-threaded R package on CRAN (stable version) https://cran.r-project.org/package=corto and Github (development release) https://github.com/federicogiorgi/corto. Supplementary information Supplementary data are available at Bioinformatics online.


2018 ◽  
Author(s):  
Yasser EL-Manzalawy

Abstract Summary: Recent technological advances in high-throughput metagenomic sequencing have provided unique opportunities for studying the diversity and dynamics of microbial communities under different health or environmental conditions. Graph-based representation of metagenomic data is a promising direction not only for analyzing microbial interactions but also for a broad range of machine learning tasks including feature selection, classification, clustering, anomaly detection, and dimensionality reduction. We present Proxi, an open source Python package for learning different types of proximity graphs from metagenomic data. Currently, three types of proximity graphs are supported: k-nearest neighbor (k-NN) graphs; radius-nearest neighbor (r-NN) graphs; and perturbed k-nearest neighbor (pk-NN) graphs. Availability: Proxi Python source code is freely available at https://bitbucket.org/idsrlab/proxi/. Contact: [email protected]. Supplementary information: Tutorials and online documentation are available at https://proxi.readthedocs.io
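As a rough illustration of the first graph type named in this abstract, the following minimal sketch builds an undirected k-NN graph from a few toy abundance profiles. It is not Proxi's API: the function name `knn_graph`, the Euclidean metric, the tie-breaking rule and the data are all assumptions made for illustration.

```python
import math


def knn_graph(points, k):
    """Return an undirected k-NN graph as a set of (i, j) edges with i < j.

    Hypothetical helper for illustration only; not part of Proxi.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    edges = set()
    for i in range(n):
        # Rank all other points by Euclidean distance to point i,
        # breaking ties by index so the result is deterministic.
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (math.dist(points[i], points[j]), j),
        )
        # Connect point i to its k closest points; storing (min, max)
        # makes the graph undirected.
        for j in neighbors[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


# Toy abundance profiles (made-up data): two tight pairs of samples.
profiles = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0)]
print(sorted(knn_graph(profiles, k=1)))  # [(0, 1), (2, 3)]
```

A radius-nearest neighbor graph would follow the same pattern, keeping every pair whose distance falls below a fixed radius instead of the k closest points.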


2020 ◽  
Author(s):  
Jorge Alvarez-Jarreta ◽  
Patricia R.S. Rodrigues ◽  
Eoin Fahy ◽  
Anne O’Connor ◽  
Anna Price ◽  
...  

Abstract: We present LipidFinder 2.0, incorporating four new modules that apply artefact filters, remove lipid and contaminant stacks, in-source fragments and salt clusters, and a new isotope deletion method which is significantly more sensitive than available open-access alternatives. We also incorporate a novel false discovery rate (FDR) method, utilizing a target-decoy strategy, which allows users to assess data quality. A renewed lipid profiling method is introduced which searches three different databases from LIPID MAPS and returns bulk lipid structures only, and a lipid category scatter plot with a color-blind-friendly palette. An API interface with XCMS Online is made available on LipidFinder’s online version. We show using real data that LipidFinder 2.0 provides a significant improvement over non-lipid metabolite filtering and lipid profiling, compared to available tools. Availability: LipidFinder 2.0 is freely available at https://github.com/ODonnell-Lipidomics/LipidFinder and http://lipidmaps.org/resources/tools/. Contact: [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.
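The abstract does not describe how LipidFinder 2.0 computes its FDR, so the sketch below shows only the generic target-decoy estimate: at a given score threshold, the number of accepted decoy matches divided by the number of accepted target matches. The function name, the scores and the thresholds are hypothetical.

```python
def target_decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate the FDR among target matches scoring at or above `threshold`.

    Generic target-decoy estimate (decoy hits / target hits), shown as an
    assumption; it is not taken from the LipidFinder source code.
    """
    targets = sum(score >= threshold for score in target_scores)
    decoys = sum(score >= threshold for score in decoy_scores)
    if targets == 0:
        # Nothing is accepted, so there are no discoveries to be false.
        return 0.0
    # Cap at 1.0 because a rate above 100% is not meaningful.
    return min(1.0, decoys / targets)


# Made-up match scores against a real (target) and a shuffled (decoy) database.
targets = [0.9, 0.8, 0.7, 0.4, 0.3]
decoys = [0.75, 0.35, 0.2, 0.1, 0.05]
for cutoff in (0.7, 0.3):
    print(cutoff, round(target_decoy_fdr(targets, decoys, cutoff), 3))
```

Raising the threshold accepts fewer matches but typically lowers the estimated FDR, which is how such an estimate lets a user trade coverage against data quality.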


2016 ◽  
Author(s):  
Sara Ballouz ◽  
Melanie Weber ◽  
Paul Pavlidis ◽  
Jesse Gillis

Abstract Summary: Evaluating gene networks with respect to known biology is a common task but often a computationally costly one. Many computational experiments are difficult to apply exhaustively in network analysis due to run-times. To permit high-throughput analysis of gene networks, we have implemented a set of very efficient tools to calculate functional properties in networks based on guilt-by-association methods. EGAD (Extending ‘Guilt-by-Association’ by Degree) allows gene networks to be evaluated with respect to hundreds or thousands of gene sets. The methods predict novel members of gene groups, assess how well a gene network groups known sets of genes, and determine the degree to which generic predictions drive performance. By allowing fast evaluations, whether of random sets or real functional ones, EGAD provides the user with an assessment of performance which can easily be used in controlled evaluations across many parameters. Availability and Implementation: The software package is freely available at https://github.com/sarbal/EGAD and implemented for use in R and Matlab. The package is also freely available under the LGPL license from the Bioconductor web site (http://bioconductor.org). Contact: [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online and the full manual at http://gillislab.labsites.cshl.edu/software/egad-extending-guilt-by-association-by-degree/.


2018 ◽  
Author(s):  
Gokmen Altay

Abstract Motivation: Inferring large scale directional networks with higher accuracy has important applications such as gene regulatory networks or finance. Results: We modified a well-established conservative causal core network inference algorithm, C3NET, to be able to infer very large scale networks with direction information. This advanced version is called Ac3net. We demonstrate that Ac3net outperforms C3NET and many other popular algorithms when considering the directional interaction information of gene/protein networks. We provide an R package and present performance results that are reproducible via the Supplementary file. Availability: Ac3net is available on CRAN and at github.com/altayg/Ac3net. Contact: [email protected]. Supplementary information: Supplementary file is available online.


2015 ◽  
Vol 31 (17) ◽  
pp. 2836-2843 ◽  
Author(s):  
Tobias Petri ◽  
Stefan Altmann ◽  
Ludwig Geistlinger ◽  
Ralf Zimmer ◽  
Robert Küffner

2015 ◽  
Author(s):  
Aurélie Pirayre ◽  
Camille Couprie ◽  
Frédérique Bidard ◽  
Laurent Duval ◽  
Jean-Christophe Pesquet

Background: Inferring gene networks from high-throughput data constitutes an important step in the discovery of relevant regulatory relationships in organism cells. Despite the large number of available Gene Regulatory Network inference methods, the problem remains challenging: the underdetermination in the space of possible solutions requires additional constraints that incorporate a priori information on gene interactions. Methods: Weighting all possible pairwise gene relationships by a probability of edge presence, we formulate the regulatory network inference as a discrete variational problem on graphs. We enforce biologically plausible coupling between groups and types of genes by minimizing an edge labeling functional coding for a priori structures. The optimization is carried out with Graph cuts, an approach popular in image processing and computer vision. We compare the inferred regulatory networks to results achieved by the mutual-information-based Context Likelihood of Relatedness (CLR) method and by the state-of-the-art GENIE3, winner of the DREAM4 multifactorial challenge. Results: Our BRANE Cut approach infers the five DREAM4 in silico networks more accurately (with improvements from 6% to 11%). On a real Escherichia coli compendium, an improvement of 11.8% compared to CLR and 3% compared to GENIE3 is obtained in terms of Area Under Precision-Recall curve. Up to 48 additional verified interactions are obtained over GENIE3 for a given precision. On this dataset involving 4345 genes, our method achieves a performance similar to that of GENIE3, while being more than seven times faster. The BRANE Cut code is available at: http://www-syscom.univ-mlv.fr/~pirayre/Codes-GRN-BRANE-cut.html Conclusions: BRANE Cut is a weighted graph thresholding method. Using biologically sound penalties and data-driven parameters, it improves three state-of-the-art GRN inference methods. It is applicable as a generic network inference post-processing step, owing to its computational efficiency.

