EGAD: Ultra-fast functional analysis of gene networks

2016
Author(s): Sara Ballouz, Melanie Weber, Paul Pavlidis, Jesse Gillis

Abstract
Summary: Evaluating gene networks with respect to known biology is a common task but often a computationally costly one. Many computational experiments are difficult to apply exhaustively in network analysis due to run-times. To permit high-throughput analysis of gene networks, we have implemented a set of very efficient tools to calculate functional properties in networks based on guilt-by-association methods. EGAD (Extending ‘Guilt-by-Association’ by Degree) allows gene networks to be evaluated with respect to hundreds or thousands of gene sets. The methods predict novel members of gene groups, assess how well a gene network groups known sets of genes, and determine the degree to which generic predictions drive performance. By allowing fast evaluations, whether of random sets or real functional ones, EGAD provides the user with an assessment of performance which can easily be used in controlled evaluations across many parameters.
Availability and Implementation: The software package is freely available at https://github.com/sarbal/EGAD and implemented for use in R and Matlab. The package is also freely available under the LGPL license from the Bioconductor web site (http://bioconductor.org).
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online and the full manual at http://gillislab.labsites.cshl.edu/software/egad-extending-guilt-by-association-by-degree/.
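The guilt-by-association idea summarized above can be illustrated with a small, generic sketch. This is not EGAD's code or API (the package itself is implemented in R and Matlab); the function names, the toy network, and the single held-out gene are all assumptions made for illustration. The sketch scores each gene by the fraction of its connections that go to known members of a gene set, measures how well a held-out member is ranked (AUROC), and then repeats the ranking using node degree alone as a stand-in for a "generic" prediction.

```python
def build_network(edges):
    """Build a symmetric, unweighted adjacency map from (gene, gene) pairs."""
    network = {}
    for a, b in edges:
        network.setdefault(a, {})[b] = 1.0
        network.setdefault(b, {})[a] = 1.0
    return network


def node_degrees(network):
    """Weighted degree of every gene: the sum of its edge weights."""
    return {gene: sum(nbrs.values()) for gene, nbrs in network.items()}


def neighbor_vote(network, known):
    """Score each gene by the share of its connection weight that goes
    to genes in `known` (the visible members of the gene set)."""
    degrees = node_degrees(network)
    scores = {}
    for gene, nbrs in network.items():
        in_set = sum(w for n, w in nbrs.items() if n in known)
        scores[gene] = in_set / degrees[gene] if degrees[gene] else 0.0
    return scores


def auroc(scores, positives, negatives):
    """Probability that a random positive outranks a random negative;
    ties count as half."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            if scores[p] > scores[n]:
                wins += 1.0
            elif scores[p] == scores[n]:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy network (hypothetical): A, B, C form a tight module; D, E, F another.
network = build_network([
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("C", "D"), ("D", "E"), ("E", "F"), ("D", "F"),
])

known = {"A", "B"}          # visible members of the gene set
held_out = {"C"}            # member hidden from the predictor
negatives = {"D", "E", "F"}  # genes outside the set

gba_auc = auroc(neighbor_vote(network, known), held_out, negatives)
degree_auc = auroc(node_degrees(network), held_out, negatives)

print(f"neighbor voting AUROC: {gba_auc:.3f}")   # 1.000
print(f"degree-only AUROC:     {degree_auc:.3f}")  # 0.833
```

In this toy case the degree-only ranking already reaches an AUROC of about 0.83 without using the gene set at all, which is the kind of generic contribution to performance that the abstract says the methods are meant to expose.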

2016
Author(s): Stephen G. Gaffney, Jeffrey P. Townsend

ABSTRACT
Summary: PathScore quantifies the level of enrichment of somatic mutations within curated pathways, applying a novel approach that identifies pathways enriched across patients. The application provides several user-friendly, interactive graphic interfaces for data exploration, including tools for comparing pathway effect sizes, significance, gene-set overlap and enrichment differences between projects.
Availability and Implementation: Web application available at pathscore.publichealth.yale.edu. Site implemented in Python and MySQL, with all major browsers supported. Source code available at github.com/sggaffney/pathscore with a GPLv3 license.
Contact: [email protected]
Supplementary Information: Additional documentation can be found at http://pathscore.publichealth.yale.edu/faq.


2021
pp. jgs2021-030
Author(s): Catherine E. Boddy, Emily G. Mitchell, Andrew Merdith, Alexander G. Liu

Macrofossils of the late Ediacaran Period (c. 579–539 Ma) document diverse, complex multicellular eukaryotes, including early animals, prior to the Cambrian radiation of metazoan phyla. To investigate the relationships between environmental perturbations, biotic responses and early metazoan evolutionary trajectories, it is vital to distinguish between evolutionary and ecological controls on the global distribution of Ediacaran macrofossils. The contributions of temporal, palaeoenvironmental and lithological factors in shaping the observed variations in assemblage taxonomic composition between Ediacaran macrofossil sites are widely discussed, but the role of palaeogeography remains ambiguous. Here we investigate the influence of palaeolatitude on the spatial distribution of Ediacaran macrobiota through the late Ediacaran Period using two leading palaeogeographical reconstructions. We find that overall generic diversity was distributed across all palaeolatitudes. Among specific groups, the distributions of candidate ‘Bilateral’ and Frondomorph taxa exhibit weakly statistically significant and statistically significant differences between low and high palaeolatitudes within our favoured palaeogeographical reconstruction, respectively, whereas Algal, Tubular, Soft-bodied and Biomineralizing taxa show no significant difference. The recognition of statistically significant palaeolatitudinal differences in the distribution of certain morphogroups highlights the importance of considering palaeolatitudinal influences when interrogating trends in Ediacaran taxon distributions.
Supplementary material: Supplementary information, data and code are available at https://doi.org/10.6084/m9.figshare.c.5488945
Thematic collection: This article is part of the Advances in the Cambrian Explosion collection available at: https://www.lyellcollection.org/cc/advances-cambrian-explosion


Author(s): Daniele Mercatelli, Gonzalo Lopez-Garcia, Federico M. Giorgi

Abstract
Motivation: Gene Network Inference and Master Regulator Analysis (MRA) have been widely adopted to define specific transcriptional perturbations from gene expression signatures. Several tools exist to perform such analyses, but most require a computer cluster or large amounts of RAM to be executed.
Results: We developed corto, a fast and lightweight R package to infer gene networks and perform MRA from gene expression data, with optional corrections for Copy Number Variations (CNVs) and the ability to run on signatures generated from RNA-Seq or ATAC-Seq data. We extensively benchmarked it to infer context-specific gene networks in 39 human tumor and 27 normal tissue datasets.
Availability: Cross-platform and multi-threaded R package on CRAN (stable version) https://cran.rproject.org/package=corto and Github (development release) https://github.com/federicogiorgi/[email protected]


2017
Author(s): Marco Enrico Piras, Luca Pireddu, Gianluigi Zanetti

ABSTRACT
Motivation: Workflow managers for scientific analysis provide a high-level programming platform facilitating standardization, automation, collaboration and access to sophisticated computing resources. The Galaxy workflow manager provides a prime example of this type of platform. As compositions of simpler tools, workflows effectively comprise specialized computer programs implementing often very complex analysis procedures. To date, no simple way to automatically test Galaxy workflows and ensure their correctness has appeared in the literature.
Results: With wft4galaxy we offer a tool to bring automated testing to Galaxy workflows, making it feasible to bring continuous integration to their development and ensuring that defects are detected promptly. wft4galaxy can be easily installed as a regular Python program or launched directly as a Docker container, the latter reducing installation effort to a minimum.
Availability: wft4galaxy is available online at https://github.com/phnmnl/wft4galaxy under the Academic Free License v3.0.
Supplementary information: Supplementary information is available at http://wft4galaxy.readthedocs.io.


2017
Author(s): Caroline Ross, Bilal Nizami, Michael Glenister, Olivier Sheik Amamuddy, Ali Rana Atilgan, ...

Abstract
Summary: MODE-TASK, a novel software suite, comprises Principal Component Analysis, Multidimensional Scaling, and t-Distributed Stochastic Neighbor Embedding techniques using molecular dynamics trajectories. MODE-TASK also includes a Normal Mode Analysis tool based on the Anisotropic Network Model, providing a variety of ways to analyse and compare large-scale motions of protein complexes for which long MD simulations are prohibitive.
Availability and Implementation: MODE-TASK has been open-sourced, and is available for download from https://github.com/RUBi-ZA/MODE-TASK, implemented in Python and C++.
Supplementary information: Documentation available at http://mode-task.readthedocs.io.


2017
Author(s): Robert J. Vickerstaff, Richard J. Harrison

Abstract
Summary: Crosslink is genetic mapping software for outcrossing species designed to run efficiently on large datasets by combining the best from existing tools with novel approaches. Tests show it runs much faster than several comparable programs whilst retaining a similar accuracy.
Availability and implementation: Available under the GNU General Public License version 2 from https://github.com/eastmallingresearch/[email protected]
Supplementary information: Supplementary data are available at Bioinformatics online and from https://github.com/eastmallingresearch/crosslink/releases/tag/v0.5.


2017
Author(s): Ibrahim Tanyalcin, Julien Ferte, Taushif Khan, Carla Al Assaf

ABSTRACT
Summary: One of the main goals of proteomics is to understand how point mutations affect protein structure. Visualization and clustering of point mutations in a user-defined three-dimensional space can give researchers new insights and hypotheses about a mutation’s mechanism of action.
Availability and Implementation: We have developed an interactive I-PV add-on called INDORIL to visualize point mutations. Indoril can be downloaded from http://[email protected][email protected]
Supplementary Information: Please refer to the supplementary section and http://www.i-pv.org.


2016
Author(s): Minoo Ashtinai, Payman Nickchi, Soheil Jahangiri-Tazehkand, Abdollah Safari, Mehdi Mirzaie, ...

Summary: IMMAN is software for reconstructing an Interolog Protein Network (IPN) by integrating several Protein-protein Interaction Networks (PPINs). Users can unify different PPINs to mine a conserved common network among species. IMMAN helps to retrieve IPNs with different degrees of conservation for use in protein function prediction based on protein networks.
Availability: IMMAN is freely available at https://bioconductor.org/packages/IMMAN, http://profiles.bs.ipm.ir/softwares/IMMAN/[email protected], [email protected], [email protected]
Supplementary information: Supplementary data are available online.


2018
Author(s): Hong-Dong Li, Yunpei Xu, Xiaoshu Zhu, Quan Liu, Gilbert S. Omenn, ...

ABSTRACT
Motivation: Clustering analysis is essential for understanding complex biological data. In widely used methods such as hierarchical clustering (HC) and consensus clustering (CC), expression profiles of all genes are often used to assess similarity between samples for clustering. These methods output sample clusters but cannot indicate which gene sets (functions) contribute most to the clustering, so the interpretability of their results is limited. We hypothesized that integrating prior knowledge of annotated biological processes would not only achieve satisfying clustering performance but also, more importantly, enable potential biological interpretation of clusters.
Results: Here we report ClusterMine, a novel approach that identifies clusters by assessing functional similarity between samples through integrating known annotated gene sets, e.g., in Gene Ontology. In addition to outputting cluster membership of each sample as conventional approaches do, it outputs gene sets that are most likely to contribute to the clustering, a feature facilitating biological interpretation. Using three cancer datasets, two single cell RNA-sequencing based cell differentiation datasets, one cell cycle dataset and two datasets of cells of different tissue origins, we found that ClusterMine achieved similar or better clustering performance and that top-scored gene sets prioritized by ClusterMine are biologically relevant.
Implementation and availability: ClusterMine is implemented as an R package and is freely available at: www.genemine.org/[email protected]
Supplementary Information: Supplementary data are available at Bioinformatics online.


2018
Author(s): Sebastian Deorowicz, Agnieszka Danek

Abstract
Summary: Nowadays large sequencing projects handle tens of thousands of individuals. The huge files summarizing the findings definitely require compression. We propose a tool able to compress large collections of genotypes as well as single samples in such projects to sizes not achievable to date.
Availability and Implementation: https://github.com/refresh-bio/[email protected]
Supplementary information: Supplementary data are available at publisher’s Web site.

