EGAD: Ultra-fast functional analysis of gene networks


10.1101/053868 ◽ 2016 ◽ Cited By ~ 2

Author(s): Sara Ballouz ◽ Melanie Weber ◽ Paul Pavlidis ◽ Jesse Gillis

Keyword(s): Gene Networks ◽ Gene Network ◽ Random Sets ◽ Supplementary Information ◽ High Throughput Analysis ◽ Link Type ◽ Gene Sets ◽ Common Task ◽ Supplementary Material ◽ Guilt By Association

Abstract

Summary: Evaluating gene networks with respect to known biology is a common task but often a computationally costly one. Many computational experiments are difficult to apply exhaustively in network analysis due to run-times. To permit high-throughput analysis of gene networks, we have implemented a set of very efficient tools to calculate functional properties in networks based on guilt-by-association methods. EGAD (Extending ‘Guilt-by-Association’ by Degree) allows gene networks to be evaluated with respect to hundreds or thousands of gene sets. The methods predict novel members of gene groups, assess how well a gene network groups known sets of genes, and determine the degree to which generic predictions drive performance. By allowing fast evaluations, whether of random sets or real functional ones, EGAD provides the user with an assessment of performance which can easily be used in controlled evaluations across many parameters.

Availability and Implementation: The software package is freely available at https://github.com/sarbal/EGAD and implemented for use in R and Matlab. The package is also freely available under the LGPL license from the Bioconductor web site (http://bioconductor.org).

Supplementary information: Supplementary data are available at Bioinformatics online and the full manual at http://gillislab.labsites.cshl.edu/software/egad-extending-guilt-by-association-by-degree/.
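The guilt-by-association idea in the EGAD abstract can be illustrated with a minimal neighbor-voting sketch. This is not the EGAD implementation or its API: the function names `neighbor_voting_scores` and `auroc`, the toy network, and the choice of AUROC as the performance measure are all assumptions made for illustration. Each gene is scored by the fraction of its connection weight that goes to known members of a gene set, and node degree alone is scored as a "generic" baseline.

```python
import numpy as np


def neighbor_voting_scores(adjacency, labels):
    """Score each gene by the share of its edge weight that goes to labeled genes."""
    adjacency = np.asarray(adjacency, dtype=float)
    labels = np.asarray(labels, dtype=float)
    votes = adjacency @ labels        # edge weight to known gene-set members
    degree = adjacency.sum(axis=1)    # total edge weight (node degree)
    # Genes without any edges get a score of 0 instead of a division error.
    return np.divide(votes, degree, out=np.zeros_like(votes), where=degree > 0)


def auroc(scores, positives):
    """Rank-based AUROC (Mann-Whitney U); tied scores share their average rank."""
    scores = np.asarray(scores, dtype=float)
    positives = np.asarray(positives, dtype=bool)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    for value in np.unique(scores):
        tied = scores == value
        ranks[tied] = ranks[tied].mean()
    n_pos = positives.sum()
    n_neg = len(scores) - n_pos
    return (ranks[positives].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical toy network: two triangles (genes 0-2 and 3-5) joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
network = np.zeros((6, 6))
for a, b in edges:
    network[a, b] = network[b, a] = 1.0

# The true gene set is {0, 1, 2}; gene 2 is hidden to mimic cross-validation.
training_labels = np.array([1, 1, 0, 0, 0, 0])
held_out = np.array([2, 3, 4, 5])                    # genes not used for training
truth = np.array([True, False, False, False])       # gene 2 is the hidden member

gba_scores = neighbor_voting_scores(network, training_labels)
gba_auroc = auroc(gba_scores[held_out], truth)

# Generic baseline: rank the same genes by node degree alone.
degree_auroc = auroc(network.sum(axis=1)[held_out], truth)

print(gba_auroc, degree_auroc)
```

Comparing the two numbers shows, on this toy example, how much of the apparent performance could be obtained from degree alone rather than from the gene set itself.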


PathScore: a web tool for identifying altered pathways in cancer data

10.1101/067090 ◽ 2016 ◽ Cited By ~ 2

Author(s): Stephen G. Gaffney ◽ Jeffrey P. Townsend

Keyword(s): Web Application ◽ Somatic Mutations ◽ Supplementary Information ◽ Web Tool ◽ Cancer Data ◽ Link Type ◽ Novel Approach ◽ Supplementary Material ◽ User Friendly ◽ Pathway Effect

Abstract

Summary: PathScore quantifies the level of enrichment of somatic mutations within curated pathways, applying a novel approach that identifies pathways enriched across patients. The application provides several user-friendly, interactive graphic interfaces for data exploration, including tools for comparing pathway effect sizes, significance, gene-set overlap and enrichment differences between projects.

Availability and Implementation: Web application available at pathscore.publichealth.yale.edu. Site implemented in Python and MySQL, with all major browsers supported. Source code available at github.com/sggaffney/pathscore with a GPLv3 license.

Supplementary Information: Additional documentation can be found at http://pathscore.publichealth.yale.edu/faq.


Palaeolatitudinal distribution of the Ediacaran macrobiota

Journal of the Geological Society ◽ 10.1144/jgs2021-030 ◽ 2021 ◽ pp. jgs2021-030

Author(s): Catherine E. Boddy ◽ Emily G. Mitchell ◽ Andrew Merdith ◽ Alexander G. Liu

Keyword(s): Taxonomic Composition ◽ Supplementary Information ◽ Cambrian Explosion ◽ Content Type ◽ Link Type ◽ Environmental Perturbations ◽ Significant Difference ◽ Evolutionary Trajectories ◽ Cambrian Radiation ◽ Supplementary Material

Macrofossils of the late Ediacaran Period (c. 579–539 Ma) document diverse, complex multicellular eukaryotes, including early animals, prior to the Cambrian radiation of metazoan phyla. To investigate the relationships between environmental perturbations, biotic responses and early metazoan evolutionary trajectories, it is vital to distinguish between evolutionary and ecological controls on the global distribution of Ediacaran macrofossils. The contributions of temporal, palaeoenvironmental and lithological factors in shaping the observed variations in assemblage taxonomic composition between Ediacaran macrofossil sites are widely discussed, but the role of palaeogeography remains ambiguous. Here we investigate the influence of palaeolatitude on the spatial distribution of Ediacaran macrobiota through the late Ediacaran Period using two leading palaeogeographical reconstructions. We find that overall generic diversity was distributed across all palaeolatitudes. Among specific groups, the distributions of candidate ‘Bilateral’ and Frondomorph taxa exhibit weakly statistically significant and statistically significant differences between low and high palaeolatitudes within our favoured palaeogeographical reconstruction, respectively, whereas Algal, Tubular, Soft-bodied and Biomineralizing taxa show no significant difference. The recognition of statistically significant palaeolatitudinal differences in the distribution of certain morphogroups highlights the importance of considering palaeolatitudinal influences when interrogating trends in Ediacaran taxon distributions.

Supplementary material: Supplementary information, data and code are available at https://doi.org/10.6084/m9.figshare.c.5488945

Thematic collection: This article is part of the Advances in the Cambrian Explosion collection available at: https://www.lyellcollection.org/cc/advances-cambrian-explosion


corto: a lightweight R package for Gene Network Inference and Master Regulator Analysis

10.1101/2020.02.10.942623 ◽ 2020 ◽ Cited By ~ 1

Author(s): Daniele Mercatelli ◽ Gonzalo Lopez-Garcia ◽ Federico M. Giorgi

Keyword(s): Gene Expression ◽ Gene Networks ◽ Gene Network ◽ Network Inference ◽ Human Tumor ◽ R Package ◽ Specific Gene ◽ Master Regulator ◽ Gene Network Inference ◽ Link Type

Abstract

Motivation: Gene Network Inference and Master Regulator Analysis (MRA) have been widely adopted to define specific transcriptional perturbations from gene expression signatures. Several tools exist to perform such analyses, but most require a computer cluster or large amounts of RAM to be executed.

Results: We developed corto, a fast and lightweight R package to infer gene networks and perform MRA from gene expression data, with optional corrections for Copy Number Variations (CNVs) and the ability to run on signatures generated from RNA-Seq or ATAC-Seq data. We extensively benchmarked it to infer context-specific gene networks in 39 human tumor and 27 normal tissue datasets.

Availability: Cross-platform and multi-threaded R package on CRAN (stable version) https://cran.r-project.org/package=corto and Github (development release) https://github.com/federicogiorgi/[email protected]


wft4galaxy: A Workflow Tester for Galaxy

10.1101/132001 ◽ 2017

Author(s): Marco Enrico Piras ◽ Luca Pireddu ◽ Gianluigi Zanetti

Keyword(s): Complex Analysis ◽ Computer Programs ◽ Supplementary Information ◽ Automated Testing ◽ Continuous Integration ◽ Link Type ◽ Scientific Analysis ◽ The Galaxy ◽ Supplementary Material ◽ High Level

Abstract

Motivation: Workflow managers for scientific analysis provide a high-level programming platform facilitating standardization, automation, collaboration and access to sophisticated computing resources. The Galaxy workflow manager provides a prime example of this type of platform. As compositions of simpler tools, workflows effectively comprise specialized computer programs implementing often very complex analysis procedures. To date, no simple way to automatically test Galaxy workflows and ensure their correctness has appeared in the literature.

Results: With wft4galaxy we offer a tool to bring automated testing to Galaxy workflows, making it feasible to bring continuous integration to their development and ensuring that defects are detected promptly. wft4galaxy can be easily installed as a regular Python program or launched directly as a Docker container, the latter reducing installation effort to a minimum.

Availability: wft4galaxy is available online at https://github.com/phnmnl/wft4galaxy under the Academic Free License v3.0.

Supplementary information: Supplementary information is available at http://wft4galaxy.readthedocs.io.


MODE-TASK: Large-scale protein motion tools

10.1101/217505 ◽ 2017

Author(s): Caroline Ross ◽ Bilal Nizami ◽ Michael Glenister ◽ Olivier Sheik Amamuddy ◽ Ali Rana Atilgan ◽ ...

Keyword(s): Large Scale ◽ Protein Complexes ◽ Normal Mode Analysis ◽ Md Simulations ◽ Supplementary Information ◽ Mode Analysis ◽ Analysis Tool ◽ Link Type ◽ Supplementary Material ◽ Anisotropic Network

Abstract

Summary: MODE-TASK, a novel software suite, comprises Principal Component Analysis, Multidimensional Scaling, and t-Distributed Stochastic Neighbor Embedding techniques using molecular dynamics trajectories. MODE-TASK also includes a Normal Mode Analysis tool based on the Anisotropic Network Model so as to provide a variety of ways to analyse and compare large-scale motions of protein complexes for which long MD simulations are prohibitive.

Availability and Implementation: MODE-TASK has been open-sourced, and is available for download from https://github.com/RUBi-ZA/MODE-TASK, implemented in Python and C++.

Supplementary information: Documentation available at http://mode-task.readthedocs.io.
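As an illustration of principal component analysis applied to a molecular dynamics trajectory, the following is a minimal NumPy sketch. It does not use or reproduce the MODE-TASK code: the function name `trajectory_pca`, the synthetic trajectory, and the assumption that frames are already aligned to a common reference are all hypothetical choices for this example.

```python
import numpy as np


def trajectory_pca(coords, n_components=2):
    """PCA of a trajectory given as an array of shape (n_frames, n_atoms, 3).

    Assumes the frames have already been superimposed on a reference structure.
    Returns per-frame projections, the principal modes, and the fraction of
    total variance captured by each returned mode.
    """
    coords = np.asarray(coords, dtype=float)
    n_frames = coords.shape[0]
    flat = coords.reshape(n_frames, -1)       # one row of 3N coordinates per frame
    centered = flat - flat.mean(axis=0)       # remove the average structure
    # SVD of the centered data gives the eigenvectors of the covariance matrix.
    _, singular_values, modes = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values ** 2 / (n_frames - 1)
    projections = centered @ modes[:n_components].T
    return projections, modes[:n_components], variances[:n_components] / variances.sum()


# Synthetic trajectory: one collective oscillation plus small random noise.
rng = np.random.default_rng(0)
n_frames, n_atoms = 200, 10
direction = rng.normal(size=(n_atoms, 3))
direction /= np.linalg.norm(direction)                    # unit-length collective motion
amplitude = 2.0 * np.sin(np.linspace(0.0, 8.0 * np.pi, n_frames))
noise = rng.normal(scale=0.05, size=(n_frames, n_atoms, 3))
toy_trajectory = noise + amplitude[:, None, None] * direction

projections, modes, variance_ratio = trajectory_pca(toy_trajectory)
overlap = abs(modes[0] @ direction.ravel())               # similarity of mode 1 to the true motion
print(variance_ratio, overlap)
```

On this synthetic input the first mode should recover the imposed collective motion; with real trajectories, the alignment step and the choice of atoms to include matter as much as the decomposition itself.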


Crosslink: A fast, scriptable genetic mapper for outcrossing species

10.1101/135277 ◽ 2017 ◽ Cited By ~ 6

Author(s): Robert J. Vickerstaff ◽ Richard J. Harrison

Keyword(s): Large Datasets ◽ Supplementary Information ◽ Supplementary Data ◽ Link Type ◽ Mapping Software ◽ Outcrossing Species ◽ Supplementary Material ◽ Novel Approaches ◽ Similar Accuracy ◽ General Public License

Abstract

Summary: Crosslink is genetic mapping software for outcrossing species, designed to run efficiently on large datasets by combining the best of existing tools with novel approaches. Tests show it runs much faster than several comparable programs whilst retaining similar accuracy.

Availability and implementation: Available under the GNU General Public License version 2 from https://github.com/eastmallingresearch/[email protected]

Supplementary information: Supplementary data are available at Bioinformatics online and from https://github.com/eastmallingresearch/crosslink/releases/tag/v0.5.


Indoril: An I-PV Add-On for Visualization of Point Mutations on 3D Cartesian Coordinates

10.1101/148122 ◽ 2017

Author(s): Ibrahim Tanyalcin ◽ Julien Ferte ◽ Taushif Khan ◽ Carla Al Assaf

Keyword(s): Protein Structure ◽ Mechanism Of Action ◽ Dimensional Space ◽ Point Mutations ◽ Supplementary Information ◽ Cartesian Coordinates ◽ 3 Dimensional ◽ Link Type ◽ Supplementary Section ◽ Supplementary Material

Abstract

Summary: One of the main goals of proteomics is to understand how point mutations affect protein structure. Visualization and clustering of point mutations in a user-defined 3-dimensional space can give researchers new insights and hypotheses about a mutation’s mechanism of action.

Availability and Implementation: We have developed an interactive I-PV add-on called INDORIL to visualize point mutations. Indoril can be downloaded from http://[email protected]║[email protected]

Supplementary Information: Please refer to the supplementary section and http://www.i-pv.org.


IMMAN: an R/Bioconductor package for Interolog protein network reconstruction, Mapping and Mining ANalysis

10.1101/069104 ◽ 2016 ◽ Cited By ~ 1

Author(s): Minoo Ashtinai ◽ Payman Nickchi ◽ Soheil Jahangiri-Tazehkand ◽ Abdollah Safari ◽ Mehdi Mirzaie ◽ ...

Keyword(s): Protein Function ◽ Protein Function Prediction ◽ Protein Interaction Networks ◽ Interaction Networks ◽ Protein Network ◽ Supplementary Information ◽ Protein Protein Interaction ◽ Link Type ◽ Supplementary Material ◽ Protein Protein Interaction Networks

Summary: IMMAN is software for reconstructing an Interolog Protein Network (IPN) by integrating several Protein-protein Interaction Networks (PPINs). Users can unify different PPINs to mine a conserved common network among species. IMMAN helps to retrieve IPNs with different degrees of conservation for use in protein function prediction analysis based on protein networks.

Availability: IMMAN is freely available at https://bioconductor.org/packages/IMMAN, http://profiles.bs.ipm.ir/softwares/IMMAN/[email protected], [email protected], [email protected]

Supplementary information: Supplementary data are available online.


ClusterMine: a Knowledge-integrated Clustering Approach based on Expression Profiles of Gene Sets

10.1101/255711 ◽ 2018

Author(s): Hong-Dong Li ◽ Yunpei Xu ◽ Xiaoshu Zhu ◽ Quan Liu ◽ Gilbert S. Omenn ◽ ...

Keyword(s): Expression Profiles ◽ R Package ◽ Biological Data ◽ Supplementary Information ◽ Consensus Clustering ◽ Cluster Membership ◽ Link Type ◽ Novel Approach ◽ Gene Sets ◽ Biological Interpretation

Abstract

Motivation: Clustering analysis is essential for understanding complex biological data. In widely used methods such as hierarchical clustering (HC) and consensus clustering (CC), expression profiles of all genes are often used to assess similarity between samples for clustering. These methods output sample clusters, but are not able to provide information about which gene sets (functions) contribute most to the clustering. The interpretability of their results is therefore limited. We hypothesized that integrating prior knowledge of annotated biological processes would not only achieve satisfactory clustering performance but also, more importantly, enable potential biological interpretation of clusters.

Results: Here we report ClusterMine, a novel approach that identifies clusters by assessing functional similarity between samples through integrating known annotated gene sets, e.g., in Gene Ontology. In addition to outputting cluster membership of each sample as conventional approaches do, it outputs gene sets that are most likely to contribute to the clustering, a feature facilitating biological interpretation. Using three cancer datasets, two single-cell RNA-sequencing based cell differentiation datasets, one cell cycle dataset and two datasets of cells of different tissue origins, we found that ClusterMine achieved similar or better clustering performance and that top-scored gene sets prioritized by ClusterMine are biologically relevant.

Implementation and availability: ClusterMine is implemented as an R package and is freely available at: www.genemine.org/[email protected]

Supplementary Information: Supplementary data are available at Bioinformatics online.


GTShark: Genotype compression in large projects

10.1101/494104 ◽ 2018

Author(s): Sebastian Deorowicz ◽ Agnieszka Danek

Keyword(s): Web Site ◽ Supplementary Information ◽ Supplementary Data ◽ Link Type ◽ Large Project ◽ Supplementary Material

Abstract

Summary: Large sequencing projects now handle tens of thousands of individuals, and the huge files summarizing their findings require compression. We propose a tool able to compress large collections of genotypes as well as single samples in such projects to sizes not achievable to date.

Availability and Implementation: https://github.com/refresh-bio/[email protected]

Supplementary information: Supplementary data are available at publisher’s Web site.
