GenomeScope: Fast reference-free genome profiling from short reads

Mapping Intimacies ◽

10.1101/075978 ◽

2016 ◽

Cited By ~ 3

Author(s):

Gregory W. Vurture ◽

Fritz J. Sedlazeck ◽

Maria Nattestad ◽

Charles J. Underwood ◽

Han Fang ◽

...

Keyword(s):

Error Rates ◽

Supplementary Information ◽

Web Tool ◽

Short Reads ◽

Link Type ◽

Repeat Content ◽

A Genome ◽

Wide Range ◽

Supplementary Material ◽

Downstream Analysis

Abstract. Summary: GenomeScope is an open-source web tool to rapidly estimate the overall characteristics of a genome, including genome size, heterozygosity rate, and repeat content, from unprocessed short reads. These features are essential for studying genome evolution and help in choosing parameters for downstream analysis. We demonstrate its accuracy on 324 simulated and 16 real datasets with a wide range of genome sizes, heterozygosity levels, and error rates. Availability and Implementation: http://genomescope.org, https://github.com/schatzlab/[email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
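To make the idea of profiling a genome directly from reads concrete, here is a minimal sketch of a k-mer spectrum and a naive genome size estimate derived from it. It is an illustration only, not GenomeScope's model or code: the function names `kmer_histogram` and `estimate_genome_size` and the `min_depth` error cutoff are assumptions introduced for this example, and the sketch ignores heterozygosity, repeats, and reverse complements.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Map each multiplicity to the number of distinct k-mers seen that many times."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Naive estimate (hypothetical helper): k-mer occurrences divided by peak depth.

    k-mers seen fewer than `min_depth` times are treated as sequencing errors
    and ignored. The peak depth is the multiplicity shared by the largest
    number of distinct k-mers.
    """
    solid = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(solid, key=lambda depth: (solid[depth], depth))
    total_occurrences = sum(depth * n for depth, n in solid.items())
    return total_occurrences / peak_depth
```

Calling `estimate_genome_size(kmer_histogram(reads, k))` on a list of read strings returns the estimated number of k-mer positions in the genome. On real data the histogram would normally come from a dedicated k-mer counter rather than a Python loop.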


PathScore: a web tool for identifying altered pathways in cancer data

10.1101/067090 ◽

2016 ◽

Cited By ~ 2

Author(s):

Stephen G. Gaffney ◽

Jeffrey P. Townsend

Keyword(s):

Web Application ◽

Somatic Mutations ◽

Supplementary Information ◽

Web Tool ◽

Cancer Data ◽

Link Type ◽

Novel Approach ◽

Supplementary Material ◽

User Friendly ◽

Pathway Effect

Abstract. Summary: PathScore quantifies the level of enrichment of somatic mutations within curated pathways, applying a novel approach that identifies pathways enriched across patients. The application provides several user-friendly, interactive graphic interfaces for data exploration, including tools for comparing pathway effect sizes, significance, gene-set overlap and enrichment differences between projects. Availability and Implementation: Web application available at pathscore.publichealth.yale.edu. Site implemented in Python and MySQL, with all major browsers supported. Source code available at github.com/sggaffney/pathscore with a GPLv3 [email protected] Supplementary Information: Additional documentation can be found at http://pathscore.publichealth.yale.edu/faq.


TRTools: a toolkit for genome-wide analysis of tandem repeats

10.1101/2020.03.17.996033 ◽

2020 ◽

Cited By ~ 1

Author(s):

Nima Mousavi ◽

Jonathan Margoliash ◽

Neha Pusarla ◽

Shubham Saini ◽

Richard Yanicky ◽

...

Keyword(s):

Quality Control ◽

Tandem Repeats ◽

Supplementary Information ◽

Command Line ◽

Supplementary Data ◽

Genome Wide Analysis ◽

Link Type ◽

Genome Wide ◽

Wide Range ◽

Downstream Analysis

Abstract. Summary: A rich set of tools has recently been developed for performing genome-wide genotyping of tandem repeats (TRs). However, standardized tools for downstream analysis of these results are lacking. To facilitate TR analysis applications, we present TRTools, a Python library and a suite of command-line tools for filtering, merging, and quality control of TR genotype files. TRTools utilizes an internal harmonization module, making it compatible with outputs from a wide range of TR genotypers. Availability: TRTools is freely available at https://github.com/gymreklab/[email protected] Supplementary information: Supplementary data are available at bioRxiv.


BloodGen3Module: Blood transcriptional module repertoire analysis and visualization using R

Bioinformatics ◽

10.1093/bioinformatics/btab121 ◽

2021 ◽

Author(s):

Darawan Rinchai ◽

Jessica Roelands ◽

Mohammed Toufiq ◽

Wouter Hendrickx ◽

Matthew C Altman ◽

...

Keyword(s):

Transcript Abundance ◽

R Package ◽

Supplementary Information ◽

Illustrative Case ◽

Bioinformatic Tools ◽

Transcriptional Module ◽

Wide Range ◽

Downstream Analysis ◽

Computing Module ◽

Parallel Workflow

Abstract. Motivation: We previously described the construction and characterization of generic and reusable blood transcriptional module repertoires. More recently we released a third iteration (“BloodGen3” module repertoire) that comprises 382 functionally annotated gene sets (modules) and encompasses 14,168 transcripts. Custom bioinformatic tools are needed to support downstream analysis, visualization and interpretation relying on such fixed module repertoires. Results: We have developed, and describe here, an R package, BloodGen3Module. The functions of our package permit group comparison analyses to be performed at the module level and the results to be displayed as annotated fingerprint grid plots. A parallel workflow for computing module repertoire changes for individual samples rather than groups of samples is also available; these results are displayed as fingerprint heatmaps. An illustrative case is used to demonstrate the steps involved in generating blood transcriptome repertoire fingerprints of septic patients. Taken together, this resource could facilitate the analysis and interpretation of changes in blood transcript abundance observed across a wide range of pathological and physiological states. Availability: The BloodGen3Module package and documentation are freely available from GitHub: https://github.com/Drinchai/BloodGen3Module Supplementary information: Supplementary data are available at Bioinformatics online.


Palaeolatitudinal distribution of the Ediacaran macrobiota

Journal of the Geological Society ◽

10.1144/jgs2021-030 ◽

2021 ◽

pp. jgs2021-030

Author(s):

Catherine E. Boddy ◽

Emily G. Mitchell ◽

Andrew Merdith ◽

Alexander G. Liu

Keyword(s):

Taxonomic Composition ◽

Supplementary Information ◽

Cambrian Explosion ◽

Content Type ◽

Link Type ◽

Environmental Perturbations ◽

Significant Difference ◽

Evolutionary Trajectories ◽

Cambrian Radiation ◽

Supplementary Material

Macrofossils of the late Ediacaran Period (c. 579–539 Ma) document diverse, complex multicellular eukaryotes, including early animals, prior to the Cambrian radiation of metazoan phyla. To investigate the relationships between environmental perturbations, biotic responses and early metazoan evolutionary trajectories, it is vital to distinguish between evolutionary and ecological controls on the global distribution of Ediacaran macrofossils. The contributions of temporal, palaeoenvironmental and lithological factors in shaping the observed variations in assemblage taxonomic composition between Ediacaran macrofossil sites are widely discussed, but the role of palaeogeography remains ambiguous. Here we investigate the influence of palaeolatitude on the spatial distribution of Ediacaran macrobiota through the late Ediacaran Period using two leading palaeogeographical reconstructions. We find that overall generic diversity was distributed across all palaeolatitudes. Among specific groups, the distributions of candidate ‘Bilateral’ and Frondomorph taxa exhibit weakly statistically significant and statistically significant differences between low and high palaeolatitudes within our favoured palaeogeographical reconstruction, respectively, whereas Algal, Tubular, Soft-bodied and Biomineralizing taxa show no significant difference. The recognition of statistically significant palaeolatitudinal differences in the distribution of certain morphogroups highlights the importance of considering palaeolatitudinal influences when interrogating trends in Ediacaran taxon distributions. Supplementary material: Supplementary information, data and code are available at https://doi.org/10.6084/m9.figshare.c.5488945 Thematic collection: This article is part of the Advances in the Cambrian Explosion collection available at: https://www.lyellcollection.org/cc/advances-cambrian-explosion


MAJIQ-SPEL: web-tool to interrogate classical and complex splicing variations from RNA-Seq data

Bioinformatics ◽

10.1093/bioinformatics/btx565 ◽

2017 ◽

Vol 34 (2) ◽

pp. 300-302 ◽

Cited By ~ 2

Author(s):

Christopher J Green ◽

Matthew R Gazzara ◽

Yoseph Barash

Keyword(s):

Experimental Validation ◽

Ucsc Genome Browser ◽

Supplementary Information ◽

Supplementary Data ◽

Rna Seq ◽

Web Tool ◽

Rt Pcr ◽

Design Algorithm ◽

Gene Isoforms ◽

Downstream Analysis

Abstract. Summary: Analysis of RNA sequencing (RNA-Seq) data has highlighted the fact that most genes undergo alternative splicing (AS) and that these patterns are tightly regulated. Many of these events are complex, resulting in numerous possible isoforms that quickly become difficult to visualize, interpret and experimentally validate. To address these challenges we developed MAJIQ-SPEL, a web tool that takes as input local splicing variations (LSVs) quantified from RNA-Seq data and provides users with visualization and quantification of the gene isoforms associated with them. Importantly, MAJIQ-SPEL is able to handle both classical (binary) and complex (non-binary) splicing variations. Using a matching primer design algorithm, it also suggests possible primers for experimental validation by RT-PCR and displays them, along with the matching protein domains affected by the LSV, on the UCSC Genome Browser for further downstream analysis. Availability and implementation: Program and code will be available at http://majiq.biociphers.org/majiq-spel. Supplementary information: Supplementary data are available at Bioinformatics online.


wft4galaxy: A Workflow Tester for Galaxy

10.1101/132001 ◽

2017 ◽

Author(s):

Marco Enrico Piras ◽

Luca Pireddu ◽

Gianluigi Zanetti

Keyword(s):

Complex Analysis ◽

Computer Programs ◽

Supplementary Information ◽

Automated Testing ◽

Continuous Integration ◽

Link Type ◽

Scientific Analysis ◽

The Galaxy ◽

Supplementary Material ◽

High Level

Abstract. Motivation: Workflow managers for scientific analysis provide a high-level programming platform facilitating standardization, automation, collaboration and access to sophisticated computing resources. The Galaxy workflow manager provides a prime example of this type of platform. As compositions of simpler tools, workflows effectively comprise specialized computer programs implementing often very complex analysis procedures. To date, no simple way to automatically test Galaxy workflows and ensure their correctness has appeared in the literature. Results: With wft4galaxy we offer a tool to bring automated testing to Galaxy workflows, making it feasible to bring continuous integration to their development and ensuring that defects are detected promptly. wft4galaxy can be easily installed as a regular Python program or launched directly as a Docker container, the latter reducing installation effort to a minimum. Availability: wft4galaxy is available online at https://github.com/phnmnl/wft4galaxy under the Academic Free License v3.0. Supplementary information: Supplementary information is available at http://wft4galaxy.readthedocs.io.


MODE-TASK: Large-scale protein motion tools

10.1101/217505 ◽

2017 ◽

Author(s):

Caroline Ross ◽

Bilal Nizami ◽

Michael Glenister ◽

Olivier Sheik Amamuddy ◽

Ali Rana Atilgan ◽

...

Keyword(s):

Large Scale ◽

Protein Complexes ◽

Normal Mode Analysis ◽

Md Simulations ◽

Supplementary Information ◽

Mode Analysis ◽

Analysis Tool ◽

Link Type ◽

Supplementary Material ◽

Anisotropic Network

Abstract. Summary: MODE-TASK, a novel software suite, comprises Principal Component Analysis, Multidimensional Scaling, and t-Distributed Stochastic Neighbor Embedding techniques applied to molecular dynamics trajectories. MODE-TASK also includes a Normal Mode Analysis tool based on the Anisotropic Network Model, providing a variety of ways to analyse and compare large-scale motions of protein complexes for which long MD simulations are prohibitive. Availability and Implementation: MODE-TASK has been open-sourced and is available for download from https://github.com/RUBi-ZA/MODE-TASK, implemented in Python and C++. Supplementary information: Documentation available at http://mode-task.readthedocs.io.


Crosslink: A fast, scriptable genetic mapper for outcrossing species

10.1101/135277 ◽

2017 ◽

Cited By ~ 6

Author(s):

Robert J. Vickerstaff ◽

Richard J. Harrison

Keyword(s):

Large Datasets ◽

Supplementary Information ◽

Supplementary Data ◽

Link Type ◽

Mapping Software ◽

Outcrossing Species ◽

Supplementary Material ◽

Novel Approaches ◽

Similar Accuracy ◽

General Public License

Abstract. Summary: Crosslink is genetic mapping software for outcrossing species, designed to run efficiently on large datasets by combining the best from existing tools with novel approaches. Tests show it runs much faster than several comparable programs whilst retaining similar accuracy. Availability and implementation: Available under the GNU General Public License version 2 from https://github.com/eastmallingresearch/[email protected] Supplementary information: Supplementary data are available at Bioinformatics online and from https://github.com/eastmallingresearch/crosslink/releases/tag/v0.5.


emeraLD: Rapid Linkage Disequilibrium Estimation with Massive Data Sets

10.1101/301366 ◽

2018 ◽

Cited By ~ 1

Author(s):

Corbin Quick ◽

Christian Fuchsberger ◽

Daniel Taliun ◽

Gonçalo Abecasis ◽

Michael Boehnke ◽

...

Keyword(s):

Linkage Disequilibrium ◽

Association Studies ◽

Random Access ◽

Supplementary Information ◽

Data Sets ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Genome Wide ◽

Wide Range ◽

Supplementary Material

Abstract. Summary: Estimating linkage disequilibrium (LD) is essential for a wide range of summary statistics-based association methods for genome-wide association studies (GWAS). Large genetic data sets, e.g. the TOPMed WGS project and UK Biobank, enable more accurate and comprehensive LD estimates, but increase the computational burden of LD estimation. Here, we describe emeraLD (Efficient Methods for Estimation and Random Access of LD), a computational tool that leverages sparsity and haplotype structure to estimate LD orders of magnitude faster than existing tools. Availability and Implementation: emeraLD is implemented in C++, and is open source under GPLv3. Source code, documentation, an R interface, and utilities for analysis of summary statistics are freely available at http://github.com/statgen/[email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
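As a rough illustration of why sparsity helps LD estimation, the sketch below computes the standard r² statistic between two biallelic variants when each variant is stored only as the set of haplotype indices carrying its alternate allele. It is a toy example under stated assumptions, not emeraLD's implementation or interface: the function name `ld_r2` and the set-based representation are hypothetical choices made for this example.

```python
def ld_r2(carriers_a, carriers_b, n_haplotypes):
    """r^2 between two biallelic variants from sparse carrier sets (hypothetical helper).

    carriers_a / carriers_b: sets of haplotype indices carrying the alternate
    allele at variant A / variant B. Only carriers are stored, so the work is
    proportional to the allele counts rather than to n_haplotypes.
    """
    n_a = len(carriers_a)
    n_b = len(carriers_b)
    if n_a in (0, n_haplotypes) or n_b in (0, n_haplotypes):
        raise ValueError("r^2 is undefined for a monomorphic variant")
    n_ab = len(carriers_a & carriers_b)  # haplotypes carrying both alternate alleles
    # D = p_ab - p_a * p_b, written with integer counts to limit rounding error.
    numerator = (n_haplotypes * n_ab - n_a * n_b) ** 2
    denominator = n_a * (n_haplotypes - n_a) * n_b * (n_haplotypes - n_b)
    return numerator / denominator
```

For rare variants the carrier sets are small, so the intersection is cheap even when the number of haplotypes is very large.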


sangeranalyseR: simple and interactive analysis of Sanger sequencing data in R

10.1101/2020.05.18.102459 ◽

2020 ◽

Author(s):

Kuan-Hao Chao ◽

Kirston Barton ◽

Sarah Palmer ◽

Robert Lanfear

Keyword(s):

Sanger Sequencing ◽

Reference Sequence ◽

Supplementary Information ◽

File Format ◽

Bioconductor Package ◽

Sequencing Data ◽

Interactive Analysis ◽

Link Type ◽

Online Documentation ◽

Wide Range

Abstract. Summary: sangeranalyseR is an interactive R/Bioconductor package and two associated Shiny applications designed for analysing Sanger sequencing data from the ABIF file format in R. It allows users to go from loading reads to saving aligned contigs in a few lines of R code. sangeranalyseR provides a wide range of options for a number of commonly performed actions, including read trimming, detecting secondary peaks, viewing chromatograms, and detecting indels using a reference sequence. All parameters can be adjusted interactively either in R or in the associated Shiny applications. sangeranalyseR comes with extensive online documentation and outputs detailed interactive HTML reports. Availability and implementation: sangeranalyseR is implemented in R and released under an MIT license. It is available for all platforms on Bioconductor (https://bioconductor.org/packages/sangeranalyseR) and on GitHub (https://github.com/roblanf/sangeranalyseR) [email protected] Supplementary information: Documentation at https://sangeranalyser.readthedocs.io/.
