scholarly journals PyRanges: efficient comparison of genomic intervals in Python

2019 ◽  
Author(s):  
Endre Bakken Stovner ◽  
Pål Sætrom

AbstractSummaryComplex genomic analyses often use sequences of simple set operations like intersection, overlap, and nearest on genomic intervals. These operations, coupled with some custom programming, allow a wide range of analyses to be performed. To this end, we have written PyRanges, a data structure for representing and manipulating genomic intervals and their associated data in Python. Run single-threaded on binary set operations, PyRanges is in median 2.3-9.6 times faster than the popular R GenomicRanges library and is equally memory efficient; run multi-threaded on 8 cores, our library is up to 123 times faster. PyRanges is therefore ideally suited both for individual analyses and as a foundation for future genomic libraries in Python.AvailabilityPyRanges is available open-source under the MIT license at https://github.com/biocore-NTNU/pyranges and documentation exists at https://biocore-NTNU.github.io/pyranges/[email protected] informationSupplementary data are available.

Author(s):  
Endre Bakken Stovner ◽  
Pål Sætrom

Abstract Summary Complex genomic analyses often use sequences of simple set operations like intersection, overlap and nearest on genomic intervals. These operations, coupled with some custom programming, allow a wide range of analyses to be performed. To this end, we have written PyRanges, a data structure for representing and manipulating genomic intervals and their associated data in Python. Run single threaded on binary set operations, PyRanges is in median 2.3–9.6 times faster than the popular R GenomicRanges library and is equally memory efficient; run multi-threaded on 8 cores, our library is up to 123 times faster. PyRanges is therefore ideally suited both for individual analyses and as a foundation for future genomic libraries in Python. Availability and implementation PyRanges is available as open source under the MIT license at https://github.com/biocore-NTNU/pyranges and the documentation exists at https://biocore-NTNU.github.io/pyranges/ Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Nima Mousavi ◽  
Jonathan Margoliash ◽  
Neha Pusarla ◽  
Shubham Saini ◽  
Richard Yanicky ◽  
...  

AbstractSummaryA rich set of tools have recently been developed for performing genome-wide genotyping of tandem repeats (TRs). However, standardized tools for downstream analysis of these results are lacking. To facilitate TR analysis applications, we present TRTools, a Python library and a suite of command-line tools for filtering, merging, and quality control of TR genotype files. TRTools utilizes an internal harmonization module making it compatible with outputs from a wide range of TR genotypers.AvailabilityTRTools is freely available at https://github.com/gymreklab/[email protected] informationSupplementary data are available at bioRxiv.


2018 ◽  
Author(s):  
Brent S. Pedersen ◽  
Aaron R. Quinlan

AbstractMotivationExtracting biological insight from genomic data inevitably requires custom software. In many cases, this is accomplished with scripting languages, owing to their accessibility and brevity. Unfortunately, the ease of scripting languages typically comes at a substantial performance cost that is especially acute with the scale of modern genomics datasets.ResultsWe present hts-nim, a high-performance library written in the Nim programming language that provides a simple, scripting-like syntax without sacrificing performance.Availabilityhts-nim is available at https://github.com/brentp/hts-nim and the example tools are at https://github.com/brentp/hts-nim-tools both under the MIT [email protected] informationSupplementary data are available at Bioinformatics online.


Author(s):  
Richard Jiang ◽  
Bruno Jacob ◽  
Matthew Geiger ◽  
Sean Matthew ◽  
Bryan Rumsey ◽  
...  

Abstract Summary We present StochSS Live!, a web-based service for modeling, simulation and analysis of a wide range of mathematical, biological and biochemical systems. Using an epidemiological model of COVID-19, we demonstrate the power of StochSS Live! to enable researchers to quickly develop a deterministic or a discrete stochastic model, infer its parameters and analyze the results. Availability and implementation StochSS Live! is freely available at https://live.stochss.org/ Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Kai Cheng ◽  
Gabrielle Pawlowski ◽  
Xinheng Yu ◽  
Yusen Zhou ◽  
Sriram Neelamegham

Abstract Summary This manuscript describes an open-source program, DrawGlycan-SNFG (version 2), that accepts IUPAC (International Union of Pure and Applied Chemist)-condensed inputs to render Symbol Nomenclature For Glycans (SNFG) drawings. A wide range of local and global options enable display of various glycan/peptide modifications including bond breakages, adducts, repeat structures, ambiguous identifications etc. These facilities make DrawGlycan-SNFG ideal for integration into various glycoinformatics software, including glycomics and glycoproteomics mass spectrometry (MS) applications. As a demonstration of such usage, we incorporated DrawGlycan-SNFG into gpAnnotate, a standalone application to score and annotate individual MS/MS glycopeptide spectrum in different fragmentation modes. Availability and implementation DrawGlycan-SNFG and gpAnnotate are platform independent. While originally coded using MATLAB, compiled packages are also provided to enable DrawGlycan-SNFG implementation in Python and Java. All programs are available from https://virtualglycome.org/drawglycan; https://virtualglycome.org/gpAnnotate. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 36 (7) ◽  
pp. 2280-2281 ◽  
Author(s):  
Sarah Lutteropp ◽  
Alexey M Kozlov ◽  
Alexandros Stamatakis

Abstract Motivation Recently, Lemoine et al. suggested the transfer bootstrap expectation (TBE) branch support metric as an alternative to classical phylogenetic bootstrap support for taxon-rich datasets. However, the original TBE implementation in the booster tool is compute- and memory-intensive. Results We developed a fast and memory-efficient TBE implementation. We improve upon the original algorithm by Lemoine et al. via several algorithmic and technical optimizations. On empirical as well as on random tree sets with varying taxon counts, our implementation is up to 480 times faster than booster. Furthermore, it only requires memory that is linear in the number of taxa, which leads to 10× to 40× memory savings compared with booster. Availability and implementation Our implementation has been partially integrated into pll-modules and RAxML-NG and is available under the GNU Affero General Public License v3.0 at https://github.com/ddarriba/pll-modules and https://github.com/amkozlov/raxml-ng. The parallel version that also computes additional TBE-related statistics is available at: https://github.com/lutteropp/raxml-ng/tree/tbe. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Pierre Morisse ◽  
Claire Lemaitre ◽  
Fabrice Legeai

Abstract Motivation Linked-Reads technologies combine both the high-quality and low cost of short-reads sequencing and long-range information, through the use of barcodes tagging reads which originate from a common long DNA molecule. This technology has been employed in a broad range of applications including genome assembly, phasing and scaffolding, as well as structural variant calling. However, to date, no tool or API dedicated to the manipulation of Linked-Reads data exist. Results We introduce LRez, a C ++ API and toolkit which allows easy management of Linked-Reads data. LRez includes various functionalities, for computing numbers of common barcodes between genomic regions, extracting barcodes from BAM files, as well as indexing and querying BAM, FASTQ and gzipped FASTQ files to quickly fetch all reads or alignments containing a given barcode. LRez is compatible with a wide range of Linked-Reads sequencing technologies, and can thus be used in any tool or pipeline requiring barcode processing or indexing, in order to improve their performances. Availability and implementation LRez is implemented in C ++, supported on Unix-based platforms, and available under AGPL-3.0 License at https://github.com/morispi/LRez, and as a bioconda module. Supplementary information Supplementary data are available at Bioinformatics Advances


2017 ◽  
Author(s):  
Robert J. Vickerstaff ◽  
Richard J. Harrison

AbstractSummaryCrosslink is genetic mapping software for outcrossing species designed to run efficiently on large datasets by combining the best from existing tools with novel approaches. Tests show it runs much faster than several comparable programs whilst retaining a similar accuracy.Availability and implementationAvailable under the GNU General Public License version 2 from https://github.com/eastmallingresearch/[email protected] informationSupplementary data are available at Bioinformatics online and from https://github.com/eastmallingresearch/crosslink/releases/tag/v0.5.


2018 ◽  
Author(s):  
John A Lees ◽  
Marco Galardini ◽  
Stephen D Bentley ◽  
Jeffrey N Weiser ◽  
Jukka Corander

AbstractSummaryGenome-wide association studies (GWAS) in microbes face different challenges to eukaryotes and have been addressed by a number of different methods. pyseer brings these techniques together in one package tailored to microbial GWAS, allows greater flexibility of the input data used, and adds new methods to interpret the association results.Availability and Implementationpyseer is written in python and is freely available at https://github.com/mgalardini/pyseer, or can be installed through pip. Documentation and a tutorial are available at http://[email protected] and [email protected] informationSupplementary data are available online.


2020 ◽  
Author(s):  
Kuan-Hao Chao ◽  
Kirston Barton ◽  
Sarah Palmer ◽  
Robert Lanfear

AbstractSummarysangeranalyseR is an interactive R/Bioconductor package and two associated Shiny applications designed for analysing Sanger sequencing from data from the ABIF file format in R. It allows users to go from loading reads to saving aligned contigs in a few lines of R code. sangeranalyseR provides a wide range of options for a number of commonly-performed actions including read trimming, detecting secondary peaks, viewing chromatograms, and detecting indels using a reference sequence. All parameters can be adjusted interactively either in R or in the associated Shiny applications. sangeranalyseR comes with extensive online documentation, and outputs detailed interactive HTML reports.Availability and implementationsangeranalyseR is implemented in R and released under an MIT license. It is available for all platforms on Bioconductor (https://bioconductor.org/packages/sangeranalyseR) and on Github (https://github.com/roblanf/sangeranalyseR)[email protected] informationDocumentation at https://sangeranalyser.readthedocs.io/.


Sign in / Sign up

Export Citation Format

Share Document