AnyPyTools: A Python package for reproducible research with the AnyBody Modeling System

Morten Lund; John Rasmussen; Michael Andersen

doi:10.21105/joss.01108

Kronos: a workflow assembler for genome analytics and informatics

10.1101/040352, 2016

Cited By ~ 3

Author(s): M Jafar Taghiyar, Jamie Rosner, Diljot Grewal, Bruno Grande, Radhouane Aniba, ...

Keyword(s): Feature Detection, Automated Analysis, Configuration File, Detection Methods, Reproducible Research, Robust Implementation, Free Open Source, Standard Framework, Python Package, Generation Sequencing

The field of next generation sequencing informatics has matured to a point where algorithmic advances in sequence alignment and individual feature detection methods have stabilized. Practical and robust implementation of complex analytical workflows (where such tools are structured into "best practices" for automated analysis of NGS datasets) still requires significant programming investment and expertise. We present Kronos, a software platform for automating the development and execution of reproducible, auditable and distributable bioinformatics workflows. Kronos obviates the need for explicit coding of workflows by compiling a text configuration file into executable Python applications. The framework of each workflow includes a run manager to execute the encoded workflows locally (or on a cluster or cloud), parallelize tasks, and log all runtime events. Resulting workflows are highly modular and configurable by construction, facilitating flexible and extensible meta-applications which can be modified easily through configuration file editing. The workflows are fully encoded for ease of distribution and can be instantiated on external systems, promoting and facilitating reproducible research and comparative analyses. We introduce a framework for building Kronos components which function as shareable, modular nodes in Kronos workflows. The Kronos platform provides a standard framework for developers to implement custom tools, reuse existing tools, and contribute to the community at large. Kronos is shipped with both Docker and Amazon AWS machine images. It is free, open source and available through PyPI (Python Package Index) and https://github.com/jtaghiyar/kronos.

Keywords: genomics; workflow; pipeline; reproducibility
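The general idea of turning a text configuration into an executable workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the INI-style layout, the `depends` key, and the function names `load_workflow` and `run_workflow` are hypothetical and are not the Kronos configuration format or API. The sketch parses task definitions, orders them by their dependencies, and records a simple event log, loosely mirroring the run manager described in the abstract.

```python
import configparser
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical INI-style workflow description. This is NOT the Kronos
# configuration format; it only illustrates the config-to-workflow idea.
CONFIG_TEXT = """
[align]
command = align reads
depends =

[call_variants]
command = call variants
depends = align

[report]
command = write report
depends = align, call_variants
"""


def load_workflow(text):
    """Parse config text into {task: {"command": str, "depends": [str]}}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    workflow = {}
    for name in parser.sections():
        raw = parser.get(name, "depends", fallback="")
        depends = [d.strip() for d in raw.split(",") if d.strip()]
        workflow[name] = {"command": parser.get(name, "command"), "depends": depends}
    return workflow


def run_workflow(workflow, runner):
    """Run each task after its dependencies and return the event log."""
    graph = {name: set(task["depends"]) for name, task in workflow.items()}
    log = []
    for name in TopologicalSorter(graph).static_order():
        log.append(("start", name))
        runner(workflow[name]["command"])  # runner is any callable taking the command
        log.append(("done", name))
    return log


# Usage: collect the commands instead of executing them.
executed = []
event_log = run_workflow(load_workflow(CONFIG_TEXT), executed.append)
```

Editing the configuration text (for example, adding a section or changing a `depends` entry) changes the executed workflow without touching the Python code, which is the property the abstract emphasizes.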

ppx: Programmatic access to proteomics data repositories

10.1101/2021.05.29.446304, 2021

Author(s): William E Fondrie, Wout Bittremieux, William S Noble

Keyword(s): Mass Spectrometry, Open Science, Mass Spectrometry Data, Reproducible Research, Easy Access, Proteomics Data, Data Repositories, Access To Data, Python Package, Programmatic Access

The volume of proteomics and mass spectrometry data available in public repositories continues to grow at a rapid pace as more researchers embrace open science practices. Open access to the data behind scientific discoveries has become critical to validate published findings and develop new computational tools. Here, we present ppx, a Python package that provides easy, programmatic access to the data stored in ProteomeXchange repositories, such as PRIDE and MassIVE. The ppx package can either be used as a command line tool or a Python package to retrieve the files and metadata associated with a project when provided its identifier. To demonstrate how ppx enhances reproducible research, we used ppx within a Snakemake workflow to reanalyze a published dataset with the open modification search tool ANN-SoLo and compared our reanalysis to the original results. We show that ppx readily integrates into workflows and our reanalysis produced results consistent with the original analysis. We envision that ppx will be a valuable tool for creating reproducible analyses, providing tool developers easy access to data for development, testing, and benchmarking, and enabling the use of mass spectrometry data in data-intensive analyses. The ppx package is freely available and open source under the MIT license at: https://github.com/wfondrie/ppx

The Practice of Reproducible Research

10.1525/9780520967779, 2017

Cited By ~ 11

Keyword(s): Reproducible Research

pymia: A Python package for data handling and evaluation in deep learning-based medical image analysis

Computer Methods and Programs in Biomedicine, 10.1016/j.cmpb.2020.105796, 2021, Vol 198, pp. 105796

Author(s): Alain Jungo, Olivier Scheidegger, Mauricio Reyes, Fabian Balsiger

Keyword(s): Image Analysis, Deep Learning, Medical Image, Medical Image Analysis, Data Handling, Python Package

GriSPy: A Python package for fixed-radius nearest neighbors search

Astronomy and Computing, 10.1016/j.ascom.2020.100443, 2020, pp. 100443

Author(s): M. Chalela, E. Sillero, L. Pereyra, M.A. Garcia, J.B. Cabral, ...

Keyword(s): Nearest Neighbors, Fixed Radius, Python Package

A Beginner's Guide to Conducting Reproducible Research

Bulletin of the Ecological Society of America, 10.1002/bes2.1801, 2021

Author(s): Jesse M. Alston, Jessica A. Rick

Keyword(s): Reproducible Research

BiSulfite Bolt: A bisulfite sequencing analysis platform

GigaScience, 10.1093/gigascience/giab033, 2021, Vol 10 (5)

Author(s): Colin Farrell, Michael Thompson, Anela Tosevska, Adewale Oyetunde, Matteo Pellegrini

Keyword(s): Data Aggregation, Bisulfite Sequencing, Low Complexity, Sequencing Analysis, Command Line, Sequencing Data, Bisulfite Sequencing Data, Analysis Platform, Python Package, Bisulfite Sequencing Analysis

Background: Bisulfite sequencing is commonly used to measure DNA methylation. Processing bisulfite sequencing data is often challenging owing to the computational demands of mapping a low-complexity, asymmetrical library and the lack of a unified processing toolset to produce an analysis-ready methylation matrix from read alignments. To address these shortcomings, we have developed BiSulfite Bolt (BSBolt), a fast and scalable bisulfite sequencing analysis platform. BSBolt performs a pre-alignment sequencing read assessment step to improve efficiency when handling asymmetrical bisulfite sequencing libraries.

Findings: We evaluated BSBolt against simulated and real bisulfite sequencing libraries. We found that BSBolt provides accurate and fast bisulfite sequencing alignments and methylation calls. We also compared BSBolt to several existing bisulfite alignment tools and found BSBolt outperforms Bismark, BSSeeker2, BISCUIT, and BWA-Meth based on alignment accuracy and methylation calling accuracy.

Conclusion: BSBolt offers streamlined processing of bisulfite sequencing data through an integrated toolset that offers support for simulation, alignment, methylation calling, and data aggregation. BSBolt is implemented as a Python package and command line utility for flexibility when building informatics pipelines. BSBolt is available at https://github.com/NuttyLogic/BSBolt under an MIT license.

SynBiopython: an open-source software library for Synthetic Biology

Synthetic Biology, 10.1093/synbio/ysab001, 2021, Vol 6 (1)

Author(s): Jing Wui Yeoh, Neil Swainston, Peter Vegh, Valentin Zulkower, Pablo Carbonell, ...

Keyword(s): Synthetic Biology, Open Source, Open Source Software, Development Projects, Software Library, Current State, Starting Point, Common Problems, Data Tracking, Python Package

Advances in hardware automation in synthetic biology laboratories are not yet fully matched by those of their software counterparts. Such automated laboratories, now commonly called biofoundries, require software solutions that would help with many specialized tasks such as batch DNA design, sample and data tracking, and data analysis, among others. Typically, many of the challenges facing biofoundries are shared, yet there is frequent wheel-reinvention where many labs develop similar software solutions in parallel. In this article, we present the first attempt at creating a standardized, open-source Python package. A number of tools will be integrated and developed that we envisage will become the obvious starting point for software development projects within biofoundries globally. Specifically, we describe the current state of available software, present usage scenarios and case studies for common problems, and finally describe plans for future development. SynBiopython is publicly available at the following address: http://synbiopython.org.

TAILOR-MS, a Python Package that Deciphers Complex Triacylglycerol Fatty Acyl Structures: Applications for Bovine Milk and Infant Formulas

Analytical Chemistry, 10.1021/acs.analchem.0c04373, 2021

Author(s): Kang-Yu Peng, Malinda Salim, Joseph Pelle, Gisela Ramirez, Ben J. Boyd

Keyword(s): Bovine Milk, Fatty Acyl, Infant Formulas, Python Package

A flexible framework for anomaly Detection via dimensionality reduction

Neural Computing and Applications, 10.1007/s00521-021-05839-5, 2021

Author(s): Alireza Vafaei Sadr, Bruce A. Bassett, M. Kunz

Keyword(s): Anomaly Detection, Dimensionality Reduction, Dimensional Space, High Dimensions, Detection Algorithms, Latent Space, Wide Range, Flexible Framework, Online Anomaly Detection, Python Package

Anomaly detection is challenging, especially for large datasets in high dimensions. Here, we explore a general anomaly detection framework based on dimensionality reduction and unsupervised clustering. DRAMA is released as a general Python package that implements the general framework with a wide range of built-in options. This approach identifies the primary prototypes in the data with anomalies detected by their large distances from the prototypes, either in the latent space or in the original, high-dimensional space. DRAMA is tested on a wide variety of simulated and real datasets, in up to 3000 dimensions, and is found to be robust and highly competitive with commonly used anomaly detection algorithms, especially in high dimensions. The flexibility of the DRAMA framework allows for significant optimization once some examples of anomalies are available, making it ideal for online anomaly detection, active learning, and highly unbalanced datasets. Besides, DRAMA naturally provides clustering of outliers for subsequent analysis.
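The scoring idea described above (reduce dimensionality, find a prototype, score points by their distance from it in latent space) can be illustrated with a small standard-library sketch. This is not the DRAMA package or its API. As simplifying assumptions, it swaps in a one-dimensional PCA projection computed by power iteration for the dimensionality reduction, and a single median prototype in place of unsupervised clustering. The function names and the toy data are hypothetical.

```python
import math
from statistics import median


def top_component(rows, iterations=50):
    """Estimate the mean and first principal direction by power iteration."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - mean[j] for j in range(dim)] for r in rows]
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        # Multiply v by the unnormalised covariance matrix: X^T (X v).
        proj = [sum(c[j] * v[j] for j in range(dim)) for c in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v


def anomaly_scores(rows):
    """Score each row by its latent-space distance from a median prototype."""
    mean, v = top_component(rows)
    latent = [sum((r[j] - mean[j]) * v[j] for j in range(len(v))) for r in rows]
    prototype = median(latent)  # single prototype; a simplification of clustering
    return [abs(z - prototype) for z in latent]


# Toy data: 20 similar points in five dimensions plus one distant point.
inliers = [[i % 3, (2 * i) % 5, (3 * i) % 4, 0.0, 0.0] for i in range(20)]
data = inliers + [[30.0, 30.0, 30.0, 30.0, 30.0]]
scores = anomaly_scores(data)
most_anomalous = max(range(len(data)), key=scores.__getitem__)
```

In this toy setting the distant point receives the largest score. A fuller treatment along the lines of the abstract would use several prototypes obtained by clustering and could also measure distances in the original space.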