smallrnaseq: short non coding RNA-Seq analysis with Python

10.1101/110585 ◽

2017 ◽

Cited By ~ 1

Author(s):

Damien Farrell

Keyword(s):

Novel Species ◽

Command Line ◽

Rna Seq ◽

Absolute Abundance ◽

Command Line Interface ◽

Novel Mirna ◽

Non Coding Rna ◽

Regulatory Functions ◽

Python Package ◽

Generation Sequencing

Abstract. The use of next-generation sequencing is now a standard approach to elucidate the small non-coding RNA species (sncRNAs) present in tissue and biofluid samples. This has revealed a wide variety of RNAs with regulatory functions, the best studied of which are microRNAs (miRNAs). Profiling of sncRNAs by deep sequencing allows absolute abundance to be measured and enables the discovery of novel species that have eluded previous methods. Specific considerations must be made when quantifying and cataloging sncRNAs, and multiple algorithms are now available, mostly focused on miRNA analysis. smallrnaseq is a Python package that implements some of the standard approaches for quantification and analysis of sncRNAs, including miRNA quantification and novel miRNA prediction. A command line interface makes the software accessible to general users.
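
To illustrate what miRNA quantification by read counting can look like, here is a minimal sketch. It is not smallrnaseq's API: the function name `quantify`, the exact-match strategy, the reads-per-million normalization and the example sequences are all assumptions made for illustration.

```python
from collections import Counter


def quantify(reads, mature):
    """Count reads that exactly match known mature miRNA sequences.

    reads  -- iterable of adapter-trimmed read sequences (hypothetical input)
    mature -- dict mapping miRNA name to its mature sequence (hypothetical)
    Returns (raw counts, reads-per-million values) as two dicts.
    """
    collapsed = Counter(reads)            # collapse identical reads first
    seq_to_name = {seq: name for name, seq in mature.items()}
    raw = Counter()
    for seq, n in collapsed.items():
        name = seq_to_name.get(seq)
        if name is not None:              # unmatched reads are ignored here
            raw[name] += n
    total = sum(collapsed.values())       # normalize by all reads in the library
    rpm = {name: n * 1e6 / total for name, n in raw.items()}
    return dict(raw), rpm


# Made-up sequences, used only to exercise the function.
mature = {"miR-a": "TTGACCTAGGCATCAGGTACGA", "miR-b": "ACCGTTAGGCTAATCGGATCCA"}
reads = [mature["miR-a"]] * 3 + [mature["miR-b"]] + ["ACGTACGTACGTACGTACGT"]
raw, rpm = quantify(reads, mature)
print(raw)  # {'miR-a': 3, 'miR-b': 1}
print(rpm)  # {'miR-a': 600000.0, 'miR-b': 200000.0}
```

Real pipelines typically align reads with mismatches allowed and handle isomiRs, which this sketch deliberately leaves out.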

CoCo: RNA-seq read assignment correction for nested genes and multimapped reads

Bioinformatics ◽

10.1093/bioinformatics/btz433 ◽

2019 ◽

Vol 35 (23) ◽

pp. 5039-5047 ◽

Cited By ~ 6

Author(s):

Gabrielle Deschamps-Francoeur ◽

Vincent Boivin ◽

Sherif Abou Elela ◽

Michelle S Scott

Keyword(s):

Supplementary Information ◽

Rna Seq ◽

Non Coding Rna ◽

Abundance Estimates ◽

Gene Coverage ◽

Nested Genes ◽

Quantification Accuracy ◽

Whole Transcriptome Analysis ◽

Whole Transcriptome ◽

Generation Sequencing

Abstract. Motivation: Next-generation sequencing techniques revolutionized the study of RNA expression by permitting whole transcriptome analysis. However, sequencing reads generated from nested and multi-copy genes are often either misassigned or discarded, which greatly reduces both quantification accuracy and gene coverage. Results: Here we present count corrector (CoCo), a read assignment pipeline that takes into account the multitude of overlapping and repetitive genes in the transcriptome of higher eukaryotes. CoCo uses a modified annotation file that highlights nested genes and proportionally distributes multimapped reads between repeated sequences. CoCo salvages over 15% of discarded aligned RNA-seq reads and significantly changes the abundance estimates for both coding and non-coding RNA as validated by PCR and bedgraph comparisons. Availability and implementation: The CoCo software is an open source package written in Python and available from http://gitlabscottgroup.med.usherbrooke.ca/scott-group/coco. Supplementary information: Supplementary data are available at Bioinformatics online.
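
The idea of proportionally distributing multimapped reads can be sketched as follows. This is an illustration under stated assumptions, not CoCo's implementation: the function name, the use of uniquely mapped counts as weights, the even-split fallback and the gene names are all hypothetical.

```python
from collections import defaultdict


def distribute_multimapped(unique_counts, multimapped_reads):
    """Assign each multimapped read fractionally to its candidate genes.

    unique_counts     -- dict of gene -> number of uniquely mapped reads
    multimapped_reads -- iterable of tuples, each listing the candidate genes
                         one multimapped read aligns to
    Weights are the unique counts of the candidates (an assumption); when all
    candidates have zero unique reads, the read is split evenly.
    """
    totals = defaultdict(float)
    for gene, n in unique_counts.items():
        totals[gene] += n
    for candidates in multimapped_reads:
        weights = [unique_counts.get(gene, 0) for gene in candidates]
        weight_sum = sum(weights)
        for gene, weight in zip(candidates, weights):
            if weight_sum > 0:
                totals[gene] += weight / weight_sum
            else:
                totals[gene] += 1 / len(candidates)
    return dict(totals)


unique = {"snoA": 30, "snoB": 10, "snoC": 0}
multi = [("snoA", "snoB")] * 4 + [("snoC", "snoD")] * 2
counts = distribute_multimapped(unique, multi)
print(counts)  # {'snoA': 33.0, 'snoB': 11.0, 'snoC': 1.0, 'snoD': 1.0}
```

Each multimapped read contributes exactly one count in total, so the sum of the output equals the number of unique plus multimapped reads (46 here).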

Methods for analyzing next-generation sequencing data II. From graphical user interface to command line interface

Japanese Journal of Lactic Acid Bacteria ◽

10.4109/jslab.25.166 ◽

2014 ◽

Vol 25 (3) ◽

pp. 166-174

Author(s):

Jianqiang Sun ◽

Min Tang ◽

Tasuku Nishioka ◽

Kentaro Shimizu ◽

Koji Kadota

Keyword(s):

User Interface ◽

Next Generation Sequencing ◽

Graphical User Interface ◽

Next Generation Sequencing Data ◽

Command Line ◽

Next Generation ◽

Sequencing Data ◽

Command Line Interface ◽

Generation Sequencing

pysradb: A Python package to query next-generation sequencing metadata and data from NCBI Sequence Read Archive

F1000Research ◽

10.12688/f1000research.18676.1 ◽

2019 ◽

Vol 8 ◽

pp. 532 ◽

Cited By ~ 2

Author(s):

Saket Choudhary

Keyword(s):

Next Generation Sequencing ◽

Research Community ◽

Command Line ◽

Next Generation ◽

Multiple Use ◽

Sequencing Data ◽

Sequence Read Archive ◽

Python Package ◽

Generation Sequencing ◽

Ncbi Sequence Read Archive

The NCBI Sequence Read Archive (SRA) is the primary archive of next-generation sequencing datasets. SRA makes metadata and raw sequencing data available to the research community to encourage reproducibility and to provide avenues for testing novel hypotheses on publicly available data. However, methods to programmatically access this data are limited. We introduce the Python package, pysradb, which provides a collection of command line methods to query and download metadata and data from SRA, utilizing the curated metadata database available through the SRAdb project. We demonstrate the utility of pysradb on multiple use cases for searching and downloading SRA datasets. It is available freely at https://github.com/saketkc/pysradb.

READemption - A tool for the computational analysis of deep-sequencing-based transcriptome data

10.1101/003723 ◽

2014 ◽

Cited By ~ 5

Author(s):

Konrad Ulrich Förstner ◽

Jörg Vogel ◽

Cynthia Mira Sharma

Keyword(s):

Data Processing ◽

Deep Sequencing ◽

Computational Analysis ◽

Command Line ◽

Transcriptome Data ◽

Rna Seq ◽

Command Line Interface ◽

Parallel Data ◽

Full Power ◽

Computationally Intensive

Summary: RNA-Seq has become a potent and widely used method to qualitatively and quantitatively study transcriptomes. In order to draw biological conclusions based on RNA-Seq data, several steps, some of which are computationally intensive, have to be taken. Our READemption pipeline takes care of these individual tasks and integrates them into an easy-to-use tool with a command line interface. To leverage the full power of modern computers, most subcommands of READemption offer parallel data processing. While READemption was mainly developed for the analysis of bacterial primary transcriptomes, we have successfully applied it to analyze RNA-Seq reads from other sample types, including whole transcriptomes and RNA immunoprecipitated with proteins, not only from bacteria, but also from eukaryotes and archaea. Availability and Implementation: READemption is implemented in Python and is published under the ISC open source license. The tool and documentation are hosted at http://pythonhosted.org/READemption (DOI:10.6084/m9.figshare.977849).

amplimap: a versatile tool to process and analyze targeted NGS data

Bioinformatics ◽

10.1093/bioinformatics/btz582 ◽

2019 ◽

Vol 35 (24) ◽

pp. 5349-5350

Author(s):

Nils Koelling ◽

Marie Bernkopf ◽

Eduardo Calpena ◽

Geoffrey J Maher ◽

Kerry A Miller ◽

...

Keyword(s):

Command Line ◽

User Friendliness ◽

Targeted Next Generation Sequencing ◽

Base Calling ◽

Targeted Ngs ◽

Command Line Tool ◽

Versatile Tool ◽

Ngs Data ◽

Python Package ◽

Generation Sequencing

Abstract. Summary: amplimap is a command-line tool to automate the processing and analysis of data from targeted next-generation sequencing experiments with PCR-based amplicons or capture-based enrichment systems. From raw sequencing reads, amplimap generates output such as read alignments, annotated variant calls, target coverage statistics and variant allele counts and frequencies for each target base pair. In addition to its focus on user-friendliness and reproducibility, amplimap supports advanced features such as consensus base calling for read families based on unique molecular identifiers and filtering false positive variant calls caused by amplification of off-target loci. Availability and implementation: amplimap is available as a free Python package under the open-source Apache 2.0 License. Documentation, source code and installation instructions are available at https://github.com/koelling/amplimap.
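
Consensus base calling for read families that share a unique molecular identifier (UMI) can be sketched roughly as below. This is not amplimap's code: the function name, the `min_fraction` threshold, the use of `N` for ambiguous positions and the assumption that all reads in a family are already aligned to the same start are illustrative choices.

```python
from collections import Counter, defaultdict


def consensus_by_umi(reads, min_fraction=0.6):
    """Call one consensus sequence per UMI family by per-position majority vote.

    reads        -- iterable of (umi, sequence) pairs; sequences in a family are
                    assumed to start at the same position (an assumption)
    min_fraction -- hypothetical threshold: the most common base must reach this
                    share of the family, otherwise the position is called 'N'
    """
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)

    consensus = {}
    for umi, seqs in families.items():
        bases = []
        # zip stops at the shortest read, so longer reads are truncated.
        for column in zip(*seqs):
            base, n = Counter(column).most_common(1)[0]
            bases.append(base if n / len(column) >= min_fraction else "N")
        consensus[umi] = "".join(bases)
    return consensus


reads = [
    ("AAC", "ACGT"), ("AAC", "ACGT"), ("AAC", "ACTT"),  # one error in the family
    ("GGT", "TTGA"), ("GGT", "TTCA"),                   # unresolved disagreement
]
calls = consensus_by_umi(reads)
print(calls)  # {'AAC': 'ACGT', 'GGT': 'TTNA'}
```

In the first family the single discordant base is outvoted; in the second, two reads disagree evenly, so the position is masked instead of guessed.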

iREAD: A Tool for Intron Retention Detection from RNA-seq Data

10.1101/135624 ◽

2017 ◽

Cited By ~ 1

Author(s):

Hong-Dong Li ◽

Cory C. Funk ◽

Nathan D. Price

Keyword(s):

Intron Retention ◽

Distribution Patterns ◽

Functional Enrichment ◽

Command Line ◽

Rna Seq ◽

Text File ◽

Command Line Interface ◽

Genome Wide ◽

A Genome ◽

Retained Introns

Abstract. Summary: Detecting intron retention (IR) events is emerging as a specialized need for RNA-seq data analysis. Here we present iREAD (intron REtention Analysis and Detector), a tool to detect IR events genome-wide from high-throughput RNA-seq data. The command line interface for iREAD is implemented in Python. iREAD takes as input an existing BAM file, representing the transcriptome, and a text file containing the intron coordinates of a genome. It then 1) counts all reads that overlap intron regions, 2) detects IR events by analyzing features of reads such as depth and distribution patterns, and 3) outputs a list of retained introns into a tab-delimited text file. The output can be directly used for further exploratory analysis such as differential intron expression and functional enrichment. iREAD provides a new and generic tool to interrogate poly-A enriched transcriptomic data of intron regions. Availability: www.libpls.net/[email protected]
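
The first and last steps above (counting reads that overlap introns and writing a tab-delimited result) can be sketched as follows. This is not iREAD's code: it works on plain coordinate tuples instead of a BAM file, uses half-open intervals, and every function and column name is hypothetical. The detection criteria of step 2 are not described in the abstract and are not modeled here.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def intron_read_counts(introns, reads):
    """Count reads overlapping each intron.

    introns -- list of (name, chrom, start, end)
    reads   -- list of (chrom, start, end) alignments
    """
    counts = {name: 0 for name, _chrom, _start, _end in introns}
    for name, chrom, start, end in introns:
        for r_chrom, r_start, r_end in reads:
            if r_chrom == chrom and overlaps(start, end, r_start, r_end):
                counts[name] += 1
    return counts


def to_tsv(counts):
    """Format the counts as tab-delimited text with a header row."""
    lines = ["intron\tread_count"]
    lines += [f"{name}\t{n}" for name, n in counts.items()]
    return "\n".join(lines)


introns = [("i1", "chr1", 100, 200), ("i2", "chr1", 300, 400)]
reads = [
    ("chr1", 90, 110),   # overlaps i1
    ("chr1", 150, 180),  # inside i1
    ("chr1", 200, 250),  # starts exactly where i1 ends: no overlap
    ("chr2", 150, 180),  # different chromosome
    ("chr1", 390, 420),  # overlaps i2
]
counts = intron_read_counts(introns, reads)
print(to_tsv(counts))
```

The nested loop is quadratic and only suitable for small inputs; a genome-wide run would need sorted intervals or an interval tree.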

CoCo: RNA-seq Read Assignment Correction for Nested Genes and Multimapped Reads

10.1101/477869 ◽

2018 ◽

Cited By ~ 1

Author(s):

Gabrielle Deschamps-Francoeur ◽

Vincent Boivin ◽

Sherif Abou Elela ◽

Michelle S Scott

Keyword(s):

Rna Seq ◽

Non Coding Rna ◽

Abundance Estimates ◽

Gene Coverage ◽

Nested Genes ◽

Quantification Accuracy ◽

Higher Eukaryotes ◽

Whole Transcriptome Analysis ◽

Whole Transcriptome ◽

Generation Sequencing

Abstract. Motivation: Next-generation sequencing techniques revolutionized the study of RNA expression by permitting whole transcriptome analysis. However, sequencing reads generated from nested and multi-copy genes are often either misassigned or discarded, which greatly reduces both quantification accuracy and gene coverage. Results: Here we present CoCo, a read assignment pipeline that takes into account the multitude of overlapping and repetitive genes in the transcriptome of higher eukaryotes. CoCo uses a modified annotation file that highlights nested genes and proportionally distributes multimapped reads between repeated sequences. CoCo salvages over 15% of discarded aligned RNA-seq reads and significantly changes the abundance estimates for both coding and non-coding RNA as validated by PCR and bed-graph comparisons. Availability: The CoCo software is an open source package written in Python and available from http://gitlabscottgroup.med.usherbrooke.ca/scott-group/[email protected]

pyBedGraph: a python package for fast operations on 1D genomic signal tracks

Bioinformatics ◽

10.1093/bioinformatics/btaa061 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3234-3235

Author(s):

Henry B Zhang ◽

Minji Kim ◽

Jeffrey H Chuang ◽

Yijun Ruan

Keyword(s):

Chromatin Accessibility ◽

Genomic Research ◽

Supplementary Information ◽

Summary Statistics ◽

Rna Seq ◽

Binary Format ◽

Modern Genomic ◽

Binding Intensity ◽

Python Package ◽

Generation Sequencing

Abstract. Motivation: Modern genomic research is driven by next-generation sequencing experiments such as ChIP-seq and ChIA-PET that generate coverage files for transcription factor binding, as well as DHS and ATAC-seq that yield coverage files for chromatin accessibility. Such files are in a bedGraph text format or a bigWig binary format. Obtaining summary statistics in a given region is a fundamental task in analyzing protein binding intensity or chromatin accessibility. However, the existing Python package for operating on coverage files is not optimized for speed. Results: We developed pyBedGraph, a Python package to quickly obtain summary statistics for a given interval in a bedGraph or a bigWig file. When tested on 12 ChIP-seq, ATAC-seq, RNA-seq and ChIA-PET datasets, pyBedGraph is on average 260 times faster than the existing program pyBigWig. On average, pyBedGraph can look up the exact mean signal of 1 million regions in ∼0.26 s and can compute their approximate means in <0.12 s on a conventional laptop. Availability and implementation: pyBedGraph is publicly available at https://github.com/TheJacksonLaboratory/pyBedGraph under the MIT license. Supplementary information: Supplementary data are available at Bioinformatics online.
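
For readers unfamiliar with the task, the exact mean signal over an interval of a bedGraph track can be computed as in the sketch below. It does not use pyBedGraph and makes no attempt at its speed optimizations; the function names are hypothetical, and treating bases without an entry as zero signal is an assumption of this example.

```python
def parse_bedgraph(text):
    """Parse bedGraph text into (chrom, start, end, value) tuples."""
    intervals = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and track/browser header lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        intervals.append((chrom, int(start), int(end), float(value)))
    return intervals


def mean_signal(intervals, chrom, start, end):
    """Exact mean over the half-open region [start, end).

    Bases with no bedGraph entry count as zero signal (an assumption).
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    total = 0.0
    for c, s, e, value in intervals:
        if c != chrom:
            continue
        overlap = min(e, end) - max(s, start)
        if overlap > 0:
            total += overlap * value
    return total / (end - start)


track = """track type=bedGraph
chr1\t0\t10\t2.0
chr1\t10\t20\t4.0
chr1\t30\t40\t1.0
"""
intervals = parse_bedgraph(track)
print(mean_signal(intervals, "chr1", 5, 15))   # 3.0
print(mean_signal(intervals, "chr1", 15, 35))  # 1.25
```

This version scans every interval per query, which is fine for a demonstration but is exactly the kind of cost a dedicated package avoids when answering millions of queries.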

pysradb: A Python package to query next-generation sequencing metadata and data from NCBI Sequence Read Archive

10.1101/578500 ◽

2019 ◽

Author(s):

Saket Choudhary

Keyword(s):

Next Generation Sequencing ◽

Research Community ◽

Command Line ◽

Next Generation ◽

Multiple Use ◽

Sequencing Data ◽

Sequence Read Archive ◽

Python Package ◽

Generation Sequencing ◽

Ncbi Sequence Read Archive

Abstract. NCBI's Sequence Read Archive (SRA) is the primary archive of next-generation sequencing datasets. SRA makes metadata and raw sequencing data available to the research community to encourage reproducibility, and to provide avenues for testing novel hypotheses on publicly available data. However, existing methods to programmatically access these data are limited. We introduce a Python package, pysradb, that provides a collection of command line methods to query and download metadata and data from SRA utilizing the curated metadata database available through the SRAdb project. We demonstrate the utility of pysradb on multiple use cases for searching and downloading SRA datasets. It is available freely at https://github.com/saketkc/pysradb.

DIANA-mAP: Analyzing miRNA from Raw NGS Data to Quantification

Genes ◽

10.3390/genes12010046 ◽

2020 ◽

Vol 12 (1) ◽

pp. 46

Author(s):

Athanasios Alexiou ◽

Dimitrios Zisis ◽

Ioannis Kavakiotis ◽

Marios Miliotis ◽

Antonis Koussounadis ◽

...

Keyword(s):

Expression Profiles ◽

Differential Expression Analysis ◽

Rna Seq ◽

Rna Quantification ◽

Non Coding Rna ◽

Data Production ◽

Ngs Data Analysis ◽

Next Generation Sequencing Ngs ◽

Ngs Data ◽

Generation Sequencing

microRNAs (miRNAs) are small non-coding RNAs (~22 nts) that are considered central post-transcriptional regulators of gene expression and key components in many pathological conditions. Next-Generation Sequencing (NGS) technologies have led to inexpensive, massive data production, revolutionizing every research aspect in the fields of biology and medicine. Particularly, small RNA-Seq (sRNA-Seq) enables small non-coding RNA quantification on a high-throughput scale, providing a closer look into the expression profiles of these crucial regulators within the cell. Here, we present DIANA-microRNA-Analysis-Pipeline (DIANA-mAP), a fully automated computational pipeline that allows the user to perform miRNA NGS data analysis from raw sRNA-Seq libraries to quantification and Differential Expression Analysis in an easy, scalable, efficient, and intuitive way. Emphasis has been given to data pre-processing, an early, critical step in the analysis for the robustness of the final results and conclusions. Through modularity, parallelizability and customization, DIANA-mAP produces high quality expression results, reports and graphs for downstream data mining and statistical analysis. In an extended evaluation, the tool outperforms similar tools providing pre-processing without any adapter knowledge. Finally, DIANA-mAP is freely available, either dockerized with no dependency installations or as a standalone tool, accompanied by an installation manual through GitHub.
