INsPeCT: INtegrative Platform for Cancer Transcriptomics

Cancer Informatics ◽

10.4137/cin.s13630 ◽

2014 ◽

Vol 13 ◽

pp. CIN.S13630 ◽

Cited By ~ 3

Author(s):

Piyush B. Madhamshettiwar ◽

Stefan R. Maetschke ◽

Melissa J. Davis ◽

Antonio Reverter ◽

Mark A. Ragan

Keyword(s):

Cancer Biology ◽

Network Inference ◽

High Throughput Sequencing ◽

Analytical Framework ◽

Data Infrastructure ◽

Sequencing Technologies ◽

Regulatory Module ◽

Highly Correlated ◽

User Friendly ◽

Analytical Approaches

The emergence of transcriptomics, fuelled by high-throughput sequencing technologies, has changed the nature of cancer research and resulted in a massive accumulation of data. Computational analysis, integration, and data visualization are now major bottlenecks in cancer biology and translational research. Although many tools have been brought to bear on these problems, their use remains unnecessarily restricted to computational biologists, as many tools require scripting skills, data infrastructure, and powerful computational facilities. New user-friendly, integrative, and automated analytical approaches are required to make computational methods more generally useful to the research community. Here we present INsPeCT (INtegrative Platform for Cancer Transcriptomics), which allows users with basic computer skills to perform comprehensive in silico analyses of microarray, ChIP-seq, and RNA-seq data. INsPeCT supports the selection of interesting genes for advanced functional analysis. Included in its automated workflows are (i) a novel analytical framework, RMaNI (regulatory module network inference), which supports the inference of cancer subtype-specific transcriptional module networks and the analysis of modules; and (ii) WGCNA (weighted gene co-expression network analysis), which infers modules of highly correlated genes across microarray samples, associated with sample traits, e.g., survival time. INsPeCT is available free of cost from Bioinformatics Resource Australia-EMBL and can be accessed at http://inspect.braembl.org.au.
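The idea of grouping highly correlated genes into modules can be illustrated with a minimal Python sketch. It is not INsPeCT's or WGCNA's code. It assumes the unsigned soft-threshold adjacency |r|^beta commonly associated with WGCNA, and it swaps WGCNA's hierarchical clustering for a much simpler connected-components grouping. The gene names, expression values, `beta`, and `cutoff` are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.
    Assumes neither profile is constant (otherwise the denominator is zero)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency |r|**beta for every unordered gene pair."""
    genes = sorted(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
    }

def find_modules(expr, beta=6, cutoff=0.5):
    """Group genes into modules: connected components of the graph whose
    edges are gene pairs with adjacency >= cutoff (a simplification)."""
    neighbours = {g: set() for g in expr}
    for (g, h), a in soft_threshold_adjacency(expr, beta).items():
        if a >= cutoff:
            neighbours[g].add(h)
            neighbours[h].add(g)
    seen, modules = set(), []
    for gene in sorted(expr):
        if gene in seen:
            continue
        stack, component = [gene], set()
        while stack:  # depth-first walk over strongly co-expressed genes
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbours[current] - component)
        seen |= component
        modules.append(sorted(component))
    return modules

# Hypothetical expression profiles: one list of values per gene, one value per sample.
EXPR = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10.5],
    "geneC": [5, 3, 4, 1, 2],
    "geneD": [3, 1, 4, 1, 5],
}
print(find_modules(EXPR))
```

Raising the correlation to a power suppresses weak correlations while keeping strong ones close to 1, which is why only the tightly co-expressed pair ends up in a shared module here.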


Natrix: A Snakemake-based workflow for processing, clustering, and taxonomically assigning amplicon sequencing reads

10.1101/2020.09.23.309864 ◽

2020 ◽

Author(s):

Marius Welzel ◽

Anja Lange ◽

Dominik Heider ◽

Michael Schwarz ◽

Bernd Freisleben ◽

...

Keyword(s):

High Throughput Sequencing ◽

Workflow Management ◽

Amplicon Sequencing ◽

Version Control ◽

Marker Genes ◽

Sequencing Data ◽

Taxonomic Assignment ◽

Ecological Processes ◽

Sequencing Technologies ◽

User Friendly

Sequencing of marker genes amplified from environmental samples, known as amplicon sequencing, allows us to resolve some of the hidden diversity and elucidate evolutionary relationships and ecological processes among complex microbial communities. The analysis of large numbers of samples at high sequencing depths generated by high-throughput sequencing technologies requires efficient, flexible, and reproducible bioinformatics pipelines. Only a few existing workflows can be run in a user-friendly, scalable, and reproducible manner on different computing devices using an efficient workflow management system. We present Natrix, an open-source bioinformatics workflow for preprocessing raw amplicon sequencing data. The workflow contains all analysis steps from quality assessment, read assembly, dereplication, chimera detection, split-sample merging, and sequence representative assignment (OTUs or ASVs) to the taxonomic assignment of sequence representatives. The workflow is written using Snakemake, a workflow management engine for developing data analysis workflows. In addition, Conda is used for version control. Thus, Snakemake ensures reproducibility and Conda offers version control of the utilized programs. The encapsulation of rules and their dependencies supports hassle-free sharing of rules between workflows and easy adaptation and extension of existing workflows. Natrix is freely available on GitHub (https://github.com/MW55/Natrix).
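Of the steps listed above, dereplication is the simplest to illustrate: identical reads are collapsed into one representative with an abundance count. The Python sketch below shows only the concept and is not Natrix's code, since the abstract does not say how Natrix implements this step. The function name and the example reads are hypothetical.

```python
from collections import Counter

def dereplicate(reads):
    """Collapse identical sequences (case-insensitive) into
    (sequence, abundance) pairs, most abundant first."""
    counts = Counter(seq.upper() for seq in reads)
    # Sort by descending abundance, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical reads after quality filtering and assembly.
READS = ["ACGT", "acgt", "TTGA", "ACGT", "GGCC", "TTGA"]
for sequence, abundance in dereplicate(READS):
    print(sequence, abundance)
```

Keeping the abundance alongside each unique sequence matters because later steps, such as chimera detection and representative assignment, typically use it.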


PuMA: a papillomavirus genome annotation tool

10.1101/736991 ◽

2019 ◽

Author(s):

J. Pace ◽

K. Youens-Clark ◽

C. Freeman ◽

B. Hurwitz ◽

K. Van Doorslaer

Keyword(s):

Genome Annotation ◽

High Throughput Sequencing ◽

Viral Metagenomics ◽

Annotation Tool ◽

General Applicability ◽

Viral Genomes ◽

Link Type ◽

Sequencing Technologies ◽

Reproducible Method ◽

Analytical Approaches

High-throughput sequencing technologies provide unprecedented power to identify novel viruses from a wide variety of (environmental) samples. The field of ‘viral metagenomics’ has dramatically expanded our understanding of viral diversity. Viral metagenomic approaches imply that many novel viruses will not be described by researchers who are experts on the genomic organization of that virus. There is a need to develop analytical approaches to reconstruct, annotate, and classify viral genomes. We have developed the papillomavirus annotation tool (PuMA) to provide researchers with a convenient and reproducible method to annotate novel papillomaviruses. PuMA provides an accessible method for automated papillomavirus genome annotation. PuMA currently has a 98% accuracy when benchmarked against the 481 reference genomes in the Papillomavirus Episteme (PaVE). Finally, PuMA was used to annotate 168 newly isolated papillomaviruses, and successfully annotated 1424 viral features. To demonstrate its general applicability, we developed a version of PuMA that can annotate polyomaviruses. PuMA is available on GitHub (https://github.com/KVD-lab/puma) and through the iMicrobe online environment (https://www.imicrobe.us/#/apps/puma).


PhageTerm: a Fast and User-friendly Software to Determine Bacteriophage Termini and Packaging Mode using randomly fragmented NGS data

10.1101/108100 ◽

2017 ◽

Cited By ~ 2

Author(s):

Julian Garneau ◽

Florence Depardieu ◽

Louis-Charles Fortier ◽

David Bikard ◽

Marc Monot

Keyword(s):

High Throughput Sequencing ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Link Type ◽

Sequencing Technologies ◽

Statistical Framework ◽

Fastq Format ◽

Viral Particles ◽

User Friendly ◽

Ngs Data

Bacteriophages are the most abundant viruses on earth and display an impressive genetic as well as morphologic diversity. Among those, the most common order of phages is the Caudovirales, whose viral particles package linear double-stranded DNA (dsDNA). In this study we investigated how the information gathered by high-throughput sequencing technologies can be used to determine the DNA termini and packaging mechanisms of dsDNA phages. The wet-lab procedures traditionally used for this purpose rely on the identification and cloning of restriction fragments, which can be delicate and cumbersome. Here, we developed a theoretical and statistical framework to analyze DNA termini and phage packaging mechanisms using next-generation sequencing data. Our methods, implemented in the PhageTerm software, work with sequencing reads in fastq format and the corresponding assembled phage genome. PhageTerm was validated on a set of phages with well-established packaging mechanisms representative of the termini diversity: 5’cos (lambda), 3’cos (HK97), pac (P1), headful without a pac site (T4), DTR (T7) and host fragment (Mu). In addition, we determined the termini of 9 Clostridium difficile phages and 6 phages whose sequences were retrieved from the sequence read archive (SRA). A direct graphical interface is available as a Galaxy wrapper version at https://galaxy.pasteur.fr and a standalone version is accessible at https://sourceforge.net/projects/phageterm/.


Pipeliner: A Nextflow-based framework for the definition of sequencing data processing pipelines

10.1101/476515 ◽

2018 ◽

Cited By ~ 1

Author(s):

Anthony Federico ◽

Tanya Karagiannis ◽

Kritika Karri ◽

Dileep Kishore ◽

Yusuke Koga ◽

...

Keyword(s):

Data Processing ◽

High Throughput Sequencing ◽

Digital Gene Expression ◽

Rna Seq ◽

Sequencing Data ◽

Sequencing Technologies ◽

Computing Environments ◽

Scripting Language ◽

Definition Of ◽

User Friendly

The advent of high-throughput sequencing technologies has led to the need for flexible and user-friendly data pre-processing platforms. The Pipeliner framework provides an out-of-the-box solution for processing various types of sequencing data. It combines the Nextflow scripting language and Anaconda package manager to generate modular computational workflows. We have used Pipeliner to create several pipelines for sequencing data processing including bulk RNA-seq, single-cell RNA-seq (scRNA-seq), as well as Digital Gene Expression (DGE) data. This report highlights the design methodology behind Pipeliner, which enables the development of highly flexible and reproducible pipelines that are easy to extend and maintain on multiple computing environments. We also provide a quick-start user guide demonstrating how to set up and execute available pipelines with toy datasets.


metabolisHMM: Phylogenomic analysis for exploration of microbial phylogenies and metabolic pathways

10.1101/2019.12.20.884627 ◽

2019 ◽

Cited By ~ 1

Author(s):

E.A. McDaniel ◽

K. Anantharaman ◽

K.D. McMahon

Keyword(s):

High Throughput Sequencing ◽

Markov Models ◽

Marker Gene ◽

Phylogenomic Analysis ◽

Metagenomic Sequencing ◽

Metabolic Characteristics ◽

Link Type ◽

Sequencing Technologies ◽

Single Marker ◽

User Friendly

Summary: Advances in high-throughput sequencing technologies and bioinformatic pipelines have exponentially increased the amount of data that can be obtained from uncultivated microbial lineages inhabiting diverse ecosystems. Various annotation tools and databases currently exist for predicting the functional potential of sequenced genomes or microbial communities based upon sequence identity. However, intuitive, reproducible, and user-friendly tools for further exploring and visualizing functional guilds of microbial community metagenomic sequencing datasets remain lacking. Here, we present metabolisHMM, a series of workflows for visualizing the distribution of curated and user-provided Hidden Markov Models (HMMs) to understand metabolic characteristics and evolutionary histories of microbial lineages. metabolisHMM performs functional annotations with a set of curated or user-defined HMMs to 1) construct ribosomal protein and single marker gene phylogenies, 2) summarize the presence/absence of metabolic pathway markers, and 3) create heatmap visualizations of presence/absence summaries. Availability and Implementation: metabolisHMM is freely available on GitHub at https://github.com/elizabethmcd/metabolisHMM and on PyPI at https://pypi.org/project/metabolisHMM/ under the GNU General Public License v3.0.
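The presence/absence summary described in step 2 can be sketched in a few lines of Python. This is a conceptual illustration only and does not use or reproduce the metabolisHMM API. It assumes that HMM searches have already produced, for each genome, the set of markers with a significant hit. The genome names, marker names, and function names are hypothetical placeholders.

```python
def presence_absence(hits, markers):
    """Build a genome -> [0/1, ...] row, one column per marker,
    where 1 means the marker had at least one hit in that genome."""
    return {
        genome: [1 if marker in found else 0 for marker in markers]
        for genome, found in hits.items()
    }

def format_table(matrix, markers):
    """Render the matrix as tab-separated text with a header row."""
    lines = ["genome\t" + "\t".join(markers)]
    for genome in sorted(matrix):
        lines.append(genome + "\t" + "\t".join(str(v) for v in matrix[genome]))
    return "\n".join(lines)

# Hypothetical search results: markers detected in each genome.
HITS = {
    "genome1": {"nifH", "amoA"},
    "genome2": set(),
    "genome3": {"dsrA"},
}
MARKERS = ["amoA", "dsrA", "nifH"]

MATRIX = presence_absence(HITS, MARKERS)
print(format_table(MATRIX, MARKERS))
```

A 0/1 matrix of this shape is also the natural input for the heatmap visualization mentioned in step 3.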


ChIPdig: a comprehensive user-friendly tool for mining multi-sample ChIP-seq data

F1000Research ◽

10.12688/f1000research.20027.1 ◽

2019 ◽

Vol 8 ◽

pp. 1295 ◽

Cited By ~ 1

Author(s):

Ruben Esse

Keyword(s):

Data Analysis ◽

High Throughput Sequencing ◽

Enrichment Analysis ◽

Peak Calling ◽

Read Mapping ◽

Sequencing Technologies ◽

Genome Wide ◽

Wet Lab ◽

User Friendly ◽

Epigenetic Research

In recent years, epigenetic research has enjoyed explosive growth as high-throughput sequencing technologies become more accessible and affordable. However, this advancement has not been matched with similar progress in data analysis capabilities from the perspective of experimental biologists not versed in bioinformatic languages. For instance, chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) is at present widely used to identify genomic loci of transcription factor binding and histone modifications. Basic ChIP-seq data analysis, including read mapping and peak calling, can be accomplished through several well-established tools, but more sophisticated analyses aimed at comparing data derived from different conditions or experimental designs constitute a significant bottleneck. We reason that the implementation of a single comprehensive ChIP-seq analysis pipeline could be beneficial for many experimental (wet lab) researchers who would like to generate genomic data. Here we present ChIPdig, a stand-alone application with adjustable parameters designed to allow researchers to perform several analyses, namely read mapping to a reference genome, peak calling, annotation of regions based on reference coordinates (e.g. transcription start and termination sites, exons, introns, and 5' and 3' untranslated regions), and generation of heatmaps and metaplots for visualizing coverage. Importantly, ChIPdig accepts multiple ChIP-seq datasets as input, allowing genome-wide differential enrichment analysis in regions of interest to be performed. ChIPdig is written in R and enables access to several existing and highly utilized packages through a simple user interface powered by the Shiny package. Here, we illustrate the utility and user-friendly features of ChIPdig by analyzing H3K36me3 and H3K4me3 ChIP-seq profiles generated by the modENCODE project as an example.
ChIPdig offers a comprehensive and user-friendly pipeline for analysis of multiple sets of ChIP-seq data by both experimental and computational researchers. It is open source and available at https://github.com/rmesse/ChIPdig.


MitoFinder: efficient automated large-scale extraction of mitogenomic data in target enrichment phylogenomics

10.1101/685412 ◽

2019 ◽

Cited By ~ 2

Author(s):

Rémi Allio ◽

Alex Schomaker-Bastos ◽

Jonathan Romiguier ◽

Francisco Prosdocimi ◽

Benoit Nabholz ◽

...

Keyword(s):

Large Scale ◽

High Throughput Sequencing ◽

Target Enrichment ◽

Sequencing Technologies ◽

Coi Barcoding ◽

Genomic Markers ◽

Order Of Magnitude ◽

Dna Elements ◽

User Friendly

Thanks to the development of high-throughput sequencing technologies, target enrichment sequencing of nuclear ultraconserved DNA elements (UCEs) now routinely allows phylogenetic relationships to be inferred from thousands of genomic markers. Recently, it has been shown that mitochondrial DNA (mtDNA) is frequently sequenced alongside the targeted loci in such capture experiments. Despite its broad evolutionary interest, mtDNA is rarely assembled and used in conjunction with nuclear markers in capture-based studies. Here, we developed MitoFinder, a user-friendly bioinformatic pipeline, to efficiently assemble and annotate mitogenomic data from hundreds of UCE libraries. As a case study, we used ants (Formicidae), for which 501 UCE libraries have been sequenced whereas only 29 mitogenomes are available. We compared the efficiency of four different assemblers (IDBA-UD, MEGAHIT, MetaSPAdes, and Trinity) for assembling both UCE and mtDNA loci. Using MitoFinder, we show that metagenomic assemblers, in particular MetaSPAdes, are well suited to assemble both UCEs and mtDNA. Mitogenomic signal was successfully extracted from all 501 UCE libraries, allowing species identification to be confirmed using COI barcoding. Moreover, our automated procedure retrieved 296 cases in which the mitochondrial genome was assembled in a single contig, thus increasing the number of available ant mitogenomes by an order of magnitude. By leveraging the power of metagenomic assemblers, MitoFinder provides an efficient tool to extract complementary mitogenomic data from UCE libraries, allowing tests for potential mito-nuclear discordance. Our approach is potentially applicable to other sequence capture methods, transcriptomic data, and whole genome shotgun sequencing in diverse taxa.


ChIPdig: a comprehensive user-friendly tool for mining multi-sample ChIP-seq data

10.1101/220079 ◽

2017 ◽

Cited By ~ 2

Author(s):

Ruben Esse ◽

Alla Grishok

Keyword(s):

Data Analysis ◽

High Throughput Sequencing ◽

Enrichment Analysis ◽

Peak Calling ◽

Read Mapping ◽

Sequencing Technologies ◽

Genome Wide ◽

Wet Lab ◽

User Friendly ◽

Epigenetic Research

Background: In recent years, epigenetic research has enjoyed explosive growth as high-throughput sequencing technologies become more accessible and affordable. However, this advancement has not been matched with similar progress in data analysis capabilities from the perspective of experimental biologists not versed in bioinformatic languages. For instance, chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) is at present widely used to identify genomic loci of transcription factor binding and histone modifications. Basic ChIP-seq data analysis, including read mapping and peak calling, can be accomplished through several well-established tools, but more sophisticated analyses aimed at comparing data derived from different conditions or experimental designs constitute a significant bottleneck. We reason that the implementation of a single comprehensive ChIP-seq analysis pipeline could be beneficial for many experimental (wet lab) researchers who would like to generate genomic data. Results: Here we present ChIPdig, a stand-alone application with adjustable parameters designed to allow researchers to perform several analyses, namely read mapping to a reference genome, peak calling, annotation of regions based on reference coordinates (e.g. transcription start and termination sites, exons, introns, 5′ UTRs and 3′ UTRs), and generation of heatmaps and metaplots for visualizing coverage. Importantly, ChIPdig accepts multiple ChIP-seq datasets as input, allowing genome-wide differential enrichment analysis in regions of interest to be performed. ChIPdig is written in R and enables access to several existing and highly utilized packages through a simple user interface powered by the Shiny package. Here, we illustrate the utility and user-friendly features of ChIPdig by analyzing H3K36me3 and H3K4me3 ChIP-seq profiles generated by the modENCODE project as an example. Conclusions: ChIPdig offers a comprehensive and user-friendly pipeline for analysis of multiple sets of ChIP-seq data by both experimental and computational researchers. It is open source and available at https://github.com/rmesse/ChIPdig.


Reassortment of Genome Segments Creates Stable Lineages Among Strains of Orchid Fleck Virus Infecting Citrus in Mexico

Phytopathology ◽

10.1094/phyto-07-19-0253-fi ◽

2020 ◽

Vol 110 (1) ◽

pp. 106-120 ◽

Cited By ~ 1

Author(s):

Avijit Roy ◽

Andrew L. Stone ◽

Gabriel Otero-Colina ◽

Gang Wei ◽

Ronald H. Brlansky ◽

...

Keyword(s):

High Throughput Sequencing ◽

Sensu Stricto ◽

Genome Segment ◽

Rt Pcr ◽

Sequence Comparisons ◽

Orchid Fleck Virus ◽

Reverse Transcription Pcr ◽

Sequencing Technologies ◽

Negative Sense

The genus Dichorhavirus contains viruses with bipartite, negative-sense, single-stranded RNA genomes that are transmitted by flat mites to hosts that include orchids, coffee, the genus Clerodendrum, and citrus. A dichorhavirus infecting citrus in Mexico is classified as a citrus strain of orchid fleck virus (OFV-Cit). We previously used RNA sequencing technologies on OFV-Cit samples from Mexico to develop an OFV-Cit–specific reverse transcription PCR (RT-PCR) assay. During assay validation, OFV-Cit–specific RT-PCR failed to produce an amplicon from some samples with clear symptoms of OFV-Cit. Characterization of this virus revealed that dichorhavirus-like particles were found in the nucleus. High-throughput sequencing of small RNAs from these citrus plants revealed a novel citrus strain of OFV, OFV-Cit2. Sequence comparisons with known orchid and citrus strains of OFV showed variation in the protein products encoded by genome segment 1 (RNA1). Strains of OFV clustered together based on host of origin, whether orchid or citrus, and were clearly separated from other dichorhaviruses described from infected citrus in Brazil. The variation in RNA1 between the original (now OFV-Cit1) and the new (OFV-Cit2) strain was not observed with genome segment 2 (RNA2), but instead, a common RNA2 molecule was shared among strains of OFV-Cit1 and -Cit2, a situation strikingly similar to OFV infecting orchids. We also collected mites at the affected groves, identified them as Brevipalpus californicus sensu stricto, and confirmed that they were infected by OFV-Cit1 or with both OFV-Cit1 and -Cit2. OFV-Cit1 and -Cit2 have coexisted at the same site in Toliman, Queretaro, Mexico since 2012. OFV strain-specific diagnostic tests were developed.


Application of Oxford Nanopore Technology to Plant Virus Detection

Viruses ◽

10.3390/v13081424 ◽

2021 ◽

Vol 13 (8) ◽

pp. 1424

Author(s):

Lia W. Liefting ◽

David W. Waite ◽

Jeremy R. Thompson

Keyword(s):

Plant Virus ◽

High Throughput Sequencing ◽

Virus Detection ◽

Diagnostic Methods ◽

Plant Virus Detection ◽

Sequencing Technologies ◽

Oxford Nanopore ◽

Virus Diagnostics ◽

Post Entry ◽

Read Accuracy

The adoption of Oxford Nanopore Technologies (ONT) sequencing as a tool in plant virology has been relatively slow despite its promise in more recent years to yield large quantities of long nucleotide sequences in real time without the need for prior amplification. The portability of the MinION and Flongle platforms, combined with falling costs and continued improvements in read accuracy, makes ONT an attractive method for both low- and high-scale virus diagnostics. Here, we provide a detailed step-by-step protocol using the ONT Flongle platform that we have developed for routine application on a range of symptomatic post-entry quarantine and domestic surveillance plant samples. The aim of this methods paper is to highlight ONT’s feasibility as a valuable component of the diagnostician’s toolkit and, we hope, to stimulate other laboratories towards the eventual goal of integrating high-throughput sequencing technologies as validated plant virus diagnostic methods in their own right.
