SnakeChunks: modular blocks to build Snakemake workflows for reproducible NGS analyses

2017 ◽  
Author(s):  
Claire Rioualen ◽  
Lucie Charbonnier-Khamvongsa ◽  
Jacques van Helden

Abstract Summary Next-Generation Sequencing (NGS) is becoming a routine approach for most domains of life sciences, yet there is a crucial need to improve the automation of processing for the huge amounts of data generated and to ensure reproducible results. We present SnakeChunks, a collection of Snakemake rules that enables users to compose modular and user-configurable workflows, and show its usage with analyses of transcriptome (RNA-seq) and genome-wide location (ChIP-seq) data. Availability The code is freely available (github.com/SnakeChunks/SnakeChunks), and documented with tutorials and illustrative demos (snakechunks.readthedocs.io). Supplementary information Supplementary data are available at Bioinformatics online.

2020 ◽  
Author(s):  
João R. Almeida ◽  
Armando J. Pinho ◽  
José L. Oliveira ◽  
Olga Fajarda ◽  
Diogo Pratas

Abstract Summary Next-generation sequencing triggered the production of a massive volume of publicly available data and the development of new specialised tools. These tools are dispersed over different frameworks, making the management and analyses of the data a challenging task. Additionally, new targeted tools are needed, given the dynamics and specificities of the field. We present GTO, a comprehensive toolkit designed to unify pipelines in genomic and proteomic research, which combines specialised tools for analysis, simulation, compression, development, visualisation, and transformation of the data. This toolkit combines novel tools with a modular architecture, being an excellent platform for experimental scientists, as well as a useful resource for teaching bioinformatics inquiry to students in life sciences. Availability and implementation GTO is implemented in C and is available, under the MIT license, at http://bioinformatics.ua.pt/. Supplementary information Supplementary data are available at the publisher’s Web site.


Author(s):  
Ting-Hsuan Wang ◽  
Cheng-Ching Huang ◽  
Jui-Hung Hung

Abstract Motivation Cross-sample comparisons or large-scale meta-analyses based on next-generation sequencing (NGS) involve replicable and universal data preprocessing, including the removal of adapter fragments from contaminated reads (i.e. adapter trimming). Modern adapter trimmers require users to provide candidate adapter sequences for each sample, but these are sometimes unavailable or falsely documented in the repositories (such as GEO or SRA); large-scale meta-analyses are therefore jeopardized by suboptimal adapter trimming. Results Here we introduce a set of fast and accurate adapter detection and trimming algorithms that entail no a priori adapter sequences. These algorithms were implemented in modern C++ with SIMD and multithreading for speed. Our experiments and benchmarks show that the implementation (i.e. EARRINGS), without being given any hint of adapter sequences, can reach accuracy comparable to, and throughput higher than, that of existing adapter trimmers. EARRINGS is particularly useful in meta-analyses of a large batch of datasets and can be incorporated into sequence analysis pipelines at any scale. Availability and implementation EARRINGS is open-source software and is available at https://github.com/jhhung/EARRINGS. Supplementary information Supplementary data are available at Bioinformatics online.
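
As an illustration of what adapter trimming means in practice, the following minimal Python sketch shows the conventional, adapter-aware approach that the abstract contrasts itself with: given a known adapter sequence, it cuts the read at the adapter, or at a partial adapter prefix at the 3' end of the read. This is not the EARRINGS algorithm, which detects adapters without being given them. The function name `trim_adapter`, the `min_overlap` parameter, and the exact-match strategy are assumptions made for illustration only; real trimmers also tolerate sequencing errors.

```python
def trim_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter from a read using exact matching (illustrative only)."""
    # Case 1: the complete adapter occurs inside the read.
    # Everything from its first occurrence onward is discarded.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]

    # Case 2: the read ends with a proper prefix of the adapter
    # (the read stopped partway through the adapter).
    # Try the longest possible overlap first; stop at min_overlap
    # to avoid trimming short chance matches.
    max_len = min(len(read), len(adapter) - 1)
    for k in range(max_len, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]

    # No adapter evidence found: return the read unchanged.
    return read
```

A caller would apply `trim_adapter` to every read of a sample with that sample's adapter. The need to know that adapter in advance is the dependency the abstract describes as problematic for large meta-analyses.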


2017 ◽  
Author(s):  
Claire Marchal ◽  
Takayo Sasaki ◽  
Daniel Vera ◽  
Korey Wilson ◽  
Jiao Sima ◽  
...  

Abstract Cycling cells duplicate their DNA content during S phase, following a defined program called replication timing (RT). Early and late replicating regions differ in terms of mutation rates, transcriptional activity, chromatin marks and sub-nuclear position. Moreover, RT is regulated during development and is altered in disease. Exploring mechanisms linking RT to other cellular processes in normal and diseased cells will be facilitated by rapid and robust methods with which to measure RT genome wide. Here, we describe a rapid, robust and relatively inexpensive protocol to analyze genome-wide RT by next-generation sequencing (NGS). This protocol yields highly reproducible results across laboratories and platforms. We also provide computational pipelines for analysis, parsing phased genomes using single nucleotide polymorphisms (SNPs) for analyzing RT allelic asynchrony, and for direct comparison to Repli-chip data obtained by analyzing nascent DNA by microarrays.


2019 ◽  
Vol 36 (8) ◽  
pp. 2587-2588 ◽  
Author(s):  
Christopher M Ward ◽  
Thu-Hien To ◽  
Stephen M Pederson

Abstract Motivation High-throughput next-generation sequencing (NGS) has become exceedingly cheap, making it feasible to undertake studies with large sample numbers. Quality control (QC) is an essential stage of analytic pipelines, and the outputs of popular bioinformatics tools such as FastQC and Picard can provide information on individual samples. Although these tools provide considerable power when carrying out QC, large sample numbers can make inspection of all samples and identification of systemic bias a challenge. Results We present ngsReports, an R package designed for the management and visualization of NGS reports from within an R environment. The available methods allow direct import into R of FastQC reports along with outputs from other tools. Visualization can be carried out across many samples using default, highly customizable plots with options to perform hierarchical clustering to quickly identify outlier libraries. Moreover, these can be displayed in an interactive shiny app or HTML report for ease of analysis. Availability and implementation The ngsReports package is available on Bioconductor and the GUI shiny app is available at https://github.com/UofABioinformaticsHub/shinyNgsreports. Supplementary information Supplementary data are available at Bioinformatics online.
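
The cross-sample QC idea described above can be sketched in a few lines of Python. This is not ngsReports (an R package) and does not use its API. It is a minimal sketch that assumes each sample has a FastQC `summary.txt` whose lines hold three tab-separated fields (status, module name, file name). The function names `parse_summary`, `tally_by_module`, and `flag_samples` are hypothetical.

```python
from collections import Counter, defaultdict


def parse_summary(text: str) -> dict:
    """Parse the text of one FastQC summary.txt into {module: status}."""
    statuses = {}
    for line in text.splitlines():
        parts = line.split("\t")
        # Expected layout (assumption): STATUS <tab> MODULE <tab> FILENAME
        if len(parts) >= 2:
            status, module = parts[0], parts[1]
            statuses[module] = status
    return statuses


def tally_by_module(samples: dict) -> dict:
    """Count how many samples got each status, per QC module."""
    tally = defaultdict(Counter)
    for statuses in samples.values():
        for module, status in statuses.items():
            tally[module][status] += 1
    return {module: dict(counts) for module, counts in tally.items()}


def flag_samples(samples: dict, status: str = "FAIL") -> list:
    """Return the sorted names of samples with at least one module at `status`."""
    return sorted(
        name for name, statuses in samples.items()
        if status in statuses.values()
    )
```

Here `samples` maps a sample name to the dictionary returned by `parse_summary`. Tallying per module makes a systemic problem (many samples failing the same module) easier to spot than reading individual reports one by one.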


Genes ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 92 ◽  
Author(s):  
Shannon J. McKie ◽  
Anthony Maxwell ◽  
Keir C. Neuman

Next-generation sequencing (NGS) platforms have been adapted to generate genome-wide maps and sequence context of binding and cleavage of DNA topoisomerases (topos). Continuous refinements of these techniques have resulted in the acquisition of data with unprecedented depth and resolution, which has shed new light on in vivo topo behavior. Topos regulate DNA topology through the formation of reversible single- or double-stranded DNA breaks. Topo activity is critical for DNA metabolism in general, and in particular to support transcription and replication. However, the binding and activity of topos over the genome in vivo was difficult to study until the advent of NGS. Over and above traditional chromatin immunoprecipitation (ChIP)-seq approaches that probe protein binding, the unique formation of covalent protein–DNA linkages associated with DNA cleavage by topos affords the ability to probe cleavage and, by extension, activity over the genome. NGS platforms have facilitated genome-wide studies mapping the behavior of topos in vivo, how the behavior varies among species and how inhibitors affect cleavage. Many NGS approaches achieve nucleotide resolution of topo binding and cleavage sites, imparting an extent of information not previously attainable. We review the development of NGS approaches to probe topo interactions over the genome in vivo and highlight general conclusions and quandaries that have arisen from this rapidly advancing field of topoisomerase research.


Author(s):  
Edo D’Agaro ◽  
Andrea Favaro ◽  
Davide Rosa

In the past fifteen years, tremendous progress has been made in dog genomics. Several genetic aspects of cancer, heart disease, hip dysplasia, vision and hearing problems in dogs have been studied in detail. Genome-wide association studies have made it possible to identify several genes associated with diseases and with morphological and behavioral traits. The dog genome contains an extraordinary amount of genetic variability that distinguishes the different dog breeds. As a consequence of selective breeding programs applied under stringent breed standards, each dog breed today represents a population isolated from the others. The availability of modern next-generation sequencing (NGS) techniques and the identification of millions of single nucleotide polymorphisms (SNPs) have made it possible to obtain new, previously unavailable, detailed genomic data for the different breeds.


Author(s):  
Afzal Hussain

Next-generation sequencing, or massively parallel sequencing, covers DNA sequencing, RNA sequencing, and methylation sequencing, and has had a great impact on the life sciences. Recent advances in these parallel sequencing technologies allow huge amounts of data to be generated in a very short period of time while reducing the associated computing cost. NGS plays a major role in gene expression profiling, chromosome counting, detecting epigenetic changes, and enabling the future of personalized medicine. Here the authors describe NGS technologies and their applications, as well as the use of tools such as TopHat, Bowtie, Cufflinks, Cuffmerge, and Cuffdiff for analyzing high-throughput RNA sequencing (RNA-Seq) data.


2015 ◽  
Vol 24 (1) ◽  
pp. 2-5 ◽  
Author(s):  
Gert Matthijs ◽  
Erika Souche ◽  
Mariëlle Alders ◽  
Anniek Corveleyn ◽  
Sebastian Eck ◽  
...  

Abstract We present, on behalf of EuroGentest and the European Society of Human Genetics, guidelines for the evaluation and validation of next-generation sequencing (NGS) applications for the diagnosis of genetic disorders. The work was performed by a group of laboratory geneticists and bioinformaticians, and discussed with clinical geneticists, industry and patients’ representatives, and other stakeholders in the field of human genetics. The statements that were written during the elaboration of the guidelines are presented here. The background document and full guidelines are available as supplementary material. They include many examples to assist the laboratories in the implementation of NGS and accreditation of this service. The work and ideas presented by others in guidelines that have emerged elsewhere in the course of the past few years were also considered and are acknowledged in the full text. Interestingly, a few new insights that have not been cited before have emerged during the preparation of the guidelines. The most important new feature is the presentation of a ‘rating system’ for NGS-based diagnostic tests. The guidelines and statements have been applauded by the genetic diagnostic community, and thus seem to be valuable for the harmonization and quality assurance of NGS diagnostics in Europe.

