BTW—Bioinformatics Through Windows: an easy-to-install package to analyze marker gene data

PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5299 ◽  
Author(s):  
Daniel K. Morais ◽  
Luiz F.W. Roesch ◽  
Marc Redmile-Gordon ◽  
Fausto G. Santos ◽  
Petr Baldrian ◽  
...  

Recent advances in Next-Generation Sequencing (NGS) make comparative analyses of the composition and diversity of whole microbial communities possible at a far greater depth than ever before. This brings new challenges, such as an increased dependence on computation to process these huge datasets. The demand on system resources usually requires migrating from Windows to Linux-based operating systems and prior familiarity with command-line interfaces. To overcome this barrier, we developed Bioinformatics Through Windows (BTW), a fully automated and easy-to-install package together with a complete, easy-to-follow pipeline for microbial metataxonomic analysis that operates in the Windows Subsystem for Linux (WSL). BTW combines several open-access tools for processing marker gene data, including 16S rRNA, bringing the user from raw sequencing reads to diversity-related conclusions. It includes data quality filtering, clustering, taxonomic assignment and further statistical analyses, all directly in WSL, removing the need to migrate from Windows to Linux beforehand. BTW is expected to boost the use of NGS amplicon data by giving Windows users rapid access to a set of bioinformatics tools. Moreover, it makes several Linux command-line tools more accessible, which will open bioinformatics to a wider range of researchers and practitioners in the life sciences and medicine. BTW is available on GitHub (https://github.com/vpylro/BTW). The package is freely available for noncommercial users.
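
As an illustration of the data quality filtering step named in the abstract above, the following is a minimal Python sketch that keeps only reads whose mean Phred score meets a threshold. It is not BTW's implementation (the abstract does not say which tool or criterion BTW uses); the function names, the Phred+33 encoding and the threshold of 20 are assumptions chosen for the example.

```python
from typing import Iterable, Iterator, List, Tuple

# (identifier, sequence, quality string); a hypothetical record type for this sketch
Read = Tuple[str, str, str]


def parse_fastq(lines: List[str]) -> Iterator[Read]:
    """Yield (id, sequence, quality) from well-formed four-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.strip() for line in lines[i:i + 4])
        yield header[1:], seq, qual  # drop the leading '@' of the header


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def quality_filter(reads: Iterable[Read], min_mean_q: float = 20.0) -> List[Read]:
    """Keep reads whose mean quality is at least min_mean_q (assumed threshold)."""
    return [r for r in reads if r[2] and mean_phred(r[2]) >= min_mean_q]


# Toy input: 'I' encodes Phred 40 and '!' encodes Phred 0 under Phred+33.
example = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGT", "+", "!!!!",
]
kept = quality_filter(parse_fastq(example))
print([r[0] for r in kept])
```

Real pipelines typically use stricter criteria (expected-error filtering, trimming, length cut-offs); mean quality is used here only because it is the simplest criterion to show.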

2018 ◽  
Author(s):  
Daniel Morais ◽  
Luiz Roesch ◽  
Marc Redmile-Gordon ◽  
Fausto Santos ◽  
Petr Baldrian ◽  
...  

Recent advances in Next-Generation Sequencing (NGS) make comparative analyses of the composition and diversity of whole microbial communities possible at far greater depth than ever before. This brings new challenges, such as an increased dependence on computation to process these huge datasets. The demand on system resources usually requires migrating from Windows to Linux-based operating systems and prior familiarity with command-line interfaces. To overcome this barrier, we developed Bioinformatics Through Windows (BTW), a fully automated and easy-to-install package together with a complete, easy-to-follow pipeline for microbial metataxonomic analysis that operates in the Windows Subsystem for Linux (WSL). BTW combines several open-access tools for processing marker gene data, including 16S rRNA, bringing the user from raw sequencing reads to diversity-related conclusions. It includes data quality filtering, clustering, taxonomic assignment and further statistical analyses, all directly in WSL, removing the need to migrate from Windows to Linux beforehand. BTW is expected to boost the use of NGS amplicon data by giving Windows users rapid access to bioinformatics tools. BTW is a Bash script and is available on GitHub (https://github.com/vpylro/BTW). The package is freely available for noncommercial users.


Author(s):  
Elsbeth Bösl ◽  
Stefanie Samida

Today, DNA sequencing is part of the standard repertoire of biological and medical research. Next generation sequencing (NGS), established around the mid-2000s, was the main catalyst for this development. NGS has led to major knowledge gains in the molecular life sciences. However, the new technology provides data that pose new challenges that both science and society still must learn to deal with. A technology-driven dynamic can already be observed in this field, leading to transformation processes in science, where new fields of research are emerging, but also in society, where questions of identity are increasingly being negotiated based on genetic analyses.


Diagnostics ◽  
2021 ◽  
Vol 11 (8) ◽  
pp. 1378
Author(s):  
Vincenzo Castiglione ◽  
Martina Modena ◽  
Alberto Aimo ◽  
Enrica Chiti ◽  
Nicoletta Botto ◽  
...  

Molecular autopsy is the process of investigating sudden death through genetic analysis. It is particularly useful in cases where traditional autopsy is negative or only shows non-diagnostic features, i.e., in sudden unexplained deaths (SUDs), which are often due to an underlying inherited arrhythmogenic cardiac disease. The final goal of molecular autopsy in SUD cases is to aid medico-legal inquiries and to guide cascade genetic screening of the victim’s relatives. Early attempts at molecular autopsy relied on Sanger sequencing, which, despite being accurate and easy to use, has a low throughput and can only be employed to analyse a small panel of genes. Conversely, the recent adoption of next-generation sequencing (NGS) technologies has allowed exome- or genome-wide examination, increasing the detection of pathogenic variants and enabling the discovery of newer genotype-phenotype associations. NGS has nonetheless brought new challenges to molecular autopsy, especially regarding the clinical interpretation of the large number of variants of unknown significance detected in each individual.


2016 ◽  
Author(s):  
Andrew Krohn ◽  
Bo Stevens ◽  
Adam Robbins-Pianka ◽  
Matthew Belus ◽  
Gerard J Allan ◽  
...  

The diversity of complex microbial communities can be rapidly assessed by high-throughput DNA sequencing of marker gene (e.g., 16S) PCR amplicon pools, often yielding many thousands of DNA sequences per sample. However, analysis of such community amplicon sequencing data requires multiple computational steps which affect the outcome of a final data set. Here we use mock communities to describe the effects of parameter adjustments for raw sequence quality filtering, picking operational taxonomic units (OTUs), taxonomic assignment, and OTU table filtering as implemented in the popular microbial ecology analysis package, QIIME 1.9.1. We demonstrate a workflow optimization based upon this exploration, which we also apply to environmental samples. We found that quality filtering of raw data and filtering of OTU tables had large effects on observed OTU diversity. While all taxonomy assignment programs performed with similar accuracy, an appropriate choice of similarity threshold for defining OTUs depended on the method used for OTU picking. Our “default” analysis in QIIME overestimated mock community OTU diversity by at least a factor of ten. Our optimized analysis correctly characterized mock community taxonomic composition and improved the OTU diversity estimate, reducing overestimation to a factor of about two. Though observed relative abundances of mock community member taxa were approximately correct, most were still represented by multiple OTUs. Low-frequency OTUs conspecific to constituent mock community taxa were characterized by multiple substitution and indel errors and the presence of a low-quality base call resulting in sequence truncation during quality filtering. Low-quality base calls were observed at “G” positions most of the time, and were also associated with a preceding “TTT” trinucleotide motif. Environmental diversity estimates were reduced by about 40% from 2508 to 1533 OTUs when comparing output from the default and optimized workflows. 
We attribute this reduction in observed diversity to the removal of erroneous sequences from the data set. Our results indicate that both strict quality filtering of raw sequencing data and careful filtering of raw OTU tables are important steps for accurately estimating microbial community diversity.
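
Two points in the abstract above lend themselves to a small worked example: the filtering of OTU tables and the reported drop from 2508 to 1533 OTUs. The Python sketch below is illustrative only. It is not QIIME code, the table layout and function names are hypothetical, and the abundance threshold is an assumed value rather than the one the authors used.

```python
from typing import Dict

# Hypothetical layout for this sketch: {otu_id: {sample_id: read_count}}
OtuTable = Dict[str, Dict[str, int]]


def filter_otu_table(table: OtuTable, min_fraction: float) -> OtuTable:
    """Drop OTUs whose share of all reads in the table is below min_fraction."""
    total = sum(sum(counts.values()) for counts in table.values())
    if total == 0:
        return {}
    return {
        otu: counts
        for otu, counts in table.items()
        if sum(counts.values()) / total >= min_fraction
    }


def relative_reduction(before: int, after: int) -> float:
    """Fractional decrease from before to after."""
    return (before - after) / before


toy_table: OtuTable = {
    "OTU_1": {"s1": 900, "s2": 950},
    "OTU_2": {"s1": 100, "s2": 48},
    "OTU_3": {"s1": 1, "s2": 1},  # 2 of 2000 reads: a likely spurious OTU
}
filtered = filter_otu_table(toy_table, min_fraction=0.005)  # assumed threshold
print(sorted(filtered))  # OTU_3 is removed

# Figures quoted in the abstract: 2508 OTUs (default) vs 1533 OTUs (optimized).
print(round(relative_reduction(2508, 1533), 3))  # 0.389, i.e. "about 40%"
```

The second calculation confirms that the quoted counts correspond to a reduction of roughly 39%, consistent with the "about 40%" stated in the abstract.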


2019 ◽  
Vol 20 (1) ◽  
pp. 1-11
Author(s):  
Adibah Parmen ◽  
Mohd Noor Mat Isa ◽  
Farah Fadwa Benbelgacem ◽  
Hamzah Mohd Salleh ◽  
Ibrahim Ali Noorbatcha

The substantial cost reduction and massive output of next-generation sequencing (NGS) have contributed to the rapid growth of metagenomics. However, the massive amount of data produced by NGS has exposed the challenges of handling the existing bioinformatics tools for metagenomics. In this research, we therefore analysed the same set of DNA metagenomics data from a palm oil mill effluent (POME) sample using three freely available web-based bioinformatics pipelines, namely the metagenomics RAST server (MG-RAST), Integrated Microbial Genomes with Microbiome Samples (IMG/M) and European Bioinformatics Institute (EBI) Metagenomics, and compared them in terms of taxonomic assignment and functional analysis. We found that MG-RAST is the quickest of the three pipelines. In terms of results, however, IMG/M reports a greater variety of phyla with a wider range of percent identities for taxonomic assignment, and it provides the highest number of functional annotations for carbohydrate, amino acid, lipid, and coenzyme transport and metabolism, as well as the highest total number of glycoside hydrolase enzymes. For identifying the conserved domains and families involved, EBI Metagenomics is more appropriate. Each of the three pipelines has its own strengths, and they can be used alternately or together depending on the user’s functional preference.


Author(s):  
Elsbeth Bösl ◽  
Stefanie Samida

Next Generation Sequencing led to major knowledge gains in the molecular life sciences. But the new technology provides data that pose new challenges to both science and society. New fields of research are emerging and questions of identity on the basis of genetic analyses are being negotiated.


Author(s):  
Hyungtaek Jung ◽  
Brendan Jeon ◽  
Daniel Ortiz-Barrientos

Storing and manipulating Next Generation Sequencing (NGS) file formats for understanding biological phenomena is an essential but difficult task in the life sciences. Yet most methods for analysing NGS data require complex command-line tools on high-performance computing (HPC) systems or web-based servers and have not yet been implemented in comprehensive, easy-to-use software. Here we present easyfm (easy file manipulation), a free standalone Graphical User Interface (GUI) software with Python support that facilitates the rapid discovery of target sequences, or other regions of interest to the user, in NGS datasets and makes such analyses more accessible to novice users such as biologists. It enables them to perform end-to-end reproducible data analyses using a desktop application (Windows, Mac and Linux). Unlike existing tools, the GUI-based easyfm does not depend on any HPC system and can be operated without an internet connection. For user-friendliness and convenience, easyfm was developed with four work modules and a secondary GUI window, covering different aspects of NGS data analysis, including post-processing, filtering, format conversion, generating results, real-time log, and help. In combination with the executable tools (BLAST+ and BLAT) and Python, easyfm allows the user to set analysis parameters, select/extract regions of interest, examine the input and output results, and convert to a wide range of file formats. To help augment the functionality of existing web-based and command-line tools, easyfm, a self-contained program, comes with extensive documentation (https://github.com/TaekAndBrendan/easyfm). This allows easyfm to seamlessly integrate visual and interactive representations of NGS files, supporting a wider scope of bioinformatics applications in the life sciences.


2018 ◽  
Vol 1 (1) ◽  
pp. 8-9
Author(s):  
Laribi Kamel ◽  
Baugier de Materre Alix ◽  

Recent years have seen accelerating progress in understanding the pathophysiological mechanisms of many diseases, along with improvements in their follow-up thanks to increasingly powerful tools such as flow cytometry, quantitative PCR and, more recently, next-generation sequencing (NGS), including the monitoring of residual disease in liquid biopsies (circulating DNA). Above all, they have seen the discovery of many new drugs that have significantly improved patient survival.

