Accuracy of microbial community diversity estimated by closed- and open-reference OTUs

PeerJ ◽

10.7717/peerj.3889 ◽

2017 ◽

Vol 5 ◽

pp. e3889 ◽

Cited By ~ 69

Author(s):

Robert C. Edgar

Keyword(s):

Ribosomal RNA ◽

De Novo ◽

Community Diversity ◽

Reference Database ◽

Mock Community ◽

Variable Regions ◽

Operational Taxonomic Units ◽

Sequencing Technologies ◽

Generation Sequencing ◽

Mock Communities

Next-generation sequencing of 16S ribosomal RNA is widely used to survey microbial communities. Sequences are typically assigned to Operational Taxonomic Units (OTUs). Closed- and open-reference OTU assignment matches reads to a reference database at 97% identity (closed), then clusters unmatched reads using a de novo method (open). Implementations of these methods in the QIIME package were tested on several mock community datasets with 20 strains using different sequencing technologies and primers. Richness (number of reported OTUs) was often greatly exaggerated, with hundreds or thousands of OTUs generated on Illumina datasets. Between-sample diversity was also found to be highly exaggerated in many cases, with weighted Jaccard distances between identical mock samples often close to one, indicating very low similarity. Non-overlapping hyper-variable regions in 70% of species were assigned to different OTUs. On mock communities with Illumina V4 reads, 56% to 88% of predicted genus names were false positives. Biological inferences obtained using these methods are therefore not reliable.
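To make the between-sample result concrete, here is a minimal Python sketch of one common form of the weighted Jaccard distance, computed on relative abundances. The abstract does not state the exact formula used in the paper, so the definition, the function name, and the toy OTU tables are assumptions for illustration only.

```python
def weighted_jaccard_distance(counts_a, counts_b):
    """Weighted Jaccard distance between two samples.

    counts_a and counts_b map an OTU identifier to a read count.
    Counts are converted to relative abundances first (an assumption),
    then the distance is 1 - sum(min) / sum(max) over all OTUs.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("each sample needs at least one read")

    shared = 0.0  # sum of per-OTU minima
    union = 0.0   # sum of per-OTU maxima
    for otu in set(counts_a) | set(counts_b):
        freq_a = counts_a.get(otu, 0) / total_a
        freq_b = counts_b.get(otu, 0) / total_b
        shared += min(freq_a, freq_b)
        union += max(freq_a, freq_b)
    return 1.0 - shared / union


# Hypothetical OTU tables for three replicates of one mock community.
replicate_1 = {"otu_1": 50, "otu_2": 50}
replicate_2 = {"otu_1": 500, "otu_2": 500}  # same composition, deeper sequencing
replicate_3 = {"otu_3": 60, "otu_4": 40}    # same community split into other OTUs

same = weighted_jaccard_distance(replicate_1, replicate_2)
disjoint = weighted_jaccard_distance(replicate_1, replicate_3)
```

Under this definition, replicates with the same composition give a distance of zero, while replicates whose reads land in entirely different OTUs give a distance of one, which is the pattern the abstract reports for identical mock samples.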


Computational Approaches for Transcriptome Assembly Based on Sequencing Technologies

Current Bioinformatics ◽

10.2174/1574893614666190410155603 ◽

2020 ◽

Vol 15 (1) ◽

pp. 2-16

Author(s):

Yuwen Luo ◽

Xingyu Liao ◽

Fang-Xiang Wu ◽

Jianxin Wang

Keyword(s):

De Novo ◽

Transcriptome Assembly ◽

Critical Role ◽

High Sensitivity ◽

Biological Properties ◽

Sequencing Data ◽

Sequencing Technologies ◽

Long Reads ◽

Massive Sequencing ◽

Generation Sequencing

Transcriptome assembly plays a critical role in studying biological properties and examining the expression levels of genomes in specific cells. It is also the basis of many downstream analyses. With the increase in speed and the decrease in cost, massive sequencing data continues to accumulate. A large number of assembly strategies based on different computational methods and experiments have been developed. How to efficiently perform transcriptome assembly with high sensitivity and accuracy has become a key issue. In this work, the issues with transcriptome assembly are explored based on different sequencing technologies. Specifically, transcriptome assemblies with next-generation sequencing reads are divided into reference-based assemblies and de novo assemblies. Examples from different species are used to illustrate that long reads produced by third-generation sequencing technologies can cover full-length transcripts without assembly. In addition, different transcriptome assemblies using the Hybrid-seq methods and other tools are also summarized. Finally, we discuss the future directions of transcriptome assemblies.


A Practical Comparison of De Novo Genome Assembly Software Tools for Next-Generation Sequencing Technologies

PLoS ONE ◽

10.1371/journal.pone.0017915 ◽

2011 ◽

Vol 6 (3) ◽

pp. e17915 ◽

Cited By ~ 144

Author(s):

Wenyu Zhang ◽

Jiajia Chen ◽

Yang Yang ◽

Yifei Tang ◽

Jing Shang ◽

...

Keyword(s):

Next Generation Sequencing ◽

Genome Assembly ◽

De Novo ◽

Software Tools ◽

Next Generation ◽

De Novo Genome Assembly ◽

Sequencing Technologies ◽

Generation Sequencing ◽

Assembly Software


De Novo Assembly of Human Herpes Virus Type 1 (HHV-1) Genome, Mining of Non-Canonical Structures and Detection of Novel Drug-Resistance Mutations Using Short- and Long-Read Next Generation Sequencing Technologies

PLoS ONE ◽

10.1371/journal.pone.0157600 ◽

2016 ◽

Vol 11 (6) ◽

pp. e0157600 ◽

Cited By ~ 33

Author(s):

Timokratis Karamitros ◽

Ian Harrison ◽

Renata Piorkowska ◽

Aris Katzourakis ◽

Gkikas Magiorkinis ◽

...

Keyword(s):

De Novo ◽

Genome Mining ◽

Resistance Mutations ◽

Drug Resistance Mutations ◽

Sequencing Technologies ◽

Long Read ◽

Novel Drug ◽

Generation Sequencing ◽

Human Herpes Virus Type


Optimization of 16S amplicon analysis using mock communities: implications for estimating community diversity

10.7287/peerj.preprints.2196v2 ◽

2016 ◽

Cited By ~ 1

Author(s):

Andrew Krohn ◽

Bo Stevens ◽

Adam Robbins-Pianka ◽

Matthew Belus ◽

Gerard J Allan ◽

...

Keyword(s):

Amplicon Sequencing ◽

Community Diversity ◽

Accurate Estimation ◽

Marker Genes ◽

Sequencing Data ◽

Mock Community ◽

Data Set ◽

Environmental Diversity ◽

Quality Filtering ◽

Mock Communities

Diversity of complex microbial communities can be rapidly assessed by community amplicon sequencing of marker genes (e.g., 16S), often yielding many thousands of DNA sequences per sample. However, analysis of community amplicon sequencing data requires multiple computational steps which affect the outcome of a final data set. Here we use mock communities to describe the effects of parameter adjustments for raw sequence quality filtering, picking operational taxonomic units (OTUs), taxonomic assignment, and OTU table filtering as implemented in QIIME 1.9.1. We demonstrate a workflow optimization based upon this exploration which we also apply to environmental samples. We found that quality filtering of raw data and filtering of OTU tables had large effects on observed OTU diversity. While all taxonomy assigners performed with similar accuracy, an appropriate choice of similarity threshold for defining OTUs depended on the method used for OTU picking. Our “default” analysis in QIIME overestimated mock community diversity by at least a factor of ten, compared to the optimized analysis which correctly characterized the taxonomic composition of the mock communities while still overestimating OTU diversity by about a factor of two. Though observed relative abundances of mock community member taxa were approximately correct, most were still represented by multiple OTUs. Low-frequency OTUs conspecific to constituent mock community taxa were characterized by multiple substitution and indel errors and the presence of a low quality base call resulting in sequence truncation during quality filtering. Low quality base calls were observed at “G” positions most of the time, and were also associated with a preceding “TTT” trinucleotide motif. Environmental diversity estimates were reduced by about 40% from 2508 to 1533 OTUs when comparing output from the default and optimized workflows. We attribute this reduction in observed diversity to the removal of erroneous sequences from the data set. Our results indicate that both strict quality filtering of raw sequencing data and careful filtering of raw OTU tables are important steps for accurate estimation of microbial community diversity.
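The "about 40%" figure can be checked directly from the two OTU counts given in the abstract. The short Python sketch below does that arithmetic; the function name is hypothetical and nothing here comes from the authors' workflow.

```python
def relative_reduction(before, after):
    """Fraction by which a count dropped, relative to its starting value."""
    if before <= 0:
        raise ValueError("the starting count must be positive")
    return (before - after) / before


# OTU counts quoted in the abstract: default vs. optimized workflow.
reduction = relative_reduction(2508, 1533)
print(f"{reduction:.1%}")  # prints 38.9%
```

The exact value is 975 / 2508, which rounds to the roughly 40% quoted above.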


CD-HIT-OTU-MiSeq, an Improved Approach for Clustering and Analyzing Paired End MiSeq 16S rRNA Sequences

10.1101/153783 ◽

2017 ◽

Cited By ~ 3

Author(s):

Weizhong Li ◽

Yuanyuan Chang

Keyword(s):

16S rRNA ◽

High Speed ◽

De Novo ◽

Sequence Data ◽

Illumina MiSeq ◽

Poor Quality ◽

Reference Database ◽

rRNA Gene ◽

Variable Regions ◽

Novel Approach

In recent years, Illumina MiSeq sequencers replaced pyrosequencing platforms and became dominant in 16S rRNA sequencing. One unique feature of MiSeq technology, compared with pyrosequencing, is its Paired End (PE) reads, in which each read can be sequenced to 250-300 bases to cover multiple variable regions of the 16S rRNA gene. However, the PE reads need to be assembled into a single contig at the beginning of the analysis. Although there are many methods capable of assembling PE reads into contigs, a large portion of PE reads cannot be accurately assembled because of the poor quality at the 3’ ends of both PE reads in the overlapping region. As a result, many sequences are discarded in the analysis. In this study, we developed a novel approach for clustering and annotating MiSeq-based 16S sequence data, CD-HIT-OTU-MiSeq. This new approach has four distinct novel features. (1) The package can cluster PE reads without joining them into contigs. (2) Users can choose a high quality portion of the PE reads for analysis (e.g. first 200 / 150 bases from forward / reverse reads), according to the base quality profile. (3) We implemented a tool that can splice out the target region (e.g. V3-V4) from a full-length 16S reference database into the PE sequences. CD-HIT-OTU-MiSeq can cluster the spliced PE reference database together with samples, so we can derive Operational Taxonomic Units (OTUs) and annotate these OTUs concurrently. (4) Chimeric sequences are effectively identified through a de novo approach. The package offers high speed and high accuracy. The software package is freely available as an open source package and is distributed along with CD-HIT from http://cd-hit.org. Within the CD-HIT package, CD-HIT-OTU-MiSeq is in the usecase folder.
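Feature (2) above, keeping only a high quality prefix of each read in a pair, can be sketched in a few lines of Python. This is an illustration only and not code from the CD-HIT package: the function name is hypothetical, and discarding pairs that are shorter than the requested lengths is an assumption that the abstract does not state.

```python
def trim_pair(forward, reverse, forward_len=200, reverse_len=150):
    """Keep the first forward_len / reverse_len bases of a read pair.

    Returns a (forward, reverse) tuple of trimmed sequences, or None when
    either read is too short (an assumed policy, not taken from the paper).
    The two reads are kept separate and are never joined into a contig.
    """
    if len(forward) < forward_len or len(reverse) < reverse_len:
        return None
    return forward[:forward_len], reverse[:reverse_len]


# Hypothetical 250-base reads; real reads would come from FASTQ files.
trimmed = trim_pair("A" * 250, "C" * 250)
too_short = trim_pair("A" * 100, "C" * 250)
```

The default lengths mirror the 200 / 150 example in the abstract; in practice they would be chosen from the base quality profile of the run.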


Telomere Length De Novo Assembly of all 7 Chromosomes and Mitogenome Sequencing of the Model Entomopathogenic Fungus, Metarhizium Brunneum, by Means of a Novel Assembly Pipeline

10.21203/rs.3.rs-60098/v1 ◽

2020 ◽

Author(s):

Zack Saud ◽

Alexandra M. Kortsinoglou ◽

Vassili N. Kouvelis ◽

Tariq M. Butt

Keyword(s):

Entomopathogenic Fungus ◽

De Novo ◽

Gene Prediction ◽

Fungal Species ◽

Orthologous Protein ◽

Metarhizium Brunneum ◽

Sequencing Technologies ◽

Protein Clusters ◽

Assembly Pipeline ◽

Generation Sequencing

More accurate and complete reference genomes have improved understanding of gene function, biology, and evolutionary mechanisms. Hybrid genome assembly approaches leverage benefits of both long, relatively error-prone reads from third-generation sequencing technologies and short, accurate reads from second-generation sequencing technologies, to produce more accurate and contiguous de novo genome assemblies in comparison to using either technology independently. In this study, we present a novel hybrid assembly pipeline that allowed for both mitogenome de novo assembly and telomere length de novo assembly of all 7 chromosomes of the model entomopathogenic fungus, Metarhizium brunneum. The improved assembly allowed for better ab initio gene prediction and a more BUSCO complete proteome set has been generated in comparison to the eight current NCBI reference Metarhizium spp. genomes. Remarkably, we note that including the mitogenome in ab initio gene prediction training improved overall gene prediction. The assembly was further validated by comparing contig assembly agreement across various assemblers, assessing the assembly performance of each tool. Genomic synteny and orthologous protein clusters were compared between Metarhizium brunneum and three other Hypocreales species with complete genomes, identifying core proteins, and listing orthologous protein clusters shared uniquely between the two entomopathogenic fungal species, so as to further facilitate the understanding of molecular mechanisms underpinning fungal-insect pathogenesis. The novel assembly pipeline may be used for other haploid fungal species, facilitating the need to produce high-quality reference fungal genomes, leading to better understanding of fungal genomic evolution, chromosome structuring and gene regulation.


UNCROSS2: identification of cross-talk in 16S rRNA OTU tables

10.1101/400762 ◽

2018 ◽

Cited By ~ 7

Author(s):

Robert C. Edgar

Keyword(s):

Ribosomal RNA ◽

Cross Talk ◽

Biological Diversity ◽

Amplicon Sequencing ◽

Next Generation ◽

Mock Community ◽

Repertoire Analysis ◽

Microbial Metagenomics ◽

Tumor Sequencing ◽

Generation Sequencing

Next-generation amplicon sequencing is widely used for surveying biological diversity in applications such as microbial metagenomics, immune system repertoire analysis and targeted tumor sequencing of cancer-associated genes. In such studies, assignment of reads to incorrect samples (cross-talk) is a well-documented problem that is rarely considered in practice. Here, I describe UNCROSS2, an algorithm designed to detect and filter cross-talk in OTU tables generated by next-generation sequencing of the 16S ribosomal RNA gene. On eight published datasets, cross-talk rates are estimated to range from 0.4% to 1.5% mis-assigned reads. On a mock community test, UNCROSS2 identifies spurious counts due to cross-talk with sensitivity ∼80% to 90% and error rate from ∼1% to ∼20%, but it is not clear whether the accuracy of the algorithm is sufficient to decisively improve diversity rates in practice.


Telomere length de novo assembly of all 7 chromosomes and mitogenome sequencing of the model entomopathogenic fungus, Metarhizium brunneum, by means of a novel assembly pipeline

BMC Genomics ◽

10.1186/s12864-021-07390-y ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Zack Saud ◽

Alexandra M. Kortsinoglou ◽

Vassili N. Kouvelis ◽

Tariq M. Butt

Keyword(s):

De Novo Assembly ◽

De Novo ◽

Gene Prediction ◽

Fungal Species ◽

Orthologous Protein ◽

Metarhizium Brunneum ◽

Sequencing Technologies ◽

Protein Clusters ◽

Assembly Pipeline ◽

Generation Sequencing

Background: More accurate and complete reference genomes have improved understanding of gene function, biology, and evolutionary mechanisms. Hybrid genome assembly approaches leverage benefits of both long, relatively error-prone reads from third-generation sequencing technologies and short, accurate reads from second-generation sequencing technologies, to produce more accurate and contiguous de novo genome assemblies in comparison to using either technology independently. In this study, we present a novel hybrid assembly pipeline that allowed for both mitogenome de novo assembly and telomere length de novo assembly of all 7 chromosomes of the model entomopathogenic fungus, Metarhizium brunneum. Results: The improved assembly allowed for better ab initio gene prediction and a more BUSCO complete proteome set has been generated in comparison to the eight current NCBI reference Metarhizium spp. genomes. Remarkably, we note that including the mitogenome in ab initio gene prediction training improved overall gene prediction. The assembly was further validated by comparing contig assembly agreement across various assemblers, assessing the assembly performance of each tool. Genomic synteny and orthologous protein clusters were compared between Metarhizium brunneum and three other Hypocreales species with complete genomes, identifying core proteins, and listing orthologous protein clusters shared uniquely between the two entomopathogenic fungal species, so as to further facilitate the understanding of molecular mechanisms underpinning fungal-insect pathogenesis. Conclusions: The novel assembly pipeline may be used for other haploid fungal species, facilitating the need to produce high-quality reference fungal genomes, leading to better understanding of fungal genomic evolution, chromosome structuring and gene regulation.


A new oomycete metabarcoding method using the rps10 gene

10.1101/2021.09.22.460084 ◽

2021 ◽

Author(s):

Zachary S. L. Foster ◽

Felipe E Albornoz ◽

Valerie J Fieland ◽

Meredith M Larsen ◽

Frank Andrew Jones ◽

...

Keyword(s):

Environmental Samples ◽

Illumina MiSeq ◽

Taxonomic Resolution ◽

Reference Database ◽

Mock Community ◽

Operational Taxonomic Units ◽

Wide Range ◽

DNA Metabarcoding ◽

Improved Methods

Oomycetes are a group of eukaryotes related to brown algae and diatoms, many of which cause diseases in plants and animals. Improved methods are needed for rapid and accurate characterization of oomycete communities using DNA metabarcoding. We have identified the mitochondrial 40S ribosomal protein S10 gene (rps10) as a locus useful for oomycete metabarcoding and provide primers predicted to amplify all oomycetes based on available reference sequences from a wide range of taxa. We evaluated its utility relative to a popular barcode, the internal transcribed spacer 1 (ITS1), by sequencing environmental samples and a mock community using Illumina MiSeq. Amplified sequence variants (ASVs) and operational taxonomic units (OTUs) were identified per community. Both the sequence and predicted taxonomy of ASVs and OTUs were compared to the known composition of the mock community. Both rps10 and ITS yielded ASVs with sequences matching 21 of the 24 species in the mock community and matching all 24 when allowing for a 1 bp difference. Taxonomic classifications of ASVs included 23 members of the mock community for rps10 and 17 for ITS1. Sequencing results for the environmental samples suggest the proposed rps10 locus results in substantially less amplification of non-target organisms than the ITS1 method. The amplified rps10 region also has higher taxonomic resolution than ITS1, allowing for greater discrimination of closely related species. We present a new website with a searchable rps10 reference database for species identification and all protocols needed for oomycete metabarcoding. The rps10 barcode and methods described herein provide an effective tool for metabarcoding oomycetes using short-read sequencing.
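The comparison of ASVs against the known mock community, first requiring exact matches and then allowing a 1 bp difference, can be sketched as follows in Python. This is a simplified illustration, not the authors' pipeline: the function names and toy sequences are hypothetical, and differences are counted as substitutions between equal-length sequences, whereas the study may also have counted insertions and deletions.

```python
def mismatches(seq_a, seq_b):
    """Number of differing positions, or None if the lengths differ."""
    if len(seq_a) != len(seq_b):
        return None
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def matched_species(asvs, references, max_diff=0):
    """Species whose reference sequence is within max_diff of some ASV."""
    matched = set()
    for species, ref_seq in references.items():
        for asv in asvs:
            diff = mismatches(asv, ref_seq)
            if diff is not None and diff <= max_diff:
                matched.add(species)
                break  # one matching ASV is enough for this species
    return matched


# Hypothetical barcode sequences for a two-species mock community.
references = {"species_a": "ACGTACGT", "species_b": "TTTTCCCC"}
asvs = ["ACGTACGT", "TTTTCCCA"]  # the second ASV is 1 bp off species_b

exact = matched_species(asvs, references)
within_one = matched_species(asvs, references, max_diff=1)
```

With these toy inputs, exact matching recovers only species_a, while a tolerance of one base recovers both species, mirroring the step from 21 to 24 matched species reported above.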


Telomere length de novo assembly of all 7 chromosomes and mitogenome sequencing of the model entomopathogenic fungus, Metarhizium brunneum, by means of a novel assembly pipeline

10.21203/rs.3.rs-60098/v3 ◽

2020 ◽

Author(s):

Zack Saud ◽

Alexandra M. Kortsinoglou ◽

Vassili N. Kouvelis ◽

Tariq M. Butt

Keyword(s):

De Novo Assembly ◽

De Novo ◽

Gene Prediction ◽

Fungal Species ◽

Orthologous Protein ◽

Metarhizium Brunneum ◽

Sequencing Technologies ◽

Protein Clusters ◽

Assembly Pipeline ◽

Generation Sequencing

Background: More accurate and complete reference genomes have improved understanding of gene function, biology, and evolutionary mechanisms. Hybrid genome assembly approaches leverage benefits of both long, relatively error-prone reads from third-generation sequencing technologies and short, accurate reads from second-generation sequencing technologies, to produce more accurate and contiguous de novo genome assemblies in comparison to using either technology independently. In this study, we present a novel hybrid assembly pipeline that allowed for both mitogenome de novo assembly and telomere length de novo assembly of all 7 chromosomes of the model entomopathogenic fungus, Metarhizium brunneum. Results: The improved assembly allowed for better ab initio gene prediction and a more BUSCO complete proteome set has been generated in comparison to the eight current NCBI reference Metarhizium spp. genomes. Remarkably, we note that including the mitogenome in ab initio gene prediction training improved overall gene prediction. The assembly was further validated by comparing contig assembly agreement across various assemblers, assessing the assembly performance of each tool. Genomic synteny and orthologous protein clusters were compared between Metarhizium brunneum and three other Hypocreales species with complete genomes, identifying core proteins, and listing orthologous protein clusters shared uniquely between the two entomopathogenic fungal species, so as to further facilitate the understanding of molecular mechanisms underpinning fungal-insect pathogenesis. Conclusions: The novel assembly pipeline may be used for other haploid fungal species, facilitating the need to produce high-quality reference fungal genomes, leading to better understanding of fungal genomic evolution, chromosome structuring and gene regulation.
