Impact of DNA sequencing and analysis methods on 16S rRNA gene bacterial community analysis in dairy products

2018 ◽  
Author(s):  
Zhengyao Xue ◽  
Mary E Kable ◽  
Maria L Marco

Abstract DNA sequencing and analysis methods were compared for 16S rRNA V4 PCR amplicon and gDNA mock communities encompassing nine bacterial species commonly found in milk and dairy products. The communities were examined using Illumina MiSeq and Ion Torrent PGM DNA sequencing methods followed by the QIIME 1 (UCLUST) and Divisive Amplicon Denoising Algorithm 2 (DADA2) data analysis pipelines, including taxonomic comparisons to the Greengenes and Ribosomal Database Project (RDP) databases. Examination of the PCR amplicon mock community with these methods resulted in operational taxonomic units (OTUs) and amplicon sequence variants (ASVs) that ranged from a low of 13 to a high of 118 and were dependent on the DNA sequencing method and read assembly step. The elevated numbers of OTUs and ASVs included assignments to spurious taxa as well as sequence variants of the nine species included in the mock community. Comparisons between the gDNA and PCR amplicon mock communities showed that combining gDNA from the different strains prior to PCR resulted in up to 8.9-fold greater numbers of spurious OTUs and ASVs. However, the DNA sequencing method and initial data assembly steps conferred the largest effects on predictions of bacterial diversity, independent of the mock community type (PCR amplicon or gDNA; Bray-Curtis, R2 = 0.88; weighted UniFrac, R2 = 0.32). Overall, DNA sequencing performed with the Ion Torrent PGM and analyzed with DADA2 and the Greengenes database resulted in the most accurate predictions of the mock community phylogeny, taxonomy, and diversity. Importance Validated methods are urgently needed to improve DNA sequence-based assessments of complex bacterial communities. In this study, we used 16S rRNA PCR amplicon and gDNA mock community standards, consisting of nine dairy-associated bacterial species, to evaluate the most commonly applied 16S rRNA marker gene DNA sequencing and analysis platforms used in evaluating dairy and other bacterial habitats. Our results show that bacterial metataxonomic assessments are largely dependent on the DNA sequencing platform and read curation method used. DADA2 improved sequence annotation compared with QIIME 1, and when combined with the Ion Torrent PGM DNA sequencing platform and the Greengenes database for taxonomic assignment, the most accurate representation of the dairy mock community standards was reached. This approach will be useful for validating sample collection and DNA extraction methods and ultimately investigating bacterial population dynamics in milk and dairy-associated environments.

mSphere ◽  
2018 ◽  
Vol 3 (5) ◽  
Author(s):  
Zhengyao Xue ◽  
Mary E. Kable ◽  
Maria L. Marco

ABSTRACT DNA sequencing and analysis methods were compared for 16S rRNA V4 PCR amplicon and genomic DNA (gDNA) mock communities encompassing nine bacterial species commonly found in milk and dairy products. The two communities comprised strain-specific DNA that was pooled before (gDNA) or after (PCR amplicon) the PCR step. The communities were sequenced on the Illumina MiSeq and Ion Torrent PGM platforms and then analyzed using the QIIME 1 (UCLUST) and Divisive Amplicon Denoising Algorithm 2 (DADA2) analysis pipelines with taxonomic comparisons to the Greengenes and Ribosomal Database Project (RDP) databases. Examination of the PCR amplicon mock community with these methods resulted in operational taxonomic units (OTUs) and amplicon sequence variants (ASVs) that ranged from 13 to 118 and were dependent on the DNA sequencing method and read assembly steps. The additional 4 to 109 OTUs/ASVs (from 9 OTUs/ASVs) included assignments to spurious taxa and sequence variants of the 9 species included in the mock community. Comparisons between the gDNA and PCR amplicon mock communities showed that combining gDNAs from the different strains prior to PCR resulted in up to 8.9-fold greater numbers of spurious OTUs/ASVs. However, the DNA sequencing method and paired-end read assembly steps conferred the largest effects on predictions of bacterial diversity, with effect sizes of 0.88 (Bray-Curtis) and 0.32 (weighted Unifrac), independent of the mock community type. Overall, DNA sequencing performed with the Ion Torrent PGM and analyzed with DADA2 and the Greengenes database resulted in the most accurate predictions of the mock community phylogeny, taxonomy, and diversity. IMPORTANCE Validated methods are urgently needed to improve DNA sequence-based assessments of complex bacterial communities. 
In this study, we used 16S rRNA PCR amplicon and gDNA mock community standards, consisting of nine dairy-associated bacterial species, to evaluate the most commonly applied 16S rRNA marker gene DNA sequencing and analysis platforms used in evaluating dairy and other bacterial habitats. Our results show that bacterial metataxonomic assessments are largely dependent on the DNA sequencing platform and read curation method used. DADA2 improved sequence annotation compared with QIIME 1, and when combined with the Ion Torrent PGM DNA sequencing platform and the Greengenes database for taxonomic assignment, the most accurate representation of the dairy mock community standards was reached. This approach will be useful for validating sample collection and DNA extraction methods and ultimately investigating bacterial population dynamics in milk- and dairy-associated environments.
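The two quantities this abstract leans on, the count of extra OTUs/ASVs beyond the nine expected species and the Bray-Curtis dissimilarity between community profiles, can be illustrated with a minimal sketch. The function names, the placeholder taxon labels (`sp1` to `sp9`), and the assumption of an even expected community are illustrative only; the abstract does not state the mock community proportions, and this is not the authors' pipeline.

```python
from typing import Dict


def extra_features(n_observed: int, n_expected: int = 9) -> int:
    """Number of OTUs/ASVs observed beyond the expected species count."""
    return n_observed - n_expected


def bray_curtis(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles.

    Returns 0.0 for identical profiles and 1.0 for profiles sharing no taxa.
    """
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    denominator = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return numerator / denominator if denominator else 0.0


# Hypothetical even mock community of nine placeholder species (an assumption).
expected = {f"sp{i}": 1.0 / 9 for i in range(1, 10)}

# The reported range of 13 to 118 features corresponds to 4 to 109 extras.
low_extra = extra_features(13)
high_extra = extra_features(118)
```

Comparing an observed profile against `expected` with `bray_curtis` gives one number per sample; the effect sizes quoted in the abstract come from a statistical test across many such comparisons, which this sketch does not attempt.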


2018 ◽  
Author(s):  
Thomas W. A. Braukmann ◽  
Natalia V. Ivanova ◽  
Sean W. J. Prosser ◽  
Vasco Elbrecht ◽  
Dirk Steinke ◽  
...  

Abstract DNA metabarcoding is an attractive approach for monitoring biodiversity. However, it is subject to biases that often impede detection of all species in a sample. In particular, the proportion of sequences recovered from each species depends on its biomass, mitome copy number, and the primer set employed for PCR. To examine these variables, we constructed a mock community of terrestrial arthropods comprising 374 BINs, a species proxy. We used this community to examine how species recovery was impacted when amplicon pools were constructed in four ways. The first two protocols involved the construction of bulk DNA extracts from different body partitions (Bulk Abdomen, Bulk Leg). The other protocols involved the production of DNA extracts from single legs, which were then merged prior to PCR (Composite Leg) or PCR-amplified separately (Single Leg) and then pooled. The amplicons generated by these four treatments were then sequenced on three platforms (Illumina MiSeq, Ion Torrent PGM and Ion Torrent S5). Although the choice of sequencing platform did not substantially influence species recovery, other variables did. As expected, the best recovery was obtained from the Single Leg treatment, but the Bulk Abdomen produced a more uniform read abundance than the Bulk Leg or Composite Leg samples. Primer choice also influenced species recovery. Our results reveal how variation in protocols can have substantive impacts on perceived diversity unless sequencing coverage is sufficient to reach an asymptote. Although metabarcoding is a powerful approach, further optimization of analytical protocols is crucial to obtain reproducible results and increase its cost-effectiveness.
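The point about coverage reaching an asymptote can be illustrated with a simple rarefaction sketch: subsample reads at increasing depths and count how many distinct species labels are recovered. The function names and the toy read labels are hypothetical, and the sketch assumes each read has already been assigned to a BIN; it is not the analysis used in the study.

```python
import random
from typing import List, Sequence


def species_recovered(read_labels: Sequence[str], depth: int, seed: int = 0) -> int:
    """Count distinct labels in a random subsample of `depth` reads (no replacement)."""
    if not 0 <= depth <= len(read_labels):
        raise ValueError("depth must be between 0 and the number of reads")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return len(set(rng.sample(list(read_labels), depth)))


def rarefaction_curve(read_labels: Sequence[str], depths: Sequence[int], seed: int = 0) -> List[int]:
    """Species recovered at each requested subsampling depth."""
    return [species_recovered(read_labels, d, seed) for d in depths]


# Toy data: one dominant, one moderate, and one rare label (all hypothetical).
toy_reads = ["binA"] * 50 + ["binB"] * 5 + ["binC"]
curve = rarefaction_curve(toy_reads, [1, 10, len(toy_reads)])
```

When the curve flattens as depth grows, additional sequencing is unlikely to reveal more species; rare labels such as `binC` are typically the last to appear.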


2013 ◽  
Author(s):  
Jose L. Costa ◽  
Arjen Mensenkamp ◽  
Alain Rico ◽  
Guoying Liu ◽  
Rosella Petraroli ◽  
...  

Blood ◽  
2014 ◽  
Vol 124 (21) ◽  
pp. 5634-5634
Author(s):  
Sozan Karim

Abstract Chronic lymphocytic leukaemia (CLL) is the most common adult leukaemia in the Western world. The disease is characterised by a great variability in clinical course and response to treatment. Acquired somatic mutations in genes related to multiple signal pathways have recently emerged as important factors contributing to this heterogeneity. These mutations accumulate with disease progression or development of therapy resistance. However, with conventional methods they are very difficult to detect in a single test, particularly when the size of the affected cell population is small. Therefore, there is a clinical need for a rapid, reliable, affordable and highly sensitive sequencing method to monitor these mutations from an early stage of this disease. Herein we developed an ultra-deep next generation sequencing approach based on Ion Torrent PGM to sequence a total of 246 exons of 15 genes including TP53, ATM, BIRC3, NOTCH1, SF3B1, MYD88, LRP1B, SAMHD1, FBXW7, POT1, HIST1H1E, XPO1, CHD2, PCLO and ZFPM2. These genes were selected because mutations in them have been reported to involve at least 5% of CLL patients and to be associated with poor outcome of CLL. Serial CLL samples were collected at multiple time points from diagnosis to disease progression/drug resistance from a cohort of 33 patients. Initially, target DNA in samples taken at advanced disease stages, without or with mutations in TP53 as detected and quantified by FASAY assay and Sanger sequencing, was enriched and amplified using Agilent HaloPlex probes, with 4604 amplicons for a sequenceable size of 135.91 Kbp for each sample. On each Ion 318 Chip (1 GB output), barcoded DNA enrichment preparations from 4 patients were sequenced by the PGM. For confirmation, the same patient samples were tested repeatedly in an independent experiment starting from DNA preparation. 
The sequence data were aligned to a human reference genome (Hg19); variants were called by the Torrent Variant Caller (v4.0-6) and visualised with IGV. Results showed that the target coverage was 99.999% and the average coverage depth was 3941x. Importantly, all of the known TP53 mutations were repeatedly detected at expected frequencies. In addition, extra mutations at lower levels (5% - 20% alleles) in multiple genes, including TP53 and SF3B1, were detected. In a sensitivity test of this method, DNA samples from 5 cases, each with a known % of mutation in TP53 measured by FASAY assay and Sanger sequencing, were pooled together and then mixed with a wild-type DNA sample to serially dilute these mutations to 20% - 0.2% alleles before target enrichment. With an average coverage depth of 1843x (range: 1610 - 2187) for these 5 target locations, each mutation was readily detected at 20%, 5% and 1%, with the average quality scores being 1497.3, 61.2 and 29.4, respectively. However, only 3 out of the 5 mutations could be detected at the level of 0.2%. Taken together, we have successfully developed a sensitive next generation sequencing method for detecting mutations in a CLL gene panel. This allows us to monitor multiple mutations affecting as low as 1% of alleles in CLL samples throughout the disease course and possibly discover clinically useful biomarker(s) for CLL progression and resistance to therapy. Disclosures No relevant conflicts of interest to declare.
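One way to see why detection becomes unreliable at 0.2% is to work out how many variant-supporting reads are expected at the reported depth. The sketch below assumes reads sample alleles independently (a binomial model) and ignores sequencing error and the Torrent Variant Caller's own thresholds, neither of which the abstract describes; `min_reads` is a hypothetical cut-off chosen for illustration.

```python
from math import exp, lgamma, log, log1p


def expected_variant_reads(depth: int, vaf: float) -> float:
    """Expected number of reads carrying the variant at a given allele fraction."""
    return depth * vaf


def prob_at_least(depth: int, vaf: float, min_reads: int) -> float:
    """P(X >= min_reads) for X ~ Binomial(depth, vaf), computed in log space."""
    if not 0.0 < vaf < 1.0:
        raise ValueError("vaf must be strictly between 0 and 1")
    below = 0.0
    for k in range(min(min_reads, depth + 1)):
        log_pmf = (
            lgamma(depth + 1) - lgamma(k + 1) - lgamma(depth - k + 1)
            + k * log(vaf) + (depth - k) * log1p(-vaf)
        )
        below += exp(log_pmf)
    return max(0.0, 1.0 - below)


# Average depth reported for the five diluted TP53 positions.
depth = 1843
reads_at_02_percent = expected_variant_reads(depth, 0.002)  # roughly 3.7 reads
reads_at_1_percent = expected_variant_reads(depth, 0.01)    # roughly 18.4 reads
```

With only about four supporting reads expected at 0.2%, sampling noise alone can push a true variant below any reasonable read-count threshold, which is consistent with 3 of 5 mutations being detected at that level.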


Author(s):  
Patrick D Schloss ◽  
Matthew L Jenior ◽  
Charles C. Koumpouras ◽  
Sarah L Westcott ◽  
Sarah K Highlander

Over the past 10 years, microbial ecologists have largely abandoned sequencing 16S rRNA genes by the Sanger sequencing method and have instead adopted highly parallelized sequencing platforms. These new platforms, such as 454 and Illumina's MiSeq, have allowed researchers to obtain millions of high quality, but short sequences. The result of the added sequencing depth has been significant improvements in experimental design. The tradeoff has been the decline in the number of full-length reference sequences that are deposited into databases. To overcome this problem, we tested the ability of the PacBio Single Molecule, Real-Time (SMRT) DNA sequencing platform to generate sequence reads from the 16S rRNA gene. We generated sequencing data from the V4, V3-V5, V1-V3, V1-V5, V1-V6, and V1-V9 variable regions from within the 16S rRNA gene using DNA from a synthetic mock community and natural samples collected from human feces, mouse feces, and soil. The mock community allowed us to assess the actual sequencing error rate and how that error rate changed when different curation methods were applied. We developed a simple method based on sequence characteristics and quality scores to reduce the observed error rate for the V1-V9 region from 0.69 to 0.027%. This error rate is comparable to what has been observed for the shorter reads generated by 454 and Illumina's MiSeq sequencing platforms. Although the per base sequencing cost is still significantly more than that of MiSeq, the prospect of supplementing reference databases with full-length sequences from organisms below the limit of detection from the Sanger approach is exciting.
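The idea of measuring error against a mock community can be sketched as follows: because the true sequences are known, every difference between an aligned read and its reference counts as an error. This is a simplified illustration that assumes the read and reference are already aligned to equal length (with `-` marking gaps); the function names are hypothetical and the authors' curation method is not reproduced here.

```python
def error_rate(read: str, reference: str) -> float:
    """Fraction of aligned positions where the read differs from the reference."""
    if len(read) != len(reference):
        raise ValueError("read and reference must be aligned to the same length")
    if not read:
        raise ValueError("sequences must not be empty")
    errors = sum(1 for r, q in zip(reference, read) if r != q)
    return errors / len(read)


def fold_reduction(before_pct: float, after_pct: float) -> float:
    """How many times lower the error rate is after curation."""
    return before_pct / after_pct


# Error rates reported for the V1-V9 region, in percent.
improvement = fold_reduction(0.69, 0.027)  # about a 25.6-fold reduction
```

Averaging `error_rate` over all mock community reads before and after a curation step gives the kind of before/after comparison the abstract reports.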


2012 ◽  
Vol 91 (1) ◽  
pp. 80-88 ◽  
Author(s):  
Andrew S. Whiteley ◽  
Sasha Jenkins ◽  
Ian Waite ◽  
Nina Kresoje ◽  
Hugh Payne ◽  
...  

2016 ◽  
Author(s):  
Patrick D Schloss ◽  
Matthew L Jenior ◽  
Charles C. Koumpouras ◽  
Sarah L Westcott ◽  
Sarah K Highlander


2014 ◽  
Vol 7 (12) ◽  
Author(s):  
Adonney AO Veras ◽  
Pablo HCG de Sa ◽  
Kenny C Pinheiro ◽  
Diego Assis das Gracas ◽  
Rafael Azevedo Barauna ◽  
Maria Paula Cruz Schneider ◽  
Vasco Azevedo ◽  
...  

2019 ◽  
Author(s):  
Rachel L. Marine ◽  
Laura C. Magaña ◽  
Christina J. Castro ◽  
Kun Zhao ◽  
Anna M. Montmayeur ◽  
...  

ABSTRACT Next-generation sequencing is a powerful tool for virological surveillance. While Illumina® and Ion Torrent® sequencing platforms are used extensively for generating viral RNA genome sequences, there are limited data comparing different platforms. We evaluated the Illumina MiSeq, Ion Torrent PGM and Ion Torrent S5 platforms using a panel of sixteen specimens containing picornaviruses and human caliciviruses (noroviruses and sapoviruses). The specimens were processed using combinations of three library preparation and five sequencing kits to assess the quality and completeness of assembled viral genomes, and the cost per sample to generate the data was estimated. The choice of library preparation kit and sequencing platform was found to impact the breadth of genome coverage and accuracy of consensus viral genomes. The Ion Torrent S5 outperformed the older Ion Torrent PGM platform in data quality and cost, and generated the highest proportion of reads for enterovirus D68 samples. However, indels at homopolymer regions impacted the accuracy of consensus genome sequences. For lower throughput sequencing runs (i.e., Ion Torrent 510 or Illumina MiSeq Nano V2), the cost per sample was lower on the MiSeq platform, whereas with higher throughput runs (Ion Torrent 530 or Illumina MiSeq V2) the cost per sample was comparable. These findings suggest that the Ion Torrent S5 and Illumina MiSeq platforms are both viable options for genomic sequencing of RNA viruses, each with specific advantages and tradeoffs.

