Taxon disappearance from microbiome analysis indicates need for mock communities as a standard in every sequencing run

Mapping Intimacies ◽

10.1101/206219 ◽

2017 ◽

Cited By ~ 3

Author(s):

Yi-Chun Yeh ◽

David M. Needham ◽

Ella T. Sieradzki ◽

Jed A. Fuhrman

Keyword(s):

Method Development ◽

PCR Amplification ◽

Unknown Origin ◽

Community Analysis ◽

Sequencing Analysis ◽

Mock Community ◽

Microbiome Analysis ◽

Biological Studies ◽

Sequencing Platforms ◽

Mock Communities

Abstract Mock communities have been used in microbiome method development to help estimate biases introduced in PCR amplification and sequencing, and to optimize pipeline outputs. Nevertheless, the necessity of routine mock community analysis beyond initial method development is rarely, if ever, considered. Here we report that our routine use of mock communities as internal standards allowed us to discover highly aberrant and strong biases in the relative proportions of multiple taxa in a single Illumina HiSeqPE250 run. In this run, an important archaeal taxon virtually disappeared from all samples, and other mock community taxa showed >2-fold high or low abundance, whereas a rerun of those identical amplicons (from the same reaction tubes) on a different date yielded “normal” results. Although the problem was obvious from the strange mock community results, we could easily have missed it had we not used the mock communities, because of the natural variation of microbiomes at our site. The “normal” results were validated over 4 MiSeqPE300 runs and 3 HiSeqPE250 runs, and run-to-run variation was usually low (Bray-Curtis distance was 0.12±0.04). While validating these “normal” results, we also discovered that some mock microbial taxa had relatively modest, but consistent, differences between sequencing platforms. We suggest that using mock communities in every sequencing run is essential to distinguish potentially serious aberrations from natural variations. Such mock communities should have more than just a few members and ideally at least partly represent the samples being analyzed, to detect problems that show up only in some taxa, as we observed.
Importance Despite the routine use of standards and blanks in virtually all chemical or physical assays and most biological studies (a kind of “control”), microbiome analysis has traditionally lacked such standards. Here we show that unexpected problems of unknown origin can occur in such sequencing runs, and yield completely incorrect results that would not necessarily be detected without the use of standards. Assuming that the microbiome sequencing analysis works properly every time risks serious errors that can be avoided by the use of suitable mock communities.
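
The abstract above reports two checks on mock community results: a Bray-Curtis distance between runs and a >2-fold shift in the abundance of individual taxa. The following is a minimal sketch of how such checks might be computed. It is not the authors' pipeline. The function names, the taxon labels, and the use of a reference composition as the comparison baseline are illustrative assumptions, and the 2-fold threshold is simply taken from the wording of the abstract.

```python
from typing import Dict, List


def relative_abundance(counts: Dict[str, float]) -> Dict[str, float]:
    """Convert raw read counts per taxon into proportions that sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    denominator = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return numerator / denominator if denominator else 0.0


def flag_taxa(observed: Dict[str, float],
              reference: Dict[str, float],
              fold: float = 2.0) -> List[str]:
    """Return taxa whose observed proportion differs from the reference
    by more than `fold` in either direction (hypothetical threshold)."""
    flagged = []
    for taxon, ref in reference.items():
        if ref <= 0:
            continue  # cannot form a ratio against a zero reference
        ratio = observed.get(taxon, 0.0) / ref
        if ratio > fold or ratio < 1.0 / fold:
            flagged.append(taxon)
    return sorted(flagged)


if __name__ == "__main__":
    # Hypothetical mock community: reference proportions from earlier
    # "normal" runs, and read counts from a run under evaluation.
    reference = {"archaeon_A": 0.25, "bacterium_B": 0.25, "bacterium_C": 0.50}
    run_counts = {"archaeon_A": 1, "bacterium_B": 600, "bacterium_C": 399}

    observed = relative_abundance(run_counts)
    print("Bray-Curtis:", round(bray_curtis(observed, reference), 3))
    print("Flagged taxa:", flag_taxa(observed, reference))
```

In this made-up example the near-absent archaeon and the over-represented bacterium are both flagged, which mirrors the kind of taxon-specific aberration the abstract describes. What counts as an acceptable Bray-Curtis value would have to come from a laboratory's own history of runs.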

Taxon Disappearance from Microbiome Analysis Reinforces the Value of Mock Communities as a Standard in Every Sequencing Run

mSystems ◽

10.1128/msystems.00023-18 ◽

2018 ◽

Vol 3 (3) ◽

Cited By ~ 21

Author(s):

Yi-Chun Yeh ◽

David M. Needham ◽

Ella T. Sieradzki ◽

Jed A. Fuhrman

Keyword(s):

Method Development ◽

PCR Amplification ◽

Unknown Origin ◽

Community Analysis ◽

Sequencing Analysis ◽

Mock Community ◽

Microbiome Analysis ◽

Biological Studies ◽

Sequencing Platforms ◽

Mock Communities

ABSTRACT Mock communities have been used in microbiome method development to help estimate biases introduced in PCR amplification and sequencing and to optimize pipeline outputs. Nevertheless, the strong value of routine mock community analysis beyond initial method development is rarely, if ever, considered. Here we report that our routine use of mock communities as internal standards allowed us to discover highly aberrant and strong biases in the relative proportions of multiple taxa in a single Illumina HiSeqPE250 run. In this run, an important archaeal taxon virtually disappeared from all samples, and other mock community taxa showed >2-fold high or low abundance, whereas a rerun of those identical amplicons (from the same reaction tubes) on a different date yielded “normal” results. Although obvious from the strange mock community results, we could have easily missed the problem had we not used the mock communities because of natural variation of microbiomes at our site. The “normal” results were validated over four MiSeqPE300 runs and three HiSeqPE250 runs, and run-to-run variation was usually low. While validating these “normal” results, we also discovered that some mock microbial taxa had relatively modest, but consistent, differences between sequencing platforms. We strongly advise the use of mock communities in every sequencing run to distinguish potentially serious aberrations from natural variations. The mock communities should have more than just a few members and ideally at least partly represent the samples being analyzed to detect problems that show up only in some taxa and also to help validate clustering.
IMPORTANCE Despite the routine use of standards and blanks in virtually all chemical or physical assays and most biological studies (a kind of “control”), microbiome analysis has traditionally lacked such standards. Here we show that unexpected problems of unknown origin can occur in such sequencing runs and yield completely incorrect results that would not necessarily be detected without the use of standards. Assuming that the microbiome sequencing analysis works properly every time risks serious errors that can be detected by the use of mock communities.

Non-biological synthetic spike-in controls and the AMPtk software pipeline improve mycobiome data

PeerJ ◽

10.7717/peerj.4925 ◽

2018 ◽

Vol 6 ◽

pp. e4925 ◽

Cited By ~ 57

Author(s):

Jonathan M. Palmer ◽

Michelle A. Jusino ◽

Mark T. Banik ◽

Daniel L. Lindner

Keyword(s):

Amplicon Sequencing ◽

Variable Length ◽

ITS Sequences ◽

Synthetic Control ◽

Mock Community ◽

Software Pipeline ◽

Initial Polymerase Chain Reaction ◽

Polymerase Chain ◽

Sequencing Platforms ◽

Mock Communities

High-throughput amplicon sequencing (HTAS) of conserved DNA regions is a powerful technique to characterize microbial communities. Recently, spike-in mock communities have been used to measure accuracy of sequencing platforms and data analysis pipelines. To assess the ability of sequencing platforms and data processing pipelines using fungal internal transcribed spacer (ITS) amplicons, we created two ITS spike-in control mock communities composed of cloned DNA in plasmids: a biological mock community, consisting of ITS sequences from fungal taxa, and a synthetic mock community (SynMock), consisting of non-biological ITS-like sequences. Using these spike-in controls we show that: (1) a non-biological synthetic control (e.g., SynMock) is the best solution for parameterizing bioinformatics pipelines, (2) pre-clustering steps for variable length amplicons are critically important, (3) a major source of bias is attributed to the initial polymerase chain reaction (PCR) and thus HTAS read abundances are typically not representative of starting values. We developed AMPtk, a versatile software solution equipped to deal with variable length amplicons and quality filter HTAS data based on spike-in controls. While we describe herein a non-biological SynMock community for ITS sequences, the concept and AMPtk software can be widely applied to any HTAS dataset to improve data quality.

Metatranscriptomics provides closer diversity and composition estimates with morphology than PCR-based methods: a zooplankton mock community case study

10.22541/au.160683304.41683264/v1 ◽

2020 ◽

Author(s):

Mark Louie Lopez ◽

Ya-Ying Lin ◽

Mitsuhide Sato ◽

Fuh-Kwo Shiah ◽

Chih-hao Hsieh ◽

...

Keyword(s):

Species Richness ◽

Species Diversity ◽

Massively Parallel Sequencing ◽

PCR Amplification ◽

Morphological Data ◽

Sequence Variant ◽

Mock Community ◽

Field Samples ◽

Diversity Estimates ◽

Mock Communities

Studying complex metazoan communities requires taxonomic expertise and laborious work if done using the traditional morphological approach. Nowadays, the popular use of molecular-based methods accompanied by massively parallel sequencing (MPS) provides rapid and higher resolution diversity analyses. However, diversity estimates derived from the molecular-based approach can be biased by the co-detection of environmental DNA (eDNA), pseudogene contamination, and PCR amplification biases. Here, we constructed microcrustacean zooplankton mock communities to compare species diversity and composition estimates from PCR-based methods using genomic (gDNA) and complementary DNA (cDNA), metatranscriptomic transcripts, and morphology data. Mock community analyses show that gDNA mitochondrial cytochrome c oxidase I (mtCOI) amplicons inflate species richness due to environmental and nontarget species sequence contamination. Significantly higher amplicon sequence variant (ASV) and nucleotide diversity in gDNA amplicons than cDNA indicated the presence of putative pseudogenes. Last, PCR-based methods failed to detect the most abundant species in mock communities due to priming site mismatch. Overall, metatranscriptomic transcripts provided estimates of species richness and composition that closely resembled morphological data. The use of metatranscriptomic transcripts was further tested in field samples. The results showed that it could provide consistent species diversity estimates among biological and technical replicates while allowing monitoring of the zooplankton temporal species composition changes using different mitochondrial markers. These findings show that community characterization based on metatranscriptomic transcripts reflects the actual community more than PCR-based approaches.

Efficient and stable metabarcoding sequencing data using a DNBSEQ-G400 sequencer validated by comprehensive community analyses

Gigabyte ◽

10.46471/gigabyte.16 ◽

2021 ◽

Vol 2021 ◽

pp. 1-15

Author(s):

Xiaohuan Sun ◽

Yue-Hua Hu ◽

Jingjing Wang ◽

Chao Fang ◽

Jiguang Li ◽

...

Keyword(s):

Microbial Communities ◽

High Performance ◽

PCR Amplification ◽

Sequencing Data ◽

Noticeable Effect ◽

Sequencing Platform ◽

Taxonomic Markers ◽

Sequencing Platforms ◽

Mock Communities

Metabarcoding is a widely used method for fast characterization of microbial communities in complex environmental samples. However, the selection of sequencing platform can have a noticeable effect on the estimated community composition. Here, we evaluated the metabarcoding performance of a DNBSEQ-G400 sequencer developed by MGI Tech using 16S and internal transcribed spacer (ITS) markers to investigate bacterial and fungal mock communities, as well as the ITS2 marker to investigate the fungal community of 1144 soil samples, with additional technical replicates. We show that highly accurate sequencing of bacterial and fungal communities is achievable using DNBSEQ-G400. Measures of diversity and correlation from soil metabarcoding showed that the results correlated highly with those of different machines of the same model, as well as between different sequencing modes (single-end 400 bp and paired-end 200 bp). Moderate but significant differences were observed between results produced with different sequencing platforms (DNBSEQ-G400 and MiSeq); however, the largest differences can be caused by selecting different primer pairs for PCR amplification of taxonomic markers. These differences suggest that care is needed when jointly analyzing metabarcoding data from different experiments. This study demonstrated the high performance and accuracy of DNBSEQ-G400 for short-read metabarcoding of microbial communities. Our study also produced datasets to allow further investigation of microbial diversity.

mockrobiota: a public resource for microbiome bioinformatics benchmarking

10.7287/peerj.preprints.2065v1 ◽

2016 ◽

Author(s):

Nicholas A Bokulich ◽

Jai Ram Rideout ◽

William G Mercurio ◽

Benjamin Wolfe ◽

Corinne F Maurice ◽

...

Keyword(s):

Sequence Data ◽

Marker Gene ◽

Community Analysis ◽

Microbial Community Analysis ◽

Community Members ◽

Mock Community ◽

Public Resource ◽

Source Data ◽

Community Data ◽

Mock Communities

Mock communities are an important tool for validating, optimizing, and comparing bioinformatics methods for microbial community analysis. We present mockrobiota, a public resource for sharing, validating, and documenting mock community data resources, available at https://github.com/caporaso-lab/mockrobiota. The materials contained in mockrobiota include dataset and sample metadata, expected composition data, which are annotated based on one or more reference taxonomies, links to raw data (e.g., raw sequence data) for each mock community dataset, and optional reference sequences for mock community members. mockrobiota does not supply physical sample materials directly, but the dataset metadata included for each mock community indicate whether physical sample materials are available (and associated contact information). At the time of this writing, mockrobiota contains 11 mock community datasets with known species compositions (including bacterial, archaeal, and eukaryotic mock communities), analyzed by high-throughput marker-gene sequencing. The availability of standard, public mock community data will facilitate ongoing methods optimizations; comparisons across studies that share source data; greater transparency and access; and eliminate redundancy. This dynamic resource is intended to expand and evolve to meet the changing needs of the ‘omics community.

Non-biological synthetic spike-in controls and the AMPtk software pipeline improve mycobiome data

10.1101/213470 ◽

2017 ◽

Cited By ~ 2

Author(s):

Jonathan M Palmer ◽

Michelle A Jusino ◽

Mark T Banik ◽

Daniel L Lindner

Keyword(s):

Data Analysis ◽

Amplicon Sequencing ◽

Variable Length ◽

ITS Sequences ◽

Synthetic Control ◽

Mock Community ◽

Software Pipeline ◽

Sequencing Platforms ◽

Fungal ITS ◽

Mock Communities

High throughput amplicon sequencing (HTAS) of conserved DNA regions is a powerful technique to characterize microbial communities. Recently, spike-in mock communities have been used to measure accuracy of sequencing platforms and data analysis pipelines. To assess the ability of sequencing platforms and data processing pipelines using fungal ITS amplicons, we created two ITS spike-in control mock communities composed of cloned DNA in plasmids: a biological mock community (BioMock), consisting of ITS sequences from fungal taxa, and a synthetic mock community (SynMock), consisting of non-biological ITS-like sequences. Using these spike-in controls we show that: 1) a non-biological synthetic control (e.g., SynMock) is the best solution for parameterizing bioinformatics pipelines, 2) pre-clustering steps for variable length amplicons are critically important, 3) a major source of bias is attributed to initial PCR reactions and thus HTAS read abundances are typically not representative of starting values. We developed AMPtk, a versatile software solution equipped to deal with variable length amplicons and quality filter HTAS data based on spike-in controls. While we describe herein a non-biological synthetic mock community for ITS sequences, the concept and AMPtk software can be widely applied to any HTAS dataset to improve data quality.

mockrobiota: a public resource for microbiome bioinformatics benchmarking

10.7287/peerj.preprints.2065 ◽

2016 ◽

Author(s):

Nicholas A Bokulich ◽

Jai Ram Rideout ◽

William G Mercurio ◽

Benjamin Wolfe ◽

Corinne F Maurice ◽

...

Keyword(s):

Sequence Data ◽

Marker Gene ◽

Community Analysis ◽

Microbial Community Analysis ◽

Community Members ◽

Mock Community ◽

Public Resource ◽

Source Data ◽

Community Data ◽

Mock Communities

Regulation of Sporangium Formation by BldD in the Rare Actinomycete Actinoplanes missouriensis

Journal of Bacteriology ◽

10.1128/jb.00840-16 ◽

2017 ◽

Vol 199 (12) ◽

Cited By ~ 15

Author(s):

Yoshihiro Mouri ◽

Kenji Konishi ◽

Azusa Fujita ◽

Takeaki Tezuka ◽

Yasuo Ohnishi

Keyword(s):

Vegetative Growth ◽

Streptomyces Coelicolor ◽

Morphological Differentiation ◽

Transcriptional Regulator ◽

Saccharopolyspora Erythraea ◽

Transcriptional Analysis ◽

Morphological Development ◽

Sequencing Analysis ◽

Content Type ◽

Biological Studies

ABSTRACT The rare actinomycete Actinoplanes missouriensis forms sporangia, including hundreds of flagellated spores that start swimming as zoospores after their release. Under conditions suitable for vegetative growth, zoospores stop swimming and germinate. A comparative proteome analysis between zoospores and germinating cells identified 15 proteins that were produced in larger amounts in germinating cells. They include an orthologue of BldD (herein named AmBldD [BldD of A. missouriensis]), which is a transcriptional regulator involved in morphological development and secondary metabolism in Streptomyces. AmBldD was detected in mycelia during vegetative growth but was barely detected in mycelia during the sporangium-forming phase, in spite of the constant transcription of AmbldD throughout growth. An AmbldD mutant started to form sporangia much earlier than the wild-type strain, and the resulting sporangia were morphologically abnormal. Recombinant AmBldD bound a palindromic sequence, the AmBldD box, located upstream from AmbldD. 3′,5′-Cyclic di-GMP significantly enhanced the in vitro DNA-binding ability of AmBldD. A chromatin immunoprecipitation-sequencing analysis and an in silico search for AmBldD boxes revealed that AmBldD bound 346 genomic loci that contained the 19-bp inverted repeat 5′-NN(G/A)TNACN(C/G)N(G/C)NGTNA(C/T)NN-3′ as the consensus AmBldD-binding sequence. The transcriptional analysis of 27 selected AmBldD target gene candidates indicated that AmBldD should repress 12 of the 27 genes, including bldM, ssgB, whiD, ddbA, and wblA orthologues. These genes are involved in morphological development in Streptomyces coelicolor A3(2). Thus, AmBldD is a global transcriptional regulator that seems to repress the transcription of tens of genes during vegetative growth, some of which are likely to be required for sporangium formation. 
IMPORTANCE The rare actinomycete Actinoplanes missouriensis undergoes complex morphological differentiation, including sporangium formation. However, almost no molecular biological studies have been conducted on this bacterium. BldD is a key global regulator involved in the morphological development of streptomycetes. BldD orthologues are highly conserved among sporulating actinomycetes, but no BldD orthologues, except one in Saccharopolyspora erythraea, have been studied outside the streptomycetes. Here, it was revealed that the BldD orthologue AmBldD is essential for normal developmental processes in A. missouriensis. The AmBldD regulon seems to be different from the BldD regulon in Streptomyces coelicolor A3(2), but they share four genes that are involved in morphological differentiation in S. coelicolor A3(2).

Identification and functional analysis of novel SLC25A19 variants causing thiamine metabolism dysfunction syndrome 4

Orphanet Journal of Rare Diseases ◽

10.1186/s13023-021-02028-4 ◽

2021 ◽

Vol 16 (1) ◽

Author(s):

Yuanying Chen ◽

Boliang Fang ◽

Xuyun Hu ◽

Ruolan Guo ◽

Jun Guo ◽

...

Keyword(s):

Exome Sequencing ◽

Fever Of Unknown Origin ◽

Clinical Decision Making ◽

Unknown Origin ◽

Thiamine Pyrophosphate ◽

Molecular Evidence ◽

Sequencing Analysis ◽

Acute Necrotizing Encephalopathy ◽

Functional Studies ◽

Thiamine Metabolism

Abstract Background Thiamine metabolism dysfunction syndrome 4 (THMD4, OMIM #613710) is an autosomal recessive inherited disease caused by the deficiency of SLC25A19 that encodes the mitochondrial thiamine pyrophosphate (TPP) transporter. This disorder is characterized by bilateral striatal degradation and progressive polyneuropathy with the onset of fever of unknown origin. The limited number of reported cases and lack of functional annotation of related gene variants continue to limit diagnosis. Results We report three cases of encephalopathy from two unrelated pedigrees with basal ganglia signal changes after fever of unknown origin. To distinguish this from other types of encephalopathy, such as acute necrotizing encephalopathy, exome sequencing was performed, and four novel heterozygous variations, namely, c.169G>A (p.Ala57Thr), c.383C>T (p.Ala128Val), c.76G>A (p.Gly26Arg), and c.745T>A (p.Phe249Ile), were identified in SLC25A19. All variants were confirmed using Sanger sequencing. To determine the pathogenicity of these variants, functional studies were performed. We found that mitochondrial TPP levels were significantly decreased in the presence of SLC25A19 variants, indicating that TPP transport activities of mutated SLC25A19 proteins were impaired. Thus, combining clinical phenotype, genetic analysis, and functional studies, these variants were deemed as likely pathogenic. Conclusions Exome sequencing analysis enables molecular diagnosis as well as provides potential etiology. Further studies will enable the elucidation of SLC25A19 protein function. Our investigation supplied key molecular evidence for the precise diagnosis of and clinical decision-making for a rare disease.

Differentiation and Characterization by Molecular Techniques of Bacillus cereus Group Isolates from Poto Poto and Dégué, Two Traditional Cereal-Based Fermented Foods of Burkina Faso and Republic of Congo

Journal of Food Protection ◽

10.4315/0362-028x-70.5.1165 ◽

2007 ◽

Vol 70 (5) ◽

pp. 1165-1173 ◽

Cited By ~ 16

Author(s):

Hikmate Abriouel ◽

Nabil Ben Omar ◽

Rosario Lucas López ◽

Magdalena Martínez Cañamero ◽

Elena Ortega ◽

...

Keyword(s):

Melting Curve ◽

PCR Amplification ◽

Curve Analysis ◽

Melting Curve Analysis ◽

Toxin Gene ◽

Fermented Foods ◽

Sequencing Analysis ◽

16S rDNA Sequencing ◽

Bacillus Cereus Group ◽

rDNA Sequencing

Poto poto (a maize sourdough) and dégué (a pearl millet–based food) are two traditional African fermented foods. The molecular biology of toxigenic and pathogenic bacteria found in those foods is largely unknown. The purpose of this study was to study the phylogenetic relatedness and toxigenic potential of 26 Bacillus cereus group isolates from these traditional fermented foods. The relatedness of the isolates was evaluated with repetitive element sequence-based PCR (REP-PCR) and 16S rDNA sequencing analysis. A multiplex real-time PCR assay targeting the lef and capC genes of B. anthracis pXO1 and pXO2 plasmids and the sspE chromosomal gene of B. cereus and B. anthracis also was carried out. Melting curve analysis of the sspE amplification product was used to differentiate B. cereus from B. anthracis, and the presence of the B. cereus enterotoxin genes was determined with PCR amplification. Isolates had 15 different REP-PCR profiles, according to which they could be clustered into four groups. 16S rDNA sequencing analysis identified 23 isolates as B. cereus or B. anthracis and three isolates as B. cereus or Bacillus sp. Multiplex real-time PCR amplification indicated the absence of the lef and capC genes of B. anthracis pXO1 and pXO2 plasmids, and melting curve analysis revealed amplification of the 71-bp sspE product typical of B. cereus in all isolates instead of the 188-bp amplicon of B. anthracis, confirming the identity of these isolates as B. cereus. Four isolates had amylolytic activity. All isolates had lecithinase activity and beta-hemolytic activity. Enterotoxin production was detected in two isolates. The emetic toxin gene was not detected in any isolate. The nheB toxin gene was detected in 19 isolates by PCR amplification; one of these isolates also contained the hblD (L1) gene. The cytotoxin K cytK-1 gene was not detected, but the cytK-2 gene was clearly detected in six isolates.
