Are multiplexed metabarcoding panels comparable to individual marker gene library preparations?

Molecular characterization and phylogenetic assessment of a few Dioscorea (Dioscoreaceae) species of North-East India

Indian Journal of Genetics and Plant Breeding (The) ◽

10.31742/ijgpb.79.1.11 ◽

2019 ◽

Vol 79 (01) ◽

Author(s):

Nilofer Sheikh ◽

Yogendra Kumar ◽

A. K. Misra

Keyword(s):

Molecular Characterization ◽

Distinct Species ◽

18S rRNA Gene ◽

Marker Genes ◽

rRNA Gene ◽

rbcL Gene ◽

Eastern Region ◽

Gene Sequences ◽

Sequence Characterization ◽

North East

Dioscorea spp., or yams, are consumed by the indigenous peoples of the North-Eastern region of India as a substitute for potato, and most of the species are also used in traditional medicine. The North-East region is one of the hotspots for Dioscorea species that grow in wild habitats and have not been characterized or identified. In the present study, eight morphologically distinct species of Dioscorea belonging to sections Enantiophyllum, Botryosicyos and Opsophyton were subjected to molecular characterization and phylogenetic assessment using three marker genes (18S rDNA, matK and rbcL). Sequence characterization of these genes revealed that the 18S rRNA gene was more highly conserved than the matK and rbcL gene sequences, and hence the 18S rRNA gene can be used as a better candidate for species delimitation. Phylogenetic analysis of the combined gene sequences also showed that the species belonging to section Enantiophyllum were monophyletic in origin.

Kelpie: generating full-length ‘amplicons’ from whole-metagenome datasets

PeerJ ◽

10.7717/peerj.6174 ◽

2019 ◽

Vol 6 ◽

pp. e6174 ◽

Cited By ~ 1

Author(s):

Paul Greenfield ◽

Nai Tran-Dinh ◽

David Midgley

Keyword(s):

Marker Gene ◽

Amplicon Sequencing ◽

Genomic Region ◽

Full Length ◽

rRNA Gene ◽

Multiple Regions ◽

Metagenome Sequencing ◽

And Function ◽

Genomic Regions ◽

Metagenomic Assembly

Introduction. Whole-metagenome sequencing can be a rich source of information about the structure and function of entire metagenomic communities, but getting accurate and reliable results from these datasets can be challenging. Analysis of these datasets is founded on the mapping of sequencing reads onto known genomic regions from known organisms, but short reads will often map equally well to multiple regions, and to multiple reference organisms. Assembling metagenomic datasets prior to mapping can generate much longer and more precisely mappable sequences, but the presence of closely related organisms and highly conserved regions makes metagenomic assembly challenging, and some regions of particular interest can assemble poorly. One solution to these problems is to use specialised tools, such as Kelpie, that can accurately extract and assemble full-length sequences for defined genomic regions from whole-metagenome datasets. Methods. Kelpie is a kMer-based tool that generates full-length amplicon-like sequences from whole-metagenome datasets. It takes a pair of primer sequences and a set of metagenomic reads, and uses a combination of kMer filtering, error correction and assembly techniques to construct sets of full-length inter-primer sequences. Results. The effectiveness of Kelpie is demonstrated here through the extraction and assembly of full-length ribosomal marker gene regions, as this allows comparisons with conventional amplicon sequencing and published metagenomic benchmarks. The results show that the Kelpie-generated sequences and community profiles closely match those produced by amplicon sequencing, down to low abundance levels, and running Kelpie on the synthetic CAMI metagenomic benchmarking datasets shows similarly high levels of both precision and recall. Conclusions. Kelpie can be thought of as being somewhat like an in-silico PCR tool, taking a primer pair and producing the resulting ‘amplicons’ from a whole-metagenome dataset. Marker regions from the 16S rRNA gene were used here as an example because this allowed the overall accuracy of Kelpie to be evaluated through comparisons with other datasets, approaches and benchmarks. Kelpie is not limited to this application though, and can be used to extract and assemble any genomic region present in a whole-metagenome dataset, as long as it is bounded by a pair of highly conserved primer sequences.
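
The in-silico PCR analogy in the abstract above can be illustrated with a minimal Python sketch. It is not Kelpie's algorithm: it assumes a single, already assembled template and exact primer matches, whereas Kelpie works from short metagenomic reads using kMer filtering, error correction and assembly. The function names and the toy sequences are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def in_silico_pcr(template: str, forward_primer: str, reverse_primer: str):
    """Return the inter-primer region of `template`, or None if a primer is missing.

    Assumption: primers match exactly and the template is given 5'->3' on the
    strand that contains the forward primer.
    """
    template = template.upper()
    forward = forward_primer.upper()
    # The reverse primer anneals to the opposite strand, so on this strand
    # we search for its reverse complement.
    reverse_site = reverse_complement(reverse_primer)

    start = template.find(forward)
    if start == -1:
        return None
    inner_start = start + len(forward)

    end = template.find(reverse_site, inner_start)
    if end == -1:
        return None
    return template[inner_start:end]


# Toy example (made-up sequences, not real 16S primers).
template = "TTGACCGTAAGGCTTACGATCCGGATTCAGGTCAAT"
amplicon = in_silico_pcr(template, "GACCGTAAGG", "ACCTGAATCC")
print(amplicon)
```

Real primer pairs are usually degenerate and tolerate mismatches, so a practical version would need approximate matching on both strands.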

PEMA: from the raw .fastq files of 16S rRNA and COI marker genes to the (M)OTU-table, a thorough metabarcoding analysis

10.1101/709113 ◽

2019 ◽

Author(s):

Haris Zafeiropoulos ◽

Ha Quoc Viet ◽

Katerina Vasileiadou ◽

Antonis Potirakis ◽

Christos Arvanitidis ◽

...

Keyword(s):

16S rRNA ◽

Programming Languages ◽

High Performance ◽

Marker Gene ◽

Environmental DNA ◽

Operational Taxonomic Unit ◽

Third Party ◽

Marker Genes ◽

New Era ◽

Efficient Performance

Abstract. Background. Environmental DNA (eDNA) and metabarcoding allow the identification of a mixture of individuals and launch a new era in bio- and eco-assessment. A number of steps are required to obtain taxonomically assigned (Molecular) Operational Taxonomic Unit ((M)OTU) tables from raw data. For most of these, a plethora of tools is available; each tool’s execution parameters need to be tailored to reflect each experiment’s idiosyncrasy. Adding to this complexity, the computation capacity of High Performance Computing (HPC) systems is frequently required for such analyses. Software containerization technologies ease the sharing and running of software packages across operating systems; thus, they strongly facilitate pipeline development and usage. The same holds for programming languages specialized for big data pipelines, which incorporate features like roll-back checkpoints and on-demand partial pipeline execution. Findings. PEMA is a containerized assembly of key metabarcoding analysis tools that requires little effort to set up, run and customize to researchers’ needs. Based on third-party tools, PEMA performs read pre-processing, clustering to (M)OTUs and taxonomy assignment for 16S rRNA and COI marker gene data. Due to its simplified parameterisation and checkpoint support, PEMA allows users to explore alternative algorithms for specific steps of the pipeline without the need for a complete re-execution. PEMA was evaluated against previously published datasets and achieved results of comparable quality. Conclusions. Given its time-efficient performance and the quality of its results, it is suggested that PEMA can be used for accurate eDNA metabarcoding analysis, thus enhancing the applicability of next-generation biodiversity assessment studies.
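
The idea of roll-back checkpoints and on-demand partial execution described in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PEMA's implementation: the step names, the JSON checkpoint files and the `run_pipeline` function are all hypothetical, and the toy "clustering" steps stand in for real third-party tools.

```python
import json
import tempfile
from pathlib import Path


def run_pipeline(steps, data, checkpoint_dir, resume_from=None):
    """Run named steps in order, saving each step's output as a JSON checkpoint.

    If `resume_from` is given, earlier steps are skipped and the input is
    loaded from the checkpoint written by the step just before it.
    """
    checkpoint_dir = Path(checkpoint_dir)
    names = [name for name, _ in steps]
    first = 0
    if resume_from is not None:
        first = names.index(resume_from)
        if first > 0:
            previous = checkpoint_dir / f"{names[first - 1]}.json"
            data = json.loads(previous.read_text())

    executed = []
    for name, func in steps[first:]:
        data = func(data)
        (checkpoint_dir / f"{name}.json").write_text(json.dumps(data))
        executed.append(name)
    return data, executed


def quality_filter(reads):
    """Toy pre-processing: drop reads shorter than 4 bases."""
    return [read for read in reads if len(read) >= 4]


def cluster_exact(reads):
    """Toy clustering: count identical reads."""
    counts = {}
    for read in reads:
        counts[read] = counts.get(read, 0) + 1
    return counts


def cluster_by_prefix(reads):
    """Alternative toy clustering: group reads by their first 3 bases."""
    counts = {}
    for read in reads:
        counts[read[:3]] = counts.get(read[:3], 0) + 1
    return counts


reads = ["ACGTA", "ACGTA", "ACGTT", "AC"]
with tempfile.TemporaryDirectory() as tmp:
    # Full run: filter, then cluster.
    table_a, ran_a = run_pipeline(
        [("filter", quality_filter), ("cluster", cluster_exact)], reads, tmp
    )
    # Swap the clustering algorithm and rerun only that step from the checkpoint.
    table_b, ran_b = run_pipeline(
        [("filter", quality_filter), ("cluster", cluster_by_prefix)],
        None, tmp, resume_from="cluster",
    )
print(table_a, ran_a)
print(table_b, ran_b)
```

The second call reuses the filtered reads stored by the first, which is the behaviour the abstract describes as exploring alternative algorithms without a complete re-execution.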

Genetic differentiation within the Paralia longispina (Bacillariophyta) species complex

Botany ◽

10.1139/b11-101 ◽

2012 ◽

Vol 90 (3) ◽

pp. 205-222 ◽

Cited By ~ 11

Author(s):

Michael L. MacGillivary ◽

Irena Kaczmarska

Keyword(s):

Large Subunit ◽

Morphological Diversity ◽

Its Region ◽

Small Subunit ◽

18S rRNA Gene ◽

Tropical Species ◽

rRNA Gene ◽

rbcL Gene ◽

Extant Species ◽

Practical Implications

We report on undiscovered genetic and morphological diversity within a diatom taxon known as Paralia longispina and propose three new, extant species of Paralia (Paralia allisonii sp. nov., Paralia crawfordii sp. nov., and Paralia ehrmanii sp. nov.) obtained from subtropical and tropical coasts of the Atlantic and Pacific oceans. Comprehensive examination and hierarchical clustering of frustule characters separated these three species from each other and from P. longispina. Each species possessed one or two unique morphological diagnostic characters. The internal transcribed spacer (ITS) region and a fragment of the small subunit (18S rRNA gene) of nuclear encoded rRNA and a portion of the 5′ end of the large subunit of the ribulose bisphosphate carboxylase (rbcL) gene of the chloroplast genome were related to the morphological groupings of monoclonal isolates of P. allisonii and P. crawfordii. The ITS2 secondary structure of the cultured clones of these two species had four helices. They differed by five hemicompensatory base changes (HCBCs) and substantial changes to the distal end of helix I, the region between helix II and helix III, and nearly all of helix IV. Our results improve our understanding of the distribution of subtropical and tropical species of Paralia and have practical implications for the conservation of native microbial florae and for the possible role of ship ballast in human-mediated dispersal of these diatoms.

A comparison of approaches to scaffolding multiple regions along the 16S rRNA gene for improved resolution

10.1101/2021.03.23.436606 ◽

2021 ◽

Author(s):

Justine W Debelius ◽

Michael Robeson ◽

Luisa W. Hugerth ◽

Fredrik Boulund ◽

Weimin Ye ◽

...

Keyword(s):

16S rRNA ◽

Marker Gene ◽

Alpha Diversity ◽

Real Data ◽

Full Length ◽

rRNA Genes ◽

Taxonomic Resolution ◽

rRNA Gene ◽

Multiple Regions ◽

Tree Building

Abstract. Motivation. Full-length, high-resolution 16S rRNA marker gene sequencing has been challenging historically. Short amplicons provide high-accuracy reads with widely available equipment, at the cost of taxonomic resolution. One recent proposal has been to reconstruct multiple amplicons along the full-length marker gene; however, no barcode-free, computationally tractable approach for this is available. To address this gap, we present Sidle (SMURF Implementation Done to acceLerate Efficiency), an implementation of the Short MUltiple Reads Framework algorithm with a novel tree-building approach to reconstruct rRNA genes from individually amplified regions. Results. Using simulated and real data, we compared Sidle to two other approaches of leveraging multiple gene region data. We found that Sidle had the least bias in non-phylogenetic alpha diversity, feature-based measures of beta diversity, and the reconstruction of individual clades. With a curated database, Sidle also provided the most precise species-level resolution. Availability and Implementation. Sidle is available under a BSD 3 license from https://github.com/jwdebelius/q2-sidle
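
For readers unfamiliar with the term, non-phylogenetic alpha diversity is computed from feature counts alone, without a tree. The Python sketch below shows two common measures of this kind, observed richness and the Shannon index. It is an assumption that these particular metrics are relevant here; the abstract does not say which ones the authors used, and the function names and counts are hypothetical.

```python
import math


def observed_richness(counts):
    """Number of features (e.g. reconstructed sequences) with a non-zero count."""
    return sum(1 for count in counts if count > 0)


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over features with p_i > 0."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return -sum(
        (count / total) * math.log(count / total) for count in counts if count > 0
    )


# Toy samples: an even community and one dominated by a single feature.
even_sample = [10, 10, 10, 10]
skewed_sample = [37, 1, 1, 1]
print(observed_richness(even_sample), shannon_index(even_sample))
print(observed_richness(skewed_sample), shannon_index(skewed_sample))
```

Both samples have the same richness, but the even community has the higher Shannon index, which is why a reconstruction method that merges or splits features can bias these measures.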

PEMA v2: addressing metabarcoding bioinformatics analysis challenges

ARPHA Conference Abstracts ◽

10.3897/aca.4.e64902 ◽

2021 ◽

Vol 4 ◽

Author(s):

Haris Zafeiropoulos ◽

Christina Pavloudi ◽

Evangelos Pafilis

Keyword(s):

High Performance ◽

Bioinformatics Analysis ◽

Marker Gene ◽

Environmental DNA ◽

Third Party ◽

Reference Database ◽

Marker Genes ◽

Specific Reference ◽

Taxonomic Assignment ◽

Internal Joint

Environmental DNA (eDNA) and metabarcoding have launched a new era in bio- and eco-assessment in recent years (Ruppert et al. 2019). The simultaneous identification, at the lowest taxonomic level possible, of a mixture of taxa from a great range of samples is now feasible; thus, the number of eDNA metabarcoding studies has increased radically (Deiner and 2017). While the experimental part of eDNA metabarcoding can be rather challenging depending on the special characteristics of the different studies, computational issues are considered to be its major bottlenecks. Among the latter, the bioinformatics analysis of metabarcoding data and especially the taxonomy assignment of the sequences are fundamental challenges. Many steps are required to obtain taxonomically assigned matrices from raw data. For most of these, a plethora of tools are available. However, each tool's execution parameters need to be tailored to reflect each experiment's idiosyncrasy; thus, tuning the bioinformatics analysis has proved to be fundamental (Kamenova 2020). The computation capacity of high-performance computing (HPC) systems is frequently required for such analyses. On top of that, the incompleteness and inaccuracy of the reference taxonomy databases is another important issue (Loos et al. 2020). Based on third-party tools, we have developed the Pipeline for Environmental Metabarcoding Analysis (PEMA), an HPC-centered, containerized assembly of key metabarcoding analysis tools. PEMA combines state-of-the-art technologies and algorithms with an easy to get-set-use framework, allowing researchers to tune each study thoroughly thanks to roll-back checkpoints and on-demand partial pipeline execution features (Zafeiropoulos 2020). Once PEMA was released, users soon highlighted two main limitations: PEMA supported only 4 marker genes and was bound to specific reference databases. In this new version of PEMA, the analysis of any marker gene is now available, since a new feature was added that allows classifiers to be trained on a user-provided reference database and then used for taxonomic assignment. Fig. 1 shows the taxonomy assignment related PEMA modules; all those outside the dashed box have been developed for this new PEMA release. As shown, the RDPClassifier has been trained with Midori reference 2 and has been added as an option, classifying not only metazoans but sequences from all taxonomic groups of Eukaryotes for the case of the COI marker gene. A PEMA documentation site is now also available. PEMA.v2 containers are available via the DockerHub and SingularityHub as well as through the Elixir Greece AAI Service. It has also been selected to be part of the LifeWatch ERIC Internal Joint Initiative for the analysis of ARMS data and will soon be available through the Tesseract VRE.

Kelpie: generating full-length ‘amplicons’ from whole-metagenome datasets

10.7287/peerj.preprints.27376v1 ◽

2018 ◽

Author(s):

Paul Greenfield ◽

Nai Tran-Dinh ◽

David Midgley

Keyword(s):

Marker Gene ◽

Amplicon Sequencing ◽

Genomic Region ◽

Full Length ◽

rRNA Gene ◽

Multiple Regions ◽

Metagenome Sequencing ◽

And Function ◽

Genomic Regions ◽

Metagenomic Assembly

Introduction. Whole-metagenome sequencing can be a rich source of information about the structure and function of entire metagenomic communities, but getting accurate and reliable results from these datasets can be challenging. Analysis of these datasets is founded on the mapping of sequencing reads onto known genomic regions from known organisms, but short reads will often map equally well to multiple regions, and to multiple reference organisms. Assembling metagenomic datasets prior to mapping can generate much longer and more precisely mappable sequences but the presence of closely related organisms and highly conserved regions makes metagenomic assembly challenging, and some regions of particular interest can assemble poorly. One solution to these problems is to use specialised tools, such as Kelpie, that can accurately extract and assemble full-length sequences for defined genomic regions from whole-metagenome datasets. Methods. Kelpie is a kMer-based tool that generates full-length amplicon-like sequences from whole-metagenome datasets. It takes a pair of primer sequences and a set of metagenomic reads, and uses a combination of kMer filtering, error correction and assembly techniques to construct sets of full-length inter-primer sequences. Results. The effectiveness of Kelpie is demonstrated here through the extraction and assembly of full-length ribosomal marker gene regions, as this allows comparisons with conventional amplicon sequencing and published metagenomic benchmarks. The results show that the Kelpie-generated sequences and community profiles closely match those produced by amplicon sequencing, down to low abundance levels, and running Kelpie on the synthetic CAMI metagenomic benchmarking datasets shows similar high levels of both precision and recall. Conclusions. Kelpie can be thought of as being somewhat like an in-silico PCR tool, taking a primer pair and producing the resulting ‘amplicons’ from a whole-metagenome dataset. 
Marker regions from the 16S rRNA gene were used here as an example because this allowed the overall accuracy of Kelpie to be evaluated through comparisons with other datasets, approaches and benchmarks. Kelpie is not limited to this application though, and can be used to extract and assemble any genomic region present in a whole-metagenome dataset, as long as it is bounded by a pair of highly conserved primer sequences.

Kelpie: generating full-length ‘amplicons’ from whole-metagenome datasets

10.7287/peerj.preprints.27376 ◽

2018 ◽

Author(s):

Paul Greenfield ◽

Nai Tran-Dinh ◽

David Midgley

Keyword(s):

Marker Gene ◽

Amplicon Sequencing ◽

Genomic Region ◽

Full Length ◽

rRNA Gene ◽

Multiple Regions ◽

Metagenome Sequencing ◽

And Function ◽

Genomic Regions ◽

Metagenomic Assembly

Introduction. Whole-metagenome sequencing can be a rich source of information about the structure and function of entire metagenomic communities, but getting accurate and reliable results from these datasets can be challenging. Analysis of these datasets is founded on the mapping of sequencing reads onto known genomic regions from known organisms, but short reads will often map equally well to multiple regions, and to multiple reference organisms. Assembling metagenomic datasets prior to mapping can generate much longer and more precisely mappable sequences but the presence of closely related organisms and highly conserved regions makes metagenomic assembly challenging, and some regions of particular interest can assemble poorly. One solution to these problems is to use specialised tools, such as Kelpie, that can accurately extract and assemble full-length sequences for defined genomic regions from whole-metagenome datasets. Methods. Kelpie is a kMer-based tool that generates full-length amplicon-like sequences from whole-metagenome datasets. It takes a pair of primer sequences and a set of metagenomic reads, and uses a combination of kMer filtering, error correction and assembly techniques to construct sets of full-length inter-primer sequences. Results. The effectiveness of Kelpie is demonstrated here through the extraction and assembly of full-length ribosomal marker gene regions, as this allows comparisons with conventional amplicon sequencing and published metagenomic benchmarks. The results show that the Kelpie-generated sequences and community profiles closely match those produced by amplicon sequencing, down to low abundance levels, and running Kelpie on the synthetic CAMI metagenomic benchmarking datasets shows similar high levels of both precision and recall. Conclusions. Kelpie can be thought of as being somewhat like an in-silico PCR tool, taking a primer pair and producing the resulting ‘amplicons’ from a whole-metagenome dataset. 
Marker regions from the 16S rRNA gene were used here as an example because this allowed the overall accuracy of Kelpie to be evaluated through comparisons with other datasets, approaches and benchmarks. Kelpie is not limited to this application though, and can be used to extract and assemble any genomic region present in a whole-metagenome dataset, as long as it is bounded by a pair of highly conserved primer sequences.

Abundance and Biogeography of Picoprasinophyte Ecotypes and Other Phytoplankton in the Eastern North Pacific Ocean

Applied and Environmental Microbiology ◽

10.1128/aem.02730-15 ◽

2016 ◽

Vol 82 (6) ◽

pp. 1693-1705 ◽

Cited By ~ 33

Author(s):

Melinda P. Simmons ◽

Sebastian Sudek ◽

Adam Monier ◽

Alexander J. Limardo ◽

Valeria Jimenez ◽

...

Keyword(s):

North Pacific ◽

North Pacific Ocean ◽

Data Interpretation ◽

18S rRNA Gene ◽

Genetic Distances ◽

Environmental Data ◽

Marker Genes ◽

rRNA Gene ◽

Ecological Gradient ◽

Content Type

ABSTRACT. Eukaryotic algae within the picoplankton size class (≤2 μm in diameter) are important marine primary producers, but their spatial and ecological distributions are not well characterized. Here, we studied three picoeukaryotic prasinophyte genera and their cyanobacterial counterparts, Prochlorococcus and Synechococcus, during two cruises along a North Pacific transect characterized by different ecological regimes. Picoeukaryotes and Synechococcus reached maximum abundances of 1.44 × 10⁵ and 3.37 × 10⁵ cells · ml⁻¹, respectively, in mesotrophic waters, while Prochlorococcus reached 1.95 × 10⁵ cells · ml⁻¹ in the oligotrophic ocean. Of the picoeukaryotes, Bathycoccus was present at all stations in both cruises, reaching 21,368 ± 327 18S rRNA gene copies · ml⁻¹. Micromonas and Ostreococcus clade OI were detected only in mesotrophic and coastal waters and Ostreococcus clade OII only in the oligotrophic ocean. To resolve proposed Bathycoccus ecotypes, we established genetic distances for 1,104 marker genes using targeted metagenomes and the Bathycoccus prasinos genome. The analysis was anchored in comparative genome analysis of three Ostreococcus species for which physiological and environmental data are available to facilitate data interpretation. We established that two Bathycoccus ecotypes exist, named here BI (represented by coastal isolate Bathycoccus prasinos) and BII. These share 82% ± 6% nucleotide identity across homologs, while the Ostreococcus spp. share 75% ± 8%. We developed and applied an analysis of ecomarkers to metatranscriptomes sequenced here and published -omics data from the same region. The results indicated that the Bathycoccus ecotypes cooccur more often than Ostreococcus clades OI and OII do. Exploratory analyses of relative transcript abundances suggest that Bathycoccus NRT2.1 and AMT2.2 are high-affinity NO₃⁻ and low-affinity NH₄⁺ transporters, respectively, with close homologs in multiple picoprasinophytes. Additionally, in the open ocean, where dissolved iron concentrations were low (0.08 nM), there appeared to be a shift to the use of nickel superoxide dismutases (SODs) from Mn/Fe/Cu SODs closer inshore. Our study documents the distribution of picophytoplankton along a North Pacific ecological gradient and offers new concepts and techniques for investigating their biogeography.
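
A figure such as "82% ± 6% nucleotide identity across homologs" summarises per-gene identities between two genomes. The Python sketch below shows one plausible way to compute such a summary from pairwise alignments; it is an assumption, since the abstract does not describe the exact procedure, and the aligned sequences and function name are made up for illustration.

```python
from statistics import mean, stdev


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical bases over ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == "-" or base_b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned homolog pairs (one pair per marker gene).
homolog_pairs = [
    ("ACGTACGTAC", "ACGTACGTAA"),
    ("ACGTACGT--", "ACGAACGTAC"),
    ("AAAACCCCGG", "AAAACCCCTT"),
]
identities = [percent_identity(a, b) for a, b in homolog_pairs]
print(identities)
print(f"{mean(identities):.1f}% ± {stdev(identities):.1f}%")
```

Whether gapped columns are skipped or counted as mismatches changes the result, so the convention should be stated alongside any reported identity.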

PEMA: a flexible Pipeline for Environmental DNA Metabarcoding Analysis of the 16S/18S ribosomal RNA, ITS, and COI marker genes

GigaScience ◽

10.1093/gigascience/giaa022 ◽

2020 ◽

Vol 9 (3) ◽

Cited By ~ 7

Author(s):

Haris Zafeiropoulos ◽

Ha Quoc Viet ◽

Katerina Vasileiadou ◽

Antonis Potirakis ◽

Christos Arvanitidis ◽

...

Keyword(s):

High Performance Computing ◽

Ribosomal RNA ◽

High Performance ◽

Marker Gene ◽

Environmental DNA ◽

Marker Genes ◽

Sequence Variant ◽

18S Ribosomal RNA ◽

DNA Metabarcoding ◽

Performance Computing

Abstract. Background. Environmental DNA and metabarcoding allow the identification of a mixture of species and launch a new era in bio- and eco-assessment. Many steps are required to obtain taxonomically assigned matrices from raw data. For most of these, a plethora of tools are available; each tool's execution parameters need to be tailored to reflect each experiment's idiosyncrasy. Adding to this complexity, the computation capacity of high-performance computing systems is frequently required for such analyses. To address the difficulties, bioinformatic pipelines need to combine state-of-the-art technologies and algorithms with an easy to get-set-use framework, allowing researchers to tune each study. Software containerization technologies ease the sharing and running of software packages across operating systems; thus, they strongly facilitate pipeline development and usage. Likewise, programming languages specialized for big data pipelines incorporate features like roll-back checkpoints and on-demand partial pipeline execution. Findings. PEMA is a containerized assembly of key metabarcoding analysis tools that requires low effort in setting up, running, and customizing to researchers’ needs. Based on third-party tools, PEMA performs read pre-processing, (molecular) operational taxonomic unit clustering, amplicon sequence variant inference, and taxonomy assignment for 16S and 18S ribosomal RNA, as well as ITS and COI marker gene data. Owing to its simplified parameterization and checkpoint support, PEMA allows users to explore alternative algorithms for specific steps of the pipeline without the need for a complete re-execution. PEMA was evaluated against both mock communities and previously published datasets and achieved results of comparable quality. Conclusions. A high-performance computing–based approach was used to develop PEMA; however, it can be used on personal computers as well. PEMA's time-efficient performance and good results will allow it to be used for accurate environmental DNA metabarcoding analysis, thus enhancing the applicability of next-generation biodiversity assessment studies.
