Venturing into auditing of reference libraries: from the hackathon on marine invertebrates to sorting with BAGS

2021 ◽  
Vol 4 ◽  
Author(s):  
Filipe Costa

Reference libraries of DNA sequences are the backbone of DNA-based taxonomic identification systems. The quality and accuracy of the data in reference libraries are critical to achieving reliable identifications. Faulty or inaccurate data may have detrimental impacts on various downstream applications, perpetuating errors in long-term studies and biodiversity data repositories. This risk is particularly prevalent in metabarcoding approaches, where millions of sequences are assigned to taxa in reference libraries through automated and frequently unsupervised procedures. Although quality-compliance measures have been implemented in several stages of the DNA barcode production workflow, no systematized approach has tackled the challenges of revision, curation and annotation of reference libraries. The trend towards increasing detection of cryptic diversity further complicates this task. Here we outline the conclusions from applying two distinct approaches to audit and annotate reference libraries: the hackathon on marine invertebrates hosted by the 8th IBOL conference, and the bioinformatics application “Barcode, Audit & Grade System” (BAGS; Fontes et al. 2021). The former consisted of the assembly of 18 researchers involved in marine barcoding, aiming to audit and annotate a very large number of DNA barcode records available in BOLD from major marine invertebrate taxa, including all or selected groups of Annelida, Crustacea, Echinodermata and Mollusca. Discordant Barcode Index Numbers (BINs), that is, BINs including more than one species, were reviewed individually, and the respective records annotated with one of the following four tags: MIS-ID (misidentification); AMBIG (ambiguous, unable to resolve); COMPLEX (multiple BINs); SHARE (barcodes shared among species in the same BIN). 
This effort resulted in the processing of >80,000 barcodes, corresponding to >7,500 species, of which 7% were tagged MIS-ID, 17% AMBIG, 13% COMPLEX and 1% SHARE, with Gastropoda displaying particularly high levels of ambiguity. The sizeable portion of MIS-ID and AMBIG tags raises concern. Yet part of the AMBIG tags merely reflects underlying uncertainty in species taxonomic status, rather than the deposition of erroneous data in BOLD. Hence, in addition to auditing and annotation, extensive effort should continue to be allocated to the underpinning alpha taxonomy of reference libraries. The second approach described here is BAGS, an R-based application that provides a user-friendly platform for automated auditing of user-selected metazoan cytochrome oxidase I (COI) reference libraries. BAGS sorts BOLD’s records and species into 5 grades, depending on whether they display BIN concordance (A, B), multiple BINs (C), less than two records (D) or discordant BINs (E). A WoRMS-linked filter allows marine taxa to be selected or excluded, and a reporting component provides a graphical overview and FASTA files grouped into different combinations of grades. BAGS can therefore provide a quick appraisal of the status of a user-defined reference library, simultaneously allowing recognition of the most reliable records, the incidence of cases of high intraspecific divergence, gaps in representativeness, and inaccuracies of potential concern. A pilot assessment of BAGS performance in three datasets comprising marine fish, Chironomidae (Insecta) and marine Amphipoda (Crustacea) highlighted the differences in the congruence status of the respective reference libraries. In conclusion, the hackathon made a substantial contribution to the revision and annotation of a very large number of marine invertebrate records lodged in BOLD. 
Human-mediated revision is highly reliable and consequential; however, it constituted a massive undertaking that can hardly be repeated without prior refinement and substantial reduction of the datasets to be revised. This could be achieved by resorting to automated revision systems, among which BAGS constitutes a first step. We intend to progress with the expansion and improvement of BAGS, namely by introducing further refinements in the analyses of grade E data, in order to automatically discard simple cases of discordance, thereby reducing the amount of data needing human-mediated revision. Recognition of the need for automated reference library auditing and curation systems is essential to raise the confidence of researchers, environmental managers and governmental agencies in the adoption and implementation of DNA-based approaches in aquatic biomonitoring.
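
The five-grade sorting described in the abstract above can be illustrated with a minimal Python sketch. This is not the BAGS implementation (BAGS is an R application), and several details are assumptions made only for illustration: the function name `grade_species`, the input as `(species, BIN)` pairs, the order in which the criteria are checked, and above all the `grade_a_min_records` cut-off separating grades A and B, which the abstract does not define. The `min_records` default follows the abstract's wording ("less than two records").

```python
from collections import defaultdict


def grade_species(records, min_records=2, grade_a_min_records=10):
    """Assign a grade (A-E) to each species from (species, bin_id) pairs.

    Assumptions (not taken from the abstract): the check order E, D, C, A/B
    and the grade_a_min_records cut-off between grades A and B.
    """
    bins_by_species = defaultdict(list)   # species -> one BIN entry per record
    species_by_bin = defaultdict(set)     # BIN -> set of species found in it
    for species, bin_id in records:
        bins_by_species[species].append(bin_id)
        species_by_bin[bin_id].add(species)

    grades = {}
    for species, bins in bins_by_species.items():
        unique_bins = set(bins)
        if any(len(species_by_bin[b]) > 1 for b in unique_bins):
            grades[species] = "E"   # discordant: a BIN is shared with another species
        elif len(bins) < min_records:
            grades[species] = "D"   # too few records to assess
        elif len(unique_bins) > 1:
            grades[species] = "C"   # one species split across multiple BINs
        elif len(bins) >= grade_a_min_records:
            grades[species] = "A"   # concordant, well represented (assumed cut-off)
        else:
            grades[species] = "B"   # concordant, fewer records
    return grades


# Hypothetical toy library; species and BIN labels are placeholders.
example_records = (
    [("species_1", "BIN_1")] * 12
    + [("species_2", "BIN_2")] * 3
    + [("species_3", "BIN_3"), ("species_3", "BIN_4"), ("species_3", "BIN_3")]
    + [("species_4", "BIN_5")]
    + [("species_5", "BIN_6"), ("species_5", "BIN_6")]
    + [("species_6", "BIN_6"), ("species_6", "BIN_6")]
)
example_grades = grade_species(example_records)
```

Checking discordance first means that a species sharing a BIN with another species is flagged for review regardless of how many records it has, which matches the abstract's emphasis on grade E as the category needing human-mediated revision.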

Genome ◽  
2020 ◽  
pp. 1-11 ◽  
Author(s):  
Tomasz Rewicz ◽  
Arnold Móra ◽  
Grzegorz Tończyk ◽  
Ada Szymczak ◽  
Michal Grabowski ◽  
...  

We present the results of the first-ever DNA barcoding study of odonates from the Maltese Islands. In total, 10 morphologically identified species were collected during a two-week long expedition in 2018. Eighty cytochrome c oxidase subunit I (COI) barcodes were obtained from the collected specimens. Intra- and interspecific distances ranged from 0.00% to 2.24% and 0.48% to 17.62%, respectively. Successful species identification based on ascribing a single morphological species to a single Barcode Index Number (BIN) was achieved for eight species (80%). In the case of two species, Ischnura genei and Anax parthenope, BINs were shared with other closely related species. The taxonomic status of I. genei is questionable and the phylogenetic relationship between A. imperator/parthenope is not clear. Further studies involving a series of adult specimens collected in a wide spatial range and nuclear markers are necessary to resolve these cases. Therefore, this dataset serves as an initial DNA barcode reference library for Maltese odonates, within a larger project: Aquatic Macroinvertebrates DNA Barcode Library of Malta.


PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e2009 ◽  
Author(s):  
Allan P.M. Santos ◽  
Daniela M. Takiya ◽  
Jorge L. Nessimian

Metrichia is assigned to the Ochrotrichiinae, a group of almost exclusively Neotropical microcaddisflies. Metrichia comprises over 100 described species and, despite its diversity, only one species has been described from Brazil so far. In this paper, we provide descriptions for 20 new species from 8 Brazilian states: M. acuminata sp. nov., M. azul sp. nov., M. bonita sp. nov., M. bracui sp. nov., M. caraca sp. nov., M. circuliforme sp. nov., M. curta sp. nov., M. farofa sp. nov., M. forceps sp. nov., M. formosinha sp. nov., M. goiana sp. nov., M. itabaiana sp. nov., M. longissima sp. nov., M. peluda sp. nov., M. rafaeli sp. nov., M. simples sp. nov., M. talhada sp. nov., M. tere sp. nov., M. ubajara sp. nov., and M. vulgaris sp. nov. DNA barcode sequences (577 bp of the mitochondrial gene COI) were generated for 13 of the new species and two previously known species of Metrichia, resulting in 64 sequences. In addition, COI sequences were obtained for other genera of Ochrotrichiinae (Angrisanoia, Nothotrichia, Ochrotrichia, Ragatrichia, and Rhyacopsyche). DNA sequences and morphological data were integrated to evaluate species delimitations. K2P pairwise distances were calculated to generate a neighbor-joining tree. COI sequences also were submitted to ABGD and GMYC methods to assess ‘potential species’ delimitation. Analyses showed a conspicuous barcoding gap among Metrichia sequences (highest intraspecific divergence: 4.8%; lowest interspecific divergence: 12.6%). Molecular analyses also allowed the association of larvae and adults of Metrichia bonita sp. nov. from Mato Grosso do Sul, representing the first record of microcaddisfly larvae occurring in calcareous tufa (or travertine). ABGD results agreed with the morphological delimitation of Metrichia species, while GMYC estimated a slightly higher number of species, suggesting the division of two morphological species, each one into two potential species. 
Because this could be due to unbalanced sampling and the lack of morphological diagnostic characters, we have maintained these two species as undivided.
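
The K2P pairwise distances and the barcoding gap mentioned in the abstract above can be sketched in Python as follows. The distance uses the standard Kimura two-parameter formula, d = -0.5 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions among compared sites. The function names, the input format, and the toy sequences are assumptions for illustration only; this is not the authors' pipeline, and it does not implement ABGD or GMYC.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned sequences.

    Sites where either sequence has a gap or ambiguity code are skipped.
    Raises ValueError for unequal lengths, no comparable sites, or
    sequences too divergent for the formula (logarithm undefined).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def barcoding_gap(labelled_sequences):
    """Return (max intraspecific, min interspecific) K2P distance.

    labelled_sequences is a list of (species, aligned_sequence) pairs.
    Either value is None when no pair of that kind exists.
    """
    intra, inter = [], []
    for (sp_a, seq_a), (sp_b, seq_b) in combinations(labelled_sequences, 2):
        distance = k2p_distance(seq_a, seq_b)
        (intra if sp_a == sp_b else inter).append(distance)
    return max(intra, default=None), min(inter, default=None)


# Hypothetical 10 bp toy alignment; real barcodes are hundreds of bp long.
toy_alignment = [
    ("species_X", "AAAAAAAAAA"),
    ("species_X", "GAAAAAAAAA"),
    ("species_Y", "CCAAAAAAAA"),
]
max_intra, min_inter = barcoding_gap(toy_alignment)
```

A barcoding gap is present when the smallest interspecific distance exceeds the largest intraspecific distance, as with the 4.8% and 12.6% values reported in the abstract.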


Author(s):  
Hidayat Ashari ◽  
Dwi Astuti

The Javan Plover, named Charadrius javanicus, is taxonomically controversial and phylogenetically unresolved. Through an analysis of DNA barcodes, this study aims (1) to confirm whether the Javan Plover is a separate species, Charadrius javanicus, or a subspecies of C. alexandrinus, named C. a. javanicus, and (2) to determine relationships within this genus. A total of 666 bp of the COI barcode gene were analyzed. The results showed that sequence divergence between the Javan Plover and C. alexandrinus alexandrinus was only 1.2%, while sequence divergences between C. a. alexandrinus and other species, or between the Javan Plover and other species, ranged from 9% to 12%. Neighbour-joining (NJ) and maximum-parsimony (MP) analyses showed that all individuals of both Javan Plover and Kentish Plover clustered together, supported by bootstrap values of 99% and 100% in NJ and MP, respectively. This study tends to support previous findings that the Javan Plover is not a separate species, C. javanicus, but a subspecies of C. alexandrinus, named C. a. javanicus. There were two groups of plovers in this study: (C. leschenaultii and C. javanicus + C. a. alexandrinus), and (C. dubius and C. melodus + C. semipalmatus). DNA barcoding analysis can help clarify the taxonomic status of the bird. This study therefore provides baseline data that can support the planning of Javan Plover conservation programs.


2019 ◽  
Author(s):  
Muhammad Tayyib Naseem ◽  
Muhammad Ashfaq ◽  
Arif Muhammad Khan ◽  
Akhtar Rasool ◽  
Muhammad Asif ◽  
...  

DNA barcoding is highly effective for identifying specimens once a reference sequence library is available for the species assemblage targeted for analysis. Despite the great need for an improved capacity to identify the insect pests of crops, the use of DNA barcoding is constrained by the lack of a well-parameterized reference library. The current study begins to address this limitation by developing a DNA barcode reference library for the pest aphids of Pakistan. It also examines the affinities of these taxa with conspecific taxa from other geographic regions based on both conventional taxonomy and Barcode Index Numbers (BINs). A total of 809 aphids were collected from 123 plant species at 87 sites across Pakistan. Morphological study and DNA barcoding allowed 774 specimens to be identified to one of 42 species, while the others were placed to a genus or subfamily. The 801 sequences obtained from these specimens were assigned to 52 BINs whose monophyly was supported by neighbor-joining (NJ) clustering and Bayesian inference. The 42 species were assigned to 41 BINs, with 38 showing BIN concordance; one species (Rhopalosiphum padi) was assigned to two BINs, two others (Aphis affinis, Aphis gossypii) were assigned to the same BIN, and one species (Aphis astragalina) lacked a qualifying sequence. The 42 Linnaean species were represented on BOLD by 7,870 records from 69 countries. Combining these records with those from Pakistan produced 60 BINs, with 12 species showing a BIN split and three a BIN merger. Geo-distance correlations showed that intraspecific divergence values for 18 of 37 species were not affected by the distance between populations. Forty-four of the 52 BINs from Pakistan had counterparts in 73 countries across six continents, documenting the broad distributions of pest aphids.


2020 ◽  
Vol 8 ◽  
Author(s):  
Dagoberto Venera-Pontón ◽  
Amy Driskell ◽  
Sammy De Grave ◽  
Darryl Felder ◽  
Justin Scioli ◽  
...  

DNA barcoding is a useful tool to identify the components of mixed or bulk samples, as well as to determine individuals that lack morphologically diagnostic features. However, the reference database of DNA barcode sequences is particularly sparsely populated for marine invertebrates and for tropical taxa. We used samples collected as part of two field courses, focused on graduate training in taxonomy and systematics, to generate DNA sequences of the barcode fragments of cytochrome c oxidase subunit I (COI) and mitochondrial ribosomal 16S genes for 447 individuals, representing at least 129 morphospecies of decapod crustaceans. COI sequences for 36% (51/140) of the species and 16S sequences for 26% (37/140) of the species were new to GenBank. Automatic Barcode Gap Discovery identified 140 operational taxonomic units (OTUs) which largely coincided with the morphospecies delimitations. Barcode identifications (i.e. matches to identified sequences) were especially useful for OTUs within Synalpheus, a group that is notoriously difficult to identify and rife with cryptic species, a number of which we could not identify to species, based on morphology. Non-concordance between morphospecies and barcode OTUs also occurred in a few cases of suspected cryptic species. As mitochondrial pseudogenes are particularly common in decapods, we investigate the potential for this dataset to include pseudogenes and discuss the utility of these sequences as species identifiers (i.e. barcodes). These results demonstrate that material collected and identified during training activities can provide useful incidental barcode reference samples for under-studied taxa.


Biology ◽  
2021 ◽  
Vol 10 (2) ◽  
pp. 161
Author(s):  
Irene Deidda ◽  
Roberta Russo ◽  
Rosa Bonaventura ◽  
Caterina Costa ◽  
Francesca Zito ◽  
...  

Invertebrates represent about 95% of existing species, and most of them belong to aquatic ecosystems. Marine invertebrates are found at intermediate levels of the food chain and, therefore, they play a central role in the biodiversity of ecosystems. Furthermore, these organisms have a short life cycle, easy laboratory manipulation, and high sensitivity to marine pollution and, therefore, they are considered to be optimal bioindicators for assessing detrimental chemical agents that are related to the marine environment and with potential toxicity to human health, including neurotoxicity. In general, albeit simple, the nervous system of marine invertebrates is composed of neuronal and glial cells, and it exhibits biochemical and functional similarities with the vertebrate nervous system, including humans. In recent decades, new genetic and transcriptomic technologies have made the identification of many neural genes and transcription factors homologous to those in humans possible. Neuroinflammation, oxidative stress, and altered levels of neurotransmitters are some of the aspects of neurotoxic effects that can also occur in marine invertebrate organisms. The purpose of this review is to provide an overview of major marine pollutants, such as heavy metals, pesticides, and micro and nano-plastics, with a focus on their neurotoxic effects in marine invertebrate organisms. This review could be a stimulus to bio-research towards the use of invertebrate model systems other than traditional, ethically questionable, time-consuming, and highly expensive mammalian models.


2021 ◽  
Vol 168 (6) ◽  
Author(s):  
Ann Bucklin ◽  
Katja T. C. A. Peijnenburg ◽  
Ksenia N. Kosobokova ◽  
Todd D. O’Brien ◽  
Leocadio Blanco-Bercial ◽  
...  

Characterization of species diversity of zooplankton is key to understanding, assessing, and predicting the function and future of pelagic ecosystems throughout the global ocean. The marine zooplankton assemblage, including only metazoans, is highly diverse and taxonomically complex, with an estimated ~28,000 species of 41 major taxonomic groups. This review provides a comprehensive summary of DNA sequences for the barcode region of mitochondrial cytochrome oxidase I (COI) for identified specimens. The foundation of this summary is the MetaZooGene Barcode Atlas and Database (MZGdb), a new open-access data and metadata portal that is linked to NCBI GenBank and BOLD data repositories. The MZGdb provides enhanced quality control and tools for assembling COI reference sequence databases that are specific to selected taxonomic groups and/or ocean regions, with associated metadata (e.g., collection georeferencing, verification of species identification, molecular protocols), and tools for statistical analysis, mapping, and visualization. To date, over 150,000 COI sequences for ~5600 described species of marine metazoan plankton (including holo- and meroplankton) are available via the MZGdb portal. This review uses the MZGdb as a resource for summaries of COI barcode data and metadata for important taxonomic groups of marine zooplankton and selected regions, including the North Atlantic, Arctic, North Pacific, and Southern Oceans. The MZGdb is designed to provide a foundation for analysis of species diversity of marine zooplankton based on DNA barcoding and metabarcoding for assessment of marine ecosystems and rapid detection of the impacts of climate change.


1992 ◽  
Vol 49 (5) ◽  
pp. 1010-1017 ◽  
Author(s):  
Nicolas S. Bloom

Total mercury, monomethylmercury (CH3Hg), and dimethylmercury ((CH3)2Hg) in edible muscle were examined in 229 samples, representing seven freshwater and eight saltwater fish species and several species of marine invertebrates using ultraclean techniques. Total mercury was determined by hot HNO3/H2SO4/BrCl digestion, SnCl2 reduction, purging onto gold, and analysis by cold vapor atomic fluorescence spectrometry (CVAFS). Methylmercury was determined by KOH/methanol digestion using aqueous phase ethylation, cryogenic gas chromatography, and CVAFS detection. Total mercury and CH3Hg concentrations varied from 0.011 to 2.78 μg∙g−1 (wet weight basis, as Hg) for all samples, while no sample contained detectable (CH3)2Hg (<0.001 μg∙g−1 as Hg). The observed proportion of total mercury (as CH3Hg) ranged from 69 to 132%, with a relative standard deviation for quintuplicate analysis of about 10%; nearly all of this variability can be explained by the analytical variability of total mercury and CH3Hg. Poorly homogenized samples showed greater variability, primarily because total mercury and CH3Hg were measured on separate aliquots, which vary in mercury concentration, not speciation. I conclude that for all species studied, virtually all (>95%) of the mercury present is as CH3Hg and that past reports of substantially lower CH3Hg fractions may have been biased by analytical and homogeneity variability.


The Holocene ◽  
2018 ◽  
Vol 28 (12) ◽  
pp. 1894-1908
Author(s):  
Andréanne Bourgeois-Roy ◽  
Hugo Crites ◽  
Pascal Bernatchez ◽  
Denis Lacelle ◽  
André Martel

The late Pleistocene–early Holocene transition period was characterized by rapid environmental change. Here, we investigate the impact of these changes on the marine invertebrates living in a shallow inlet of the post-glacial Goldthwait Sea. The site is located near Baie-Comeau (QC, Canada), where a number of remarkably well-preserved shell deposits are found along the Rivière aux Anglais Valley on the north shore of the St. Lawrence maritime estuary. Seven phyla of marine invertebrates with a minimum of 25 species or taxa were inventoried in a shell deposit, dominated by a community of Hiatella arctica with Mytilus edulis and barnacles composing the subcommunity. The majority of taxa identified in the shell deposit are boreal and sub-Arctic species; however, temperate species that exist today in the St. Lawrence maritime estuary have not been found. Based on marine invertebrate diversity and δ18O(CaCO3) of Mytilus edulis, the water in the shallow inlet of the Goldthwait Sea must have been cold and saline. The range of AMS 14C ages from 15 Mytilus edulis, constrained to 10,900 and 10,690 cal. yr BP, and exceptional state of preservation of adult and juvenile molluscan specimens suggest the abrupt mortality of entire invertebrate communities due to changing hydrodynamic conditions that included the combined effect of freshwater discharge from the receding Laurentide Ice Sheet and rapid isostatic uplift.

