A reference cytochrome c oxidase subunit I database curated for hierarchical classification of arthropod metabarcoding data

PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5126 ◽  
Author(s):  
Rodney T. Richardson ◽  
Johan Bengtsson-Palme ◽  
Mary M. Gardiner ◽  
Reed M. Johnson

Metabarcoding is a popular application which warrants continued methods optimization. To maximize barcoding inferences, hierarchy-based sequence classification methods are increasingly common. We present methods for the construction and curation of a database designed for hierarchical classification of a 157 bp barcoding region of the arthropod cytochrome c oxidase subunit I (COI) locus. We produced a comprehensive arthropod COI amplicon dataset including annotated arthropod COI sequences and COI sequences extracted from arthropod whole mitochondrion genomes, the latter of which provided the only source of representation for Zoraptera, Callipodida and Holothyrida. The database contains extracted sequences of the target amplicon from all major arthropod clades, including all insect orders, all arthropod classes and Onychophora, Tardigrada and Mollusca outgroups. During curation, we extracted the COI region of interest from approximately 81 percent of the input sequences, corresponding to 73 percent of the genus-level diversity found in the input data. Further, our analysis revealed a high degree of sequence redundancy within the NCBI nucleotide database, with a mean of approximately 11 sequence entries per species in the input data. The curated, low-redundancy database is included in the Metaxa2 sequence classification software (http://microbiology.se/software/metaxa2/). Using this database with the Metaxa2 classifier, we performed a cross-validation analysis to characterize the relationship between the Metaxa2 reliability score, an estimate of classification confidence, and classification error probability. We used this analysis to select a reliability score threshold which minimized error. We then estimated classification sensitivity, false discovery rate and overclassification, the propensity to classify sequences from taxa not represented in the reference database. 
Our work will help researchers design and evaluate classification databases and conduct metabarcoding on arthropods and alternate taxa.
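
The threshold selection and error estimates described above can be illustrated with a minimal Python sketch. It assumes each cross-validation test sequence is recorded as a reliability score plus a flag saying whether the assignment matched the known taxon. The names `Result`, `evaluate` and `lowest_threshold_meeting_fdr`, the toy scores, and the working definitions (sensitivity = correct accepted assignments / all test sequences; false discovery rate = incorrect accepted assignments / all accepted assignments) are assumptions made for illustration. This is not the authors' procedure and not part of the Metaxa2 interface.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Result:
    """One hypothetical cross-validation test sequence."""
    score: float   # reliability score reported for the assignment
    correct: bool  # True if the assigned taxon matches the known taxon


def evaluate(results: List[Result], threshold: float) -> Tuple[float, float]:
    """Return (sensitivity, fdr) when only scores >= threshold are accepted."""
    accepted = [r for r in results if r.score >= threshold]
    true_pos = sum(r.correct for r in accepted)
    false_pos = len(accepted) - true_pos
    sensitivity = true_pos / len(results) if results else 0.0
    fdr = false_pos / len(accepted) if accepted else 0.0
    return sensitivity, fdr


def lowest_threshold_meeting_fdr(
    results: List[Result], candidates: Iterable[float], max_fdr: float
) -> Optional[float]:
    """Return the smallest candidate threshold whose FDR is <= max_fdr."""
    for threshold in sorted(candidates):
        _, fdr = evaluate(results, threshold)
        if fdr <= max_fdr:
            return threshold
    return None  # no candidate satisfies the error target


# Toy data, invented for this example only.
toy_results = [
    Result(95, True), Result(90, True), Result(85, False),
    Result(70, True), Result(60, False), Result(50, False),
]

if __name__ == "__main__":
    for t in (50, 60, 70, 80, 90):
        sens, fdr = evaluate(toy_results, t)
        print(f"threshold={t}: sensitivity={sens:.2f}, FDR={fdr:.2f}")
    print(lowest_threshold_meeting_fdr(toy_results, (50, 60, 70, 80, 90), 0.1))
```

Raising the threshold in this toy data lowers the false discovery rate at the cost of sensitivity, which is the trade-off a threshold choice has to balance.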

2014 ◽  
Vol 65 (11) ◽  
pp. 1027 ◽  
Author(s):  
Martin F. Gomon ◽  
Robert D. Ward ◽  
Stephanie Chapple ◽  
Joshua M. Hale

Recent studies have revealed evidence that the identities and distributions of several Indo-West Pacific species of Chlorophthalmus, as redefined by Sato and Nakabo (2002a), are inaccurately understood and reported in the literature. The current confusion is mostly attributable to the meristic conservatism of the genus and the individually variable nature of the morphology in those species. An analysis of the DNA barcode region of cytochrome c oxidase subunit I sequences was employed to independently group specimens into natural species assemblages, providing evidence for verifying or correcting species concepts and identities. A re-examination of the morphology of vouchers in the resultant 12 groupings identified features corroborating the distinctiveness of 10 of the 12 groups at the species level. Each of the other two groups comprised two presumed species on the basis of morphological evidence that do not appear to be separable by cytochrome c oxidase subunit I gene (COI) sequences. Two undescribed species of Chlorophthalmus are now known to inhabit slope waters of Australia, and a further two undescribed species were identified elsewhere.


2019 ◽  
Vol 8 (1) ◽  
pp. 67-74
Author(s):  
Ananna Ghosh ◽  
Muhammad Sohel Abedin ◽  
Abdul Jabber Howlader ◽  
Md Monwar Hossain

The Satyrinae, a subfamily of nymphalid butterflies, is morphologically and ecologically the most diverse group, occurring in all habitats. In the present study, the cytochrome c oxidase subunit I (COI) gene of seven species of Satyrinae was sequenced, aligned, and used to construct phylogenetic trees. The molecular identification of these Satyrinae species was confirmed by comparison with related sequences in the National Center for Biotechnology Information (NCBI) GenBank. The base compositions of the COI sequences were 39.07% T, 16.44% C, 29.83% A, and 14.64% G, revealing a strong AT bias (68.9%). The sequence distance among Satyrinae species ranged from 0.09% to 0.18%. Phylogenetic trees were constructed by the neighbor-joining (NJ) and maximum likelihood (ML) methods, using Orthetrum sabina as an outgroup. Both trees had almost identical topologies. The sampled species in Satyrinae exhibited the following relationships: Melanitis leda + [(Mycalesis mineus + (Mycalesis gotama + Mycalesis anaxias)) + (Ypthima baldus + (Lethe chandica + Elymnias hypermnestra))], suggesting that M. leda might be distantly related to the rest of the Satyrinae species. This clustering result is almost identical to the current traditional classification. This study confirms that COI-based DNA barcoding is an efficient method for the identification of butterflies, including Satyrinae species, and, as such, may further contribute effectively to biodiversity and evolutionary research. Jahangirnagar University J. Biol. Sci. 8(1): 67-74, 2019 (June)
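
The base-composition figures above follow from simple counting: the AT content is the sum of the A and T percentages (29.83% + 39.07% = 68.90%). A minimal Python sketch of that calculation is given here. The function names and the toy sequence are invented for illustration and are not taken from the study.

```python
from collections import Counter
from typing import Dict


def base_composition(seq: str) -> Dict[str, float]:
    """Return the percentage of A, C, G and T among unambiguous bases."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def at_content(composition: Dict[str, float]) -> float:
    """AT content is the sum of the A and T percentages."""
    return composition["A"] + composition["T"]


# Toy sequence, made up for this example (20 bases: 7 A, 9 T, 2 C, 2 G).
toy_sequence = "ATTTAGCATTAATTCGTAAT"

# Percentages as reported in the abstract above.
reported = {"T": 39.07, "C": 16.44, "A": 29.83, "G": 14.64}

if __name__ == "__main__":
    print(base_composition(toy_sequence))
    print(round(at_content(reported), 2))
```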


Zootaxa ◽  
2012 ◽  
Vol 3206 (1) ◽  
pp. 1 ◽  
Author(s):  
EUGENIYA I. BEKKER ◽  
ALEXEY A. KOTOV ◽  
DEREK J. TAYLOR

Frey (1975) subdivided the genus Eurycercus Baird, 1843 (Cladocera: Eurycercidae) into three subgenera: E. (Eurycercus) s.str., E. (Bullatifrons) Frey, 1975 and E. (Teretifrons) Frey, 1975. We conducted a revision of the subgenera Eurycercus (Eurycercus) and E. (Bullatifrons) in the Holarctic based on the morphology of parthenogenetic females and a phylogeny of cytochrome c oxidase subunit I (COI) sequences. The following six species are found to be valid: E. lamellatus (O. F. Müller, 1776); E. macracanthus Frey, 1973; E. pompholygodes Frey, 1975; E. microdontus Frey, 1978; E. longirostris Hann, 1982; E. nipponica Tanaka & Fujita, 2002. The separation of E. vernalis Hann, 1982 from E. longirostris lacks morphological and genetic justification, so E. vernalis is a junior synonym of E. longirostris. A new species, E. beringi sp. nov., was found in several localities in Alaska, U.S.A. Its characters are intermediate between two subgenera sensu Frey (1975): a median keel is expressed, but only in the posterior portion of the carapace dorsum (while it is absent in E. (Bullatifrons) and passes through all the dorsum in Eurycercus s.str.); the dorsal head pores are located on the bubble-like projection (a character of the subgenus E. (Bullatifrons)), but the latter is sitting on a prominent transverse fold (a character of the subgenus Eurycercus s.str.). The COI tree also does not support separation of the subgenus E. (Bullatifrons) from E. (Eurycercus), while separation of E. (Teretifrons) is well-supported. So, we propose to avoid a separation of E. (Bullatifrons) and regard all the species previously placed there as belonging to the subgenus E. (Eurycercus) emend. nov. We also demonstrated that E. macracanthus, E. pompholygodes, E. longirostris and E. nipponica have much broader distributional ranges than previously known.

