A SMRT approach for targeted amplicon sequencing of museum specimens (Lepidoptera)—patterns of nucleotide misincorporation

PeerJ ◽

10.7717/peerj.10420 ◽

2021 ◽

Vol 9 ◽

pp. e10420

Author(s):

Jacopo D’Ercole ◽

Sean W.J. Prosser ◽

Paul D.N. Hebert

Keyword(s):

Single Molecule ◽

Large Scale ◽

Dna Degradation ◽

Degraded Dna ◽

Reference Sequence ◽

Coi Gene ◽

Sequence Information ◽

Museum Specimens ◽

Contig Assembly ◽

Evolutionary Diversification

Natural history collections are a valuable resource for molecular taxonomic studies and for examining patterns of evolutionary diversification, particularly in the case of rare or extinct species. However, the recovery of sequence information is often complicated by DNA degradation. This article describes use of the Sequel platform (Pacific Biosciences) to recover the 658 bp barcode region of the mitochondrial cytochrome c oxidase I (COI) gene from 380 butterflies with an average age of 50 years. Nested multiplex PCR was employed for library preparation to facilitate sequence recovery from extracts with low concentrations of highly degraded DNA. By employing circular consensus sequencing (CCS) of short amplicons (circa 150 bp), full-length barcodes could be assembled without a reference sequence, an important advance from earlier protocols which required reference sequences to guide contig assembly. The Sequel protocol recovered COI sequences (499 bp on average) from 318 of 380 specimens (84%), much higher than for Sanger sequencing (26%). Because each read derives from a single molecule, it was also possible to quantify the incidence of substitutions arising from DNA damage. In agreement with past work on sequence changes induced by DNA degradation, the transition C/G → T/A was the most prevalent category of change, but its rate of occurrence (4.58E−4) was so low that it did not impede the recovery of reliable sequences. Because the current protocol recovers COI sequence from most museum specimens, and because sequence fidelity is unaffected by nucleotide misincorporations, large-scale sequence characterization of museum specimens is feasible.
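
The substitution rate quoted in this abstract is a count of one class of change divided by the number of bases examined. The sketch below shows one way such a tally could be computed. It is not the authors' pipeline: it assumes reads have already been aligned to a consensus as equal-length strings (with `-` or `N` for gaps and uncovered positions), and the names `substitution_class` and `misincorporation_rates` are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref_base, read_base):
    """Strand-neutral label, e.g. C->T and G->A both map to 'C/G->T/A'."""
    if ref_base in "AG":  # report every change relative to a pyrimidine
        ref_base, read_base = COMPLEMENT[ref_base], COMPLEMENT[read_base]
    return f"{ref_base}/{COMPLEMENT[ref_base]}->{read_base}/{COMPLEMENT[read_base]}"


def misincorporation_rates(consensus, reads):
    """Per-class substitution rate = substitutions / bases compared."""
    counts = Counter()
    compared = 0
    for read in reads:
        for ref_base, read_base in zip(consensus, read):
            if ref_base not in COMPLEMENT or read_base not in COMPLEMENT:
                continue  # skip gaps, Ns and uncovered positions
            compared += 1
            if ref_base != read_base:
                counts[substitution_class(ref_base, read_base)] += 1
    if compared == 0:
        return {}
    return {label: n / compared for label, n in counts.items()}


# Toy data (invented for illustration): two C/G->T/A changes in 17 compared bases.
rates = misincorporation_rates("ACGTCG", ["ACGTCG", "ATGTCA", "AC-TCG"])
```

Pooling complementary changes into one label mirrors the "C/G → T/A" notation used in the abstract, since a single-stranded read cannot tell which strand carried the original damage.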

Systematics of the Arboreal Neotropical ‘thorellii’ Clade of Centruroides Bark Scorpions (Buthidae) and the Efficacy of Mini-Barcodes for Museum Specimens

Diversity ◽

10.3390/d13090441 ◽

2021 ◽

Vol 13 (9) ◽

pp. 441

Author(s):

Aaron M. Goodman ◽

Lorenzo Prendini ◽

Lauren A. Esposito

Keyword(s):

Molecular Phylogenetics ◽

Taxonomic Revision ◽

Morphological Characters ◽

Species Level ◽

Degraded Dna ◽

Coi Gene ◽

Museum Specimens ◽

Base Pairs ◽

Representative Species ◽

Level Identification

Fragmented and degraded DNA is pervasive among museum specimens, hindering molecular phylogenetics and species identification. Mini-barcodes, 200–300-base-pair (bp) fragments of barcoding genes, have proven effective for species-level identification of specimens from which complete barcodes cannot be obtained in many groups, but have yet to be tested in arachnids. The present study investigated the efficacy of mini-barcodes combined with longer sequences of the Cytochrome c Oxidase Subunit I (COI) gene in the systematics of the arboreal Neotropical ‘thorellii’ clade of Centruroides Marx, 1890 bark scorpions (Buthidae, C.L. Koch 1837), the species of which have proven to be difficult to identify and delimit due to their similar morphology. The phylogeny of 53 terminals, representing all nine species of the clade and representative species belonging to related clades of Centruroides, rooted on Heteroctenus junceus (Herbst, 1800) and based on up to 1078 base pairs of COI and 112 morphological characters, is presented to test the monophyly of the clade and the limits of its component species. The results support the recognition of nine species of the ‘thorellii’ clade, in accordance with a recent taxonomic revision, and highlight the efficacy of mini-barcodes for identifying morphologically similar cryptic species using specimens of variable age and preservation.

AFLP-derived, Codominant Markers for Locus-specific Applications

HortScience ◽

10.21273/hortsci.33.3.514e ◽

1998 ◽

Vol 33 (3) ◽

pp. 514e-514

Author(s):

James M. Bradeen ◽

Philipp W. Simon

Keyword(s):

Linkage Mapping ◽

Large Scale ◽

Pcr Primers ◽

Inverse Pcr ◽

Sequence Information ◽

Pcr Assay ◽

Specific Primers ◽

Simultaneous Evaluation ◽

Feral Populations ◽

Diversity Assessment

The amplified fragment length polymorphism (AFLP) is a powerful marker, allowing rapid and simultaneous evaluation of multiple potentially polymorphic sites. Although well-adapted to linkage mapping and diversity assessment, AFLPs are primarily dominant in nature. Dominance, relatively high cost, and technological difficulty limit use of AFLPs for marker-aided selection and other locus-specific applications. In carrot the Y2 locus conditions carotene accumulation in the root xylem. We identified AFLP fragments linked to the dominant Y2 allele and pursued conversion of those fragments to codominant, PCR-based forms useful for locus-specific applications. The short length of AFLPs (≈60 to 500 bp) precludes development of longer, more specific primers as in SCAR development. Instead, using sequence information from cloned AFLP fragments for primer design, regions outside of the original fragment were amplified by inverse PCR or ligation-mediated PCR, cloned, and sequenced. Differences in sequences associated with Y2 vs. y2 allowed development of simple PCR assays differentiating those alleles. PCR primers flanking an insertion associated with the recessive allele amplified differently sized products for the two Y2 alleles in one assay. This assay is rapid, technologically simple (requiring no radioactivity and little advanced training or equipment), reliable, inexpensive, and codominant. Our PCR assay has a variety of large scale, locus-specific applications including genotyping diverse carrot cultivars and wild and feral populations. Efforts are underway to improve upon conversion technology and to more extensively test the techniques we have developed.

Cloning, expression, and analysis of the group 2 allergen from Dermatophagoides farinae from China

Anais da Academia Brasileira de Ciências ◽

10.1590/s0001-37652010000400017 ◽

2010 ◽

Vol 82 (4) ◽

pp. 941-951 ◽

Cited By ~ 2

Author(s):

Cui Yu-bao ◽

Ying Zhou ◽

Shi Weihong ◽

Ma Guifang ◽

Li Yang ◽

...

Keyword(s):

Large Scale ◽

Alpha Helix ◽

Random Coil ◽

Reference Sequence ◽

Scale Production ◽

Dermatophagoides Farinae ◽

E Coli ◽

Large Scale Production ◽

Solid Foundation ◽

Group 2

To obtain the recombinant group 2 allergen product of Dermatophagoides farinae (Der f 2), the Der f 2 gene was synthesized by RT-PCR. The full-length cDNA comprised 441 nucleotides and was 99.3% identical to the reference sequence (GenBank AB195580). The cDNA was bound to vector pET28a to construct plasmid pET28a(+)-Der f 2, which was transformed into E. coli BL21 and induced by IPTG. SDS-PAGE showed a specific band of about 14 kDa in the whole cell lysate. As estimated by chromatography, about 3.86 g of the recombinant product was obtained, which conjugated with serum IgE from asthmatic children. The protein had a signal peptide of 17 amino acids. Its secondary structure comprised an alpha helix (19.86%), an extended strand (30.82%), and a random coil (49.32%). The subcellular localization of this allergen was predicted to be at mitochondria. Furthermore, its function was shown to be associated with an MD-2-related lipid-recognition (ML) domain. The results of this study provide a solid foundation for large-scale production of the allergen for clinical diagnosis and treatment of allergic disorders.
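
A percent-identity figure such as the 99.3% reported here is the share of aligned positions at which two sequences agree. The sketch below is a minimal illustration, assuming the two sequences are already aligned to the same length; `percent_identity` is a hypothetical helper, and the toy sequences are invented. The abstract does not state how many positions differed, so the three mismatches used here are only an example that happens to round to the same value.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions with identical characters.

    Gap characters are compared like any other symbol, so a gap opposite
    a base counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences are empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Toy 441-nucleotide sequences differing at three positions (illustrative only).
reference = "A" * 441
query = "C" * 3 + "A" * 438
identity = percent_identity(reference, query)  # 438 / 441 * 100
```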

Single-Molecule Long-Read Sequencing of Purslane (Portulaca oleracea) and Differential Gene Expression Related with Biosynthesis of Unsaturated Fatty Acids

Plants ◽

10.3390/plants10040655 ◽

2021 ◽

Vol 10 (4) ◽

pp. 655

Author(s):

Hongmei Du ◽

Shah Zaman ◽

Shuiqingqing Hu ◽

Shengquan Che

Keyword(s):

Fatty Acids ◽

Differential Expression ◽

Single Molecule ◽

Unsaturated Fatty Acids ◽

Average Length ◽

Expression Profiles ◽

Portulaca Oleracea ◽

Pcr Analysis ◽

Sequence Information ◽

Leaves And Roots

This study aimed to obtain the full-length transcriptome of purslane (Portulaca oleracea); assorted plant samples were used for single-molecule real-time (SMRT) sequencing. Based on the SMRT data, functional annotation of transcripts, transcription factor (TF) analysis, simple sequence repeat analysis, and long non-coding RNA (LncRNA) prediction were accomplished. In total, 15.33 GB of reads were produced, comprising 9,350,222 subreads with an average length of 1640 bp. After clustering, 132,536 transcripts and 78,559 genes were detected with 99.99% accuracy. All unique SMRT transcripts were annotated in seven functional databases. 4180 TFs (including transcription regulators) and 7289 LncRNAs were predicted. The results of RNA-seq were confirmed with qRT–PCR analysis. Illumina sequencing of leaves and roots of two purslane genotypes was carried out. A number of differentially expressed genes and related KEGG pathways were found. The expression profiles of genes in the unsaturated fatty acid biosynthesis pathway were analyzed in leaves and roots of the two purslane genotypes. Differential expression of genes in this pathway forms the foundation for ω-3 fatty acid accumulation in different organs and genotypes of purslane. These results provide sequence information and may be a valuable resource for whole-genome sequencing of purslane in the future.

Detection of structural variations in densely-labelled optical DNA barcodes: A hidden Markov model approach

PLoS ONE ◽

10.1371/journal.pone.0259670 ◽

2021 ◽

Vol 16 (11) ◽

pp. e0259670

Author(s):

Albertas Dvirnas ◽

Callum Stewart ◽

Vilhelm Müller ◽

Santosh Kumar Bikkarolla ◽

Karolin Frykholm ◽

...

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Large Scale ◽

Hidden Markov ◽

Sequence Information ◽

True Positive ◽

Dna Barcodes ◽

Structural Variations ◽

Genomic Alterations ◽

Data Set

Large-scale genomic alterations play an important role in disease, gene expression, and chromosome evolution. Optical DNA mapping (ODM), commonly categorized into sparsely-labelled ODM and densely-labelled ODM, provides sequence-specific continuous intensity profiles (DNA barcodes) along single DNA molecules and is a technique well-suited for detecting such alterations. For sparsely-labelled barcodes, the possibility to detect large genomic alterations has been investigated extensively, while densely-labelled barcodes have not received as much attention. In this work, we introduce HMMSV, a hidden Markov model (HMM) based algorithm for detecting structural variations (SVs) directly in densely-labelled barcodes without access to sequence information. We evaluate our approach using simulated data-sets with 5 different types of SVs, and combinations thereof, and demonstrate that the method reaches a true positive rate greater than 80% for randomly generated barcodes with single variations of size 25 kilobases (kb). Increasing the length of the SV further leads to larger true positive rates. For a real data-set with experimental barcodes on bacterial plasmids, we successfully detect matching barcode pairs and SVs without any particular assumption of the types of SVs present. Instead, our method effectively goes through all possible combinations of SVs. Since ODM works on length scales typically not reachable with other techniques, our methodology is a promising tool for identifying arbitrary combinations of genomic alterations.

3D DNA structural barcode copying and random access

10.1101/2020.11.27.401596 ◽

2020 ◽

Author(s):

Filip Bošković ◽

Alexander Ohmann ◽

Ulrich F. Keyser ◽

Kaikai Chen

Keyword(s):

Data Storage ◽

Single Molecule ◽

Self Assembly ◽

Large Scale ◽

Three Dimensional ◽

Random Access ◽

Scale Production ◽

Digital Information ◽

Dna Nanostructures ◽

Large Scale Production

Three-dimensional (3D) DNA nanostructures built via DNA self-assembly have established recent applications in multiplexed biosensing and storing digital information. However, a key challenge is that 3D DNA structures are not easily copied, although copying is of vital importance for their large-scale production and for access to desired molecules by target-specific amplification. Here, we build 3D DNA structural barcodes and demonstrate the copying and random access of the barcodes from a library of molecules using a modified polymerase chain reaction (PCR). The 3D barcodes were assembled by annealing a single-stranded DNA scaffold with complementary short oligonucleotides containing 3D protrusions at defined locations. DNA nicks in these structures are ligated to facilitate barcode copying using PCR. To randomly access a target from a library of barcodes, we employ a non-complementary end in the DNA construct that serves as a barcode-specific primer template. Readout of the 3D DNA structural barcodes was performed with nanopore measurements. Our study provides a roadmap for convenient production of large quantities of self-assembled 3D DNA nanostructures. In addition, this strategy offers access to specific targets, a crucial capability for multiplexed single-molecule sensing and for DNA data storage.

Introducing ribosomal tandem repeat barcoding for fungi

10.1101/310540 ◽

2018 ◽

Cited By ~ 2

Author(s):

Christian Wurzbacher ◽

Ellen Larsson ◽

Johan Bengtsson-Palme ◽

Silke Van den Wyngaert ◽

Sten Svantesson ◽

...

Keyword(s):

Large Scale ◽

Tandem Repeats ◽

Reference Data ◽

Reference Sequence ◽

Herbarium Specimens ◽

Nanopore Sequencing ◽

Desktop Computer ◽

Third Generation Sequencing ◽

Ribosomal Operon ◽

Sequencing Facility

Sequence analysis of the various ribosomal genetic markers is the dominant molecular method for identification and description of fungi. However, there is little agreement on what ribosomal markers should be used, and research groups utilize different markers depending on what fungal groups are targeted. New environmental fungal lineages known only from DNA data reveal significant gaps in the coverage of the fungal kingdom both in terms of taxonomy and marker coverage in the reference sequence databases. In order to integrate references covering all of the ribosomal markers, we present three sets of general primers that allow the amplification of the complete ribosomal operon from the ribosomal tandem repeats. The primers cover all ribosomal markers (ETS, SSU, ITS1, 5.8S, ITS2, LSU, and IGS) from the 5’ end of the ribosomal operon all the way to the 3’ end. We coupled these primers successfully with third generation sequencing (PacBio and Nanopore sequencing) to showcase our approach on authentic fungal herbarium specimens. In particular, we were able to generate high-quality reference data with Nanopore sequencing in a high-throughput manner, showing that the generation of reference data can be achieved on a regular desktop computer without the need for a large-scale sequencing facility. The quality of the Nanopore generated sequences was 99.85%, which is comparable with the 99.78% accuracy described for Sanger sequencing. With this work, we hope to stimulate the generation of a new comprehensive standard of ribosomal reference data with the ultimate aim to close the huge gaps in our reference datasets.

Large Scale Identification of the Missing

Silent Witness ◽

10.1093/oso/9780190909444.003.0010 ◽

2020 ◽

pp. 193-207

Author(s):

Andreas Kleiser ◽

Thomas J. Parsons

Keyword(s):

Human Rights ◽

Forensic Science ◽

Rule Of Law ◽

Large Scale ◽

Dna Analysis ◽

Degraded Dna ◽

Innovation Research ◽

Missing Persons ◽

The Individual ◽

Historical Developments

This chapter describes the experiences of the International Commission on Missing Persons (ICMP) in conducting large-scale, DNA-based identification of the missing, discussed within the context of policy and historical developments underpinning today’s requirement for effective investigations when persons go missing. These developments include a shift to the rule of law and human rights reference framework as part of advancing state responsibility on the issue of the missing. The chapter takes note of historical as well as rule-of-law initiatives at the international level in the wake of the wars in the former Yugoslavia in the 1990s, including the creation of ICMP. Generally accepted as capable of establishing facts, forensic science, including DNA analysis, has become a centerpiece of effective investigations in the pursuit of justice at the individual and societal levels. However, access to forensic science is not universal, largely due to cost. The chapter discusses two limitations on using DNA to identify missing persons that drive cost: the use of kinship analysis and degraded DNA. DNA analysis through next generation sequencing (NGS), or massive parallel sequencing (MPS), will likely redress both shortcomings. In ICMP’s experience, innovation, research, and dedication can contribute to a more effective approach to accounting for missing persons; this in turn will contribute more broadly to the pursuit of justice and the advancement of human rights globally.

Applications of Next-Generation Sequencing for Large-Scale Pathogen Diagnoses in Soybean

Plant Disease ◽

10.1094/pdis-05-18-0905-re ◽

2019 ◽

Vol 103 (6) ◽

pp. 1075-1083 ◽

Cited By ~ 2

Author(s):

Gustavo A. Díaz-Cruz ◽

Charlotte M. Smith ◽

Kiana F. Wiebe ◽

Sachi M. Villanueva ◽

Adam R. Klonowski ◽

...

Keyword(s):

Next Generation Sequencing ◽

Pseudomonas Syringae ◽

Large Scale ◽

Fold Increase ◽

Yellow Mosaic Virus ◽

Growth Stages ◽

Sequence Information ◽

Next Generation ◽

Leaf Sample ◽

Generation Sequencing

Soybean (Glycine max) has become an important crop in Manitoba, Canada, with a 10-fold increase in dedicated acreage over the past decade. Given the rapid increase in production, scarce information about foliar diseases present in the province has been recorded. In order to describe the foliar pathogens affecting this legume, we harnessed next-generation sequencing (NGS) to carry out a comprehensive survey across Manitoba in 2016. Fields were sampled during the V2/3 (33 fields) and R6 (70 fields) growth stages, with at least three symptomatic leaves per field collected and subjected to RNA sequencing. We successfully detected several bacteria, fungi, and viruses known to infect soybean, including Pseudomonas savastanoi pv. glycinea, Septoria glycines, and Peronospora manshurica, as well as pathogens not previously identified in the province (e.g., Pseudomonas syringae pv. tabaci, Cercospora sojina, and Bean yellow mosaic virus). For some microorganisms, we were able to disentangle the different pathovars present and/or assemble their genome sequence. Since NGS generates data on the entire flora and fauna occupying a leaf sample, we also identified residual pathogens (i.e., pathogens of crops other than soybean) and multiple species of arthropod pests. Finally, the sequence information produced by NGS allowed for the development of polymerase chain reaction-based diagnostics for some of the most widespread and important pathogens. Although there are many benefits of using NGS for large-scale plant pathogen diagnoses, we also discuss some of the limitations of this technology.

The sequencing and de novo assembly of the Larimichthys crocea genome using PacBio and Hi-C technologies

Scientific Data ◽

10.1038/s41597-019-0194-3 ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 8

Author(s):

Baohua Chen ◽

Zhixiong Zhou ◽

Qiaozhen Ke ◽

Yidi Wu ◽

Huaqiang Bai ◽

...

Keyword(s):

Marine Fish ◽

Single Molecule ◽

Large Scale ◽

Reference Genome ◽

De Novo ◽

Larimichthys Crocea ◽

Chromosome Conformation ◽

Protein Coding ◽

Total Length ◽

Chromosome Level

Larimichthys crocea is an endemic marine fish in East Asia that belongs to Sciaenidae in Perciformes. L. crocea has now been recognized as an “iconic” marine fish species in China because not only is it a popular food fish in China, it is a representative victim of overfishing and still provides high value fish products supported by the modern large-scale mariculture industry. Here, we report a chromosome-level reference genome of L. crocea generated by employing the PacBio single molecule sequencing technique (SMRT) and high-throughput chromosome conformation capture (Hi-C) technologies. The genome sequences were assembled into 1,591 contigs with a total length of 723.86 Mb and a contig N50 length of 2.83 Mb. After chromosome-level scaffolding, 24 scaffolds were constructed with a total length of 668.67 Mb (92.48% of the total length). Genome annotation identified 23,657 protein-coding genes and 7,262 ncRNAs. This highly accurate, chromosome-level reference genome of L. crocea provides an essential genome resource to support the development of genome-scale selective breeding and restocking strategies of L. crocea.
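
The contig N50 reported in this abstract is a standard assembly statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The sketch below shows the usual calculation on a toy list of lengths. It is an illustration only; `n50` is a hypothetical helper, and the lengths are invented rather than taken from the L. crocea assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contig lengths supplied")


# Toy lengths: total is 20, and 6 + 5 = 11 first reaches half of it.
toy_n50 = n50([2, 3, 4, 5, 6])  # → 5
```

Comparing `running * 2` with `total` avoids floating-point division, so the result stays exact for integer lengths.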
