Field-based species identification in eukaryotes using real-time nanopore sequencing

2017 ◽  
Author(s):  
Joe Parker ◽  
Andrew J. Helmstetter ◽  
Dion Devey ◽  
Alexander S.T. Papadopulos

Advances in DNA sequencing and informatics have revolutionised biology over the past four decades, but technological limitations have left many applications unexplored [1,2]. Recently, portable, real-time nanopore sequencing (RTnS) has become available. This offers opportunities to rapidly collect and analyse genomic data anywhere [3–5]. However, the generation of datasets from large, complex genomes has been constrained to laboratories [6,7]. The portability and long DNA sequences of RTnS offer great potential for field-based species identification, but the feasibility and accuracy of these technologies for this purpose have not been assessed. Here, we show that a field-based RTnS analysis of closely related plant species (Arabidopsis spp.) [8] has many advantages over laboratory-based high-throughput sequencing (HTS) methods for species-level identification-by-sequencing and de novo phylogenomics. Samples were collected and sequenced in a single day by RTnS using a portable, “al fresco” laboratory. Our analyses demonstrate that correctly identifying unknown reads from matches to a reference database with RTnS reads enables rapid and confident species identification. Individually annotated RTnS reads can be used to infer the evolutionary relationships of A. thaliana. Furthermore, hybrid genome assembly with RTnS and HTS reads substantially improved upon a genome assembled from HTS reads alone. Field-based RTnS makes real-time, rapid specimen identification and genome-wide analyses possible. These technological advances are set to revolutionise research in the biological sciences [9] and have broad implications for conservation, taxonomy, border agencies and citizen science.

2019 ◽  
Vol 70 (15) ◽  
pp. 3867-3879 ◽  
Author(s):  
Anneke Frerichs ◽  
Julia Engelhorn ◽  
Janine Altmüller ◽  
Jose Gutierrez-Marcos ◽  
Wolfgang Werr

Abstract Fluorescence-activated cell sorting (FACS) and assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) were combined to analyse the chromatin state of lateral organ founder cells (LOFCs) in the peripheral zone of the Arabidopsis apetala1-1 cauliflower-1 double mutant inflorescence meristem. On a genome-wide level, we observed a striking correlation between transposase hypersensitive sites (THSs) detected by ATAC-seq and DNase I hypersensitive sites (DHSs). The mostly expanded DHSs were often substructured into several individual THSs, which correlated with phylogenetically conserved DNA sequences or enhancer elements. Comparing chromatin accessibility with available RNA-seq data, THS change configuration was reflected by gene activation or repression and chromatin regions acquired or lost transposase accessibility in direct correlation with gene expression levels in LOFCs. This was most pronounced immediately upstream of the transcription start, where genome-wide THSs were abundant in a complementary pattern to established H3K4me3 activation or H3K27me3 repression marks. At this resolution, the combined application of FACS/ATAC-seq is widely applicable to detect chromatin changes during cell-type specification and facilitates the detection of regulatory elements in plant promoters.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
F. A. Bastiaan von Meijenfeldt ◽  
Ksenia Arkhipova ◽  
Diego D. Cambuy ◽  
Felipe H. Coutinho ◽  
Bas E. Dutilh

Abstract Current-day metagenomics analyses increasingly involve de novo taxonomic classification of long DNA sequences and metagenome-assembled genomes. Here, we show that the conventional best-hit approach often leads to classifications that are too specific, especially when the sequences represent novel deep lineages. We present a classification method that integrates multiple signals to classify sequences (Contig Annotation Tool, CAT) and metagenome-assembled genomes (Bin Annotation Tool, BAT). Classifications are automatically made at low taxonomic ranks if closely related organisms are present in the reference database and at higher ranks otherwise. The result is a high classification precision even for sequences from considerably unknown organisms.
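The contrast between a best-hit classification and a rank-adaptive one can be illustrated with a minimal sketch. This is not the CAT/BAT implementation: the function name `classify`, the input layout (one lineage tuple and one bit-score per hit) and the `min_support` threshold are assumptions made for illustration only. Each hit lends its score to every ancestor in its lineage, and the deepest taxon that still holds more than `min_support` of the total score is reported.

```python
from collections import defaultdict


def classify(hits, min_support=0.5):
    """Return the deepest lineage prefix backed by more than min_support of the score.

    hits: list of (lineage, bitscore) pairs, where lineage is a tuple of taxon
    names ordered from the root to the most specific rank (hypothetical layout).
    """
    if min_support < 0.5:
        # Below 0.5 two sibling taxa could both pass, making the result ambiguous.
        raise ValueError("min_support must be at least 0.5")
    total = sum(score for _, score in hits)
    if total <= 0:
        return ()
    support = defaultdict(float)
    for lineage, score in hits:
        # A hit supports its own taxon and every ancestor of that taxon.
        for depth in range(1, len(lineage) + 1):
            support[lineage[:depth]] += score
    passing = [prefix for prefix, s in support.items() if s / total > min_support]
    # Passing prefixes are nested, so the longest one is the most specific call.
    return max(passing, key=len, default=())


hits = [
    (("Bacteria", "Proteobacteria", "Escherichia"), 100.0),
    (("Bacteria", "Proteobacteria", "Salmonella"), 90.0),
    (("Bacteria", "Firmicutes", "Bacillus"), 20.0),
]
result = classify(hits)
```

In this toy input a best-hit approach would report Escherichia, whereas the sketch stops at Proteobacteria because no single genus holds a majority of the score.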


2020 ◽  
Vol 10 (10) ◽  
pp. 3533-3540 ◽  
Author(s):  
Kim B. Ferguson ◽  
Tore Kursch-Metz ◽  
Eveline C. Verhulst ◽  
Bart A. Pannebakker

Trichogramma brassicae (Bezdenko) are egg parasitoids that are used throughout the world as biological control agents and in laboratories as model species. Despite this ubiquity, few genetic resources exist beyond COI, ITS2, and RAPD markers. Aided by a Wolbachia infection, a wild-caught strain from Germany was reared for low heterozygosity and sequenced in a hybrid de novo strategy, after which several assembling strategies were evaluated. The best assembly, derived from a DBG2OLC-based pipeline, yielded a genome of 235 Mbp made up of 1,572 contigs with an N50 of 556,663 bp. Following a rigorous ab initio-, homology-, and evidence-based annotation, 16,905 genes were annotated and functionally described. As an example of the utility of the genome, a simple ortholog cluster analysis was performed with sister species T. pretiosum, revealing over 6,000 shared clusters and under 400 clusters unique to each species. The genome and transcriptome presented here provide an essential resource for comparative genomics of the commercially relevant genus Trichogramma, but also for research into molecular evolution, ecology, and breeding of T. brassicae.
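For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of the standard definition: the length of the contig at which the running total, taken over contigs sorted from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the toy contig lengths are illustrative assumptions, not values from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the assembly
        # size defines the N50.
        if running >= half_total:
            return length


toy_n50 = n50([2, 3, 4, 5, 6])
```

With the toy lengths the total is 20, so the running sum reaches 10 at the second-longest contig and the N50 is 5.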


2020 ◽  
Vol 10 (5) ◽  
pp. 1495-1501 ◽  
Author(s):  
Tsuyoshi Tanaka ◽  
Ryo Nishijima ◽  
Shota Teramoto ◽  
Yuka Kitomi ◽  
Takeshi Hayashi ◽  
...  

IR64 is a high-yield rice variety that has been widely cultivated around the world. IR64 has been replaced by modern varieties in most growing areas. Given that modern varieties are mostly progenies or relatives of IR64, genetic analysis of IR64 is valuable for rice functional genomics. However, chromosome-level genome sequences of IR64 have not been available previously. Here, we sequenced the IR64 genome using synthetic long reads obtained by linked-read sequencing and ultra-long reads obtained by nanopore sequencing. We integrated these data and generated a de novo assembly of the IR64 genome of 367 Mb, equivalent to 99% of the estimated size. Continuity of the IR64 genome assembly was improved compared with that of a publicly available IR64 genome assembly generated by short reads only. We annotated 41,458 protein-coding genes, including 657 IR64-specific genes that are missing in other high-quality rice genome assemblies, IRGSP-1.0 of japonica cultivar Nipponbare or R498 of indica cultivar Shuhui498. The IR64 genome assembly will serve as a genome resource for rice functional genomics as well as genomics-driven and/or molecular breeding.


Diversity ◽  
2019 ◽  
Vol 11 (9) ◽  
pp. 144 ◽  
Author(s):  
Laís Coelho ◽  
Lukas Musher ◽  
Joel Cracraft

Current-generation high-throughput sequencing technology has facilitated the generation of more genomic-scale data than ever before, thus greatly improving our understanding of avian biology across a range of disciplines. Recent developments in linked-read sequencing (Chromium 10×) and reference-based whole-genome assembly offer an exciting prospect of more accessible chromosome-level genome sequencing in the near future. We sequenced and assembled a genome of the Hairy-crested Antbird (Rhegmatorhina melanosticta), which represents the first publicly available genome for any antbird (Thamnophilidae). Our objectives were to (1) assemble scaffolds to chromosome level based on multiple reference genomes, and report on differences relative to other genomes, (2) assess genome completeness and compare content to other related genomes, and (3) assess the suitability of linked-read sequencing technology for future studies in comparative phylogenomics and population genomics. Our R. melanosticta assembly was both highly contiguous (de novo scaffold N50 = 3.3 Mb, reference-based N50 = 53.3 Mb) and relatively complete (contained close to 90% of evolutionarily conserved single-copy avian genes and known tetrapod ultraconserved elements). The high contiguity and completeness of this assembly enabled the genome to be successfully mapped to the chromosome level, which uncovered a consistent structural difference between R. melanosticta and other avian genomes. Our results are consistent with the observation that avian genomes are structurally conserved. Additionally, our results demonstrate the utility of linked-read sequencing for non-model genomics. Finally, we demonstrate the value of our R. melanosticta genome for future researchers by mapping reduced representation sequencing data, and by accurately reconstructing the phylogenetic relationships among a sample of thamnophilid species.


2020 ◽  
Author(s):  
K. B. Ferguson ◽  
T. Kursch-Metz ◽  
E. C. Verhulst ◽  
B. A. Pannebakker

Abstract Trichogramma brassicae (Bezdenko) are egg parasitoids that are used throughout the world as biological control agents and in laboratories as model species. Despite this ubiquity, few genetic resources exist beyond COI, ITS2, and RAPD markers. Aided by a Wolbachia infection, a wild-caught strain from Germany was reared for low heterozygosity and sequenced in a hybrid de novo strategy, after which several assembling strategies were evaluated. The best assembly, derived from a DBG2OLC-based pipeline, yielded a genome of 235 Mbp made up of 1,572 contigs with an N50 of 556,663 bp. Following a rigorous ab initio-, homology-, and evidence-based annotation, 16,905 genes were annotated and functionally described. As an example of the utility of the genome, a simple ortholog cluster analysis was performed with sister species T. pretiosum, revealing over 6,000 shared clusters and under 400 clusters unique to each species. The genome and transcriptome presented here provide an essential resource for comparative genomics of the commercially relevant genus Trichogramma, but also for research into molecular evolution, ecology, and breeding of T. brassicae.


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 1750 ◽  
Author(s):  
Tapan Kumar Mondal ◽  
Hukam Chand Rawal ◽  
Kishor Gaikwad ◽  
Tilak Raj Sharma ◽  
Nagendra Kumar Singh

Oryza coarctata plants, collected from the Sundarban delta of West Bengal, India, were used in the present study to generate draft genome sequences, employing hybrid genome assembly with Illumina reads and third-generation Oxford Nanopore sequencing technology. We report, for the first time, a draft assembly with more than 85.71% genome coverage; the data have been deposited in NCBI SRA under BioProject ID PRJNA396417.


2017 ◽  
Vol 7 (1) ◽  
Author(s):  
Joe Parker ◽  
Andrew J. Helmstetter ◽  
Dion Devey ◽  
Tim Wilkinson ◽  
Alexander S. T. Papadopulos

Author(s):  
Masahiko Imashimizu ◽  
Yuji Tokunaga ◽  
Ariel Afek ◽  
Hiroki Takahashi ◽  
Nobuo Shimamoto ◽  
...  

In the process of transcription initiation by RNA polymerase, promoter DNA sequences affect multiple reaction pathways determining the productivity of transcription. However, the question of how the molecular mechanism of transcription initiation depends on sequence properties of promoter DNA remains poorly understood. Here, combining the statistical mechanical approach with high-throughput sequencing results, we characterize abortive transcription and pausing during transcription initiation by Escherichia coli RNA polymerase at a genome-wide level. Our results suggest that initially transcribed sequences enriched with thymine bases represent the signal inducing abortive transcription. On the other hand, certain repetitive sequence elements broadly embedded in promoter regions constitute the signal inducing pausing. Both signals decrease the productivity of transcription initiation. Based on solution NMR and in vitro transcription measurements, we also suggest that repetitive sequence elements of promoter DNA modulate the rigidity of its double-stranded form, which profoundly influences the reaction coordinates of the productive initiation via pausing.


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e10029
Author(s):  
Inga Leena Angell ◽  
Morten Nilsen ◽  
Karin C. Lødrup Carlsen ◽  
Kai-Håkon Carlsen ◽  
Gunilla Hedlin ◽  
...  

Nanopore sequencing is rapidly becoming more popular for use in various microbiota-based applications. Major limitations of current approaches are that they do not enable de novo species identification and that they cannot be used to verify species assignments. This severely limits the applicability of nanopore sequencing technology in taxonomic applications. Here, we demonstrate the possibility of de novo species identification and verification using hexamer frequencies in combination with k-means clustering for nanopore sequencing data. The approach was tested on the gut microbiota of 3-month-old human infants. Using the hexamer k-means approach, we identified two new low-abundance species associated with vaginal delivery. In addition, we confirmed both the vaginal delivery association for two previously identified species and the overall high levels of bifidobacteria. Taxonomic assignments were further verified by mock community analyses. Therefore, we believe our de novo species identification approach will have widespread application in analyzing microbial communities in the future.
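The general idea of clustering reads by hexamer composition can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' pipeline: the helper names (`kmer_frequencies`, `kmeans`), the deterministic farthest-point initialisation, the fixed iteration count and the four short example reads are all hypothetical choices made to keep the example self-contained.

```python
from itertools import product


def kmer_frequencies(seq, k=6):
    """Return normalised k-mer frequencies over the ACGT alphabet (hexamers by default)."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0.0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])  # k-mers containing other symbols (e.g. N) are skipped
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm with a deterministic farthest-point start."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Seed the next centroid with the point farthest from all chosen ones.
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


reads = [
    "ATATATATATATATATATAT",
    "ATATATTATATATATATAAT",
    "GCGCGCGCGCGCGCGCGCGC",
    "GCGCGGCGCGCGCGCCGCGC",
]
labels = kmeans([kmer_frequencies(r) for r in reads], k=2)
```

In this toy input the two AT-rich reads fall into one cluster and the two GC-rich reads into the other; real nanopore reads are far longer and noisier, so cluster number and validation would need separate consideration.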

