Identification of Full-length Circular Nucleic Acids using Long-read Sequencing Technologies

ISOdb: A Comprehensive Database of Full-Length Isoforms Generated by Iso-Seq

International Journal of Genomics ◽

10.1155/2018/9207637 ◽

2018 ◽

Vol 2018 ◽

pp. 1-6 ◽

Cited By ~ 1

Author(s):

Shang-Qian Xie ◽

Yue Han ◽

Xiao-Zhou Chen ◽

Tai-Yu Cao ◽

Kai-Kai Ji ◽

...

Keyword(s):

Single Molecule ◽

Full Length ◽

Public Access ◽

Transcript Isoforms ◽

Sequencing Technologies ◽

Long Reads ◽

Depth Analysis ◽

Gene Level ◽

Long Read ◽

Full Length Transcript

The accurate landscape of transcript isoforms plays an important role in the understanding of gene function and gene regulation. However, building complete transcripts is very challenging for short reads generated using next-generation sequencing. Fortunately, isoform sequencing (Iso-Seq) using single-molecule sequencing technologies, such as PacBio SMRT, provides long reads spanning entire transcript isoforms which do not require assembly. Therefore, we have developed ISOdb, a comprehensive resource database for hosting and carrying out an in-depth analysis of Iso-Seq datasets and visualising the full-length transcript isoforms. The current version of ISOdb has collected 93 publicly available Iso-Seq samples from eight species and presents the samples in two levels: (1) sample level, including metainformation, long read distribution, isoform numbers, and alternative splicing (AS) events of each sample; (2) gene level, including the total isoforms, novel isoform number, novel AS number, and isoform visualisation of each gene. In addition, ISOdb provides a user interface in the website for uploading sample information to facilitate the collection and analysis of researchers’ datasets. Currently, ISOdb is the first repository that offers comprehensive resources and convenient public access for hosting, analysing, and visualising Iso-Seq data, which is freely available.

Improving recovery of member genomes from enrichment reactor microbial communities using MinION–based long read metagenomics

10.1101/465328 ◽

2018 ◽

Cited By ~ 2

Author(s):

Krithika Arumugam ◽

Irina Bessarab ◽

Xianghui Liu ◽

Gayathri Natarajan ◽

Daniela I. Drautz–Moses ◽

...

Keyword(s):

Microbial Communities ◽

Genomic DNA ◽

Moderate Level ◽

Short Read ◽

Sequencing Technologies ◽

Complete Genomes ◽

Long Read

New long read sequencing technologies offer huge potential for effective recovery of complete, closed genomes. While much progress has been made on cultured isolates, the ability of these methods to recover genomes of member taxa in complex microbial communities is less clear. Here we examine the ability of long read data to recover genomes from enrichment reactor metagenomes. Such modified communities offer a moderate level of complexity compared to the source communities and so are realistic, yet tractable, systems to use for this problem. We sampled an enrichment bioreactor designed to target anaerobic ammonium-oxidising bacteria (AnAOB) and sequenced genomic DNA using both short read (Illumina 301bp PE) and long read data (MinION Mk1B) from the same extraction aliquot. The community contained 23 members, of which 16 had genome bins defined from an assembly of the short read data. Two distinct AnAOB species from genus Candidatus Brocadia were present and had complete genomes, of which one was the most abundant member species in the community. We can recover a 4Mb genome, in 2 contigs, of long read assembled sequence that is unambiguously associated with the most abundant AnAOB member genome. We conclude that obtaining near closed, complete genomes of members of low-medium complexity microbial communities using MinION long read sequence is feasible.

metaFlye: scalable long-read metagenome assembly using repeat graphs

10.1101/637637 ◽

2019 ◽

Cited By ~ 9

Author(s):

Mikhail Kolmogorov ◽

Mikhail Rayko ◽

Jeffrey Yuan ◽

Evgeny Polevikov ◽

Pavel Pevzner

Keyword(s):

Dark Matter ◽

State Of The Art ◽

Full Length ◽

Bacterial Genomes ◽

Short Read ◽

Sequencing Technologies ◽

16S RNA ◽

Long Read ◽

Metagenome Assembly ◽

RNA Genes

Long-read sequencing technologies substantially improved assemblies of many isolate bacterial genomes as compared to fragmented assemblies produced with short-read technologies. However, assembling complex metagenomic datasets remains a challenge even for the state-of-the-art long-read assemblers. To address this gap, we present the metaFlye assembler and demonstrate that it generates highly contiguous and accurate metagenome assemblies. In contrast to short-read metagenomics assemblers that typically fail to reconstruct full-length 16S RNA genes, metaFlye captures many 16S RNA genes within long contigs, thus providing new opportunities for analyzing the microbial “dark matter of life”. We also demonstrate that long-read metagenome assemblers significantly improve full-length plasmid and virus reconstruction as compared to short-read assemblers and reveal many novel plasmids and viruses.

Optimized Extraction of Insect Genomic DNA for Long-Read Sequencing

Methods and Protocols ◽

10.3390/mps2040089 ◽

2019 ◽

Vol 2 (4) ◽

pp. 89

Author(s):

Brenda Oppert ◽

Samantha Stoss ◽

Alaysha Monk ◽

Timothy Smith

Keyword(s):

DNA Extraction ◽

Genomic DNA ◽

Fragment Size ◽

Pupal Stage ◽

Red Flour Beetle ◽

High Quality ◽

Genomic DNA Extraction ◽

Sequencing Technologies ◽

Long Read ◽

Lysis Buffer

Long-read sequencing technologies continue to increase the length of reads, and at present can average read lengths of >20 kb up to 60–80 kb. Now the challenge is to extract genomic DNA of sufficient fragment size and quality to support longer read lengths. We developed a successful method to consistently obtain high-quality long genomic DNA from insects. The optimal developmental stage of insects for genomic DNA extraction was determined to be the pupal stage, eliminating DNA from ingested food and reducing contamination by chitinous material that can interfere with extraction. Improved results were obtained by a modified procedure of a commercial genomic DNA extraction kit. Initially, soft pupal tissue of the red flour beetle, Tribolium castaneum, was disrupted in the kit lysis buffer using Teflon micropestles. Modifications to the kit protocol also included gentle mixing by inversion of the tube, instead of harsh vortexing steps, and using wide-bore pipette tips in transferring fractions containing genomic DNA. Data from one sample were provided as an example of successful downstream library production and sequencing. While the technique has been optimized for insects, extractions from tissues of other organisms using these modified procedures also may improve long-read sequencing results.

Nanopore ReCappable Sequencing maps SARS-CoV-2 5' capping sites and provides new insights into the structure of sgRNAs

10.1101/2021.11.24.469860 ◽

2021 ◽

Author(s):

Camilla Ugolini ◽

Logan Mulroney ◽

Adrien Leger ◽

Matteo Castelli ◽

Elena Criscuolo ◽

...

Keyword(s):

Viral Genome ◽

Full Length ◽

Accessory Proteins ◽

Genomic RNAs ◽

Robust Estimates ◽

Sequencing Technologies ◽

Junction Site ◽

Long Read ◽

A New Technique ◽

Viral Isolates

The SARS-CoV-2 virus has a complex transcriptome characterised by multiple, nested subgenomic RNAs (sgRNAs) used to express structural and accessory proteins. Long-read sequencing technologies such as nanopore direct RNA sequencing can recover full-length transcripts, greatly simplifying the assembly of structurally complex RNAs. However, these techniques do not detect the 5′ cap, thus preventing reliable identification and quantification of full-length, coding transcript models. Here we used Nanopore ReCappable Sequencing (NRCeq), a new technique that can identify capped full-length RNAs, to assemble a complete annotation of SARS-CoV-2 sgRNAs and annotate the location of capping sites across the viral genome. We obtained robust estimates of sgRNA expression across cell lines and viral isolates and identified novel canonical and non-canonical sgRNAs, including one that uses a previously un-annotated leader-to-body junction site. The data generated in this work constitute a useful resource for the scientific community and provide important insights into the mechanisms that regulate the transcription of SARS-CoV-2 sgRNAs.

Taxonomic resolution of the ribosomal RNA operon in bacteria: Implications for its use with long read sequencing

10.1101/626093 ◽

2019 ◽

Author(s):

Leonardo de Oliveira Martins ◽

Andrew J. Page ◽

Ian G. Charles

Keyword(s):

Ribosomal RNA ◽

Sequence Variation ◽

Full Length ◽

Taxonomic Resolution ◽

Bacterial Cells ◽

Genus Level ◽

Sequencing Technologies ◽

Ribosomal Operon ◽

Long Read ◽

Multiple Copies

Long-read sequencing technologies enable capture of the full length of ribosomal RNA operons in a single read. Bacterial cells usually have multiple copies of this ribosomal operon; sequence variation within a species of bacterium can exceed variation between species. For uncultured organisms this may affect the overall taxonomic resolution, to genus level, of the full-length ribosomal operon.

Analysis of HLA-G long-read genomic sequences in mother–offspring pairs with preeclampsia

Scientific Reports ◽

10.1038/s41598-020-77081-3 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Ayako Nishizawa ◽

Kazuki Kumada ◽

Keiko Tateno ◽

Maiko Wagata ◽

Sakae Saito ◽

...

Keyword(s):

Single Molecule ◽

Gene Polymorphisms ◽

Genomic DNA ◽

Genomic Sequences ◽

Genomic Sequencing ◽

Public Database ◽

Coding Sequences ◽

PacBio RS II ◽

Potential Association ◽

Long Read

Preeclampsia is a pregnancy-induced disorder that is characterized by hypertension and is a leading cause of perinatal and maternal–fetal morbidity and mortality. HLA-G is thought to play important roles in maternal–fetal immune tolerance, and the associations between HLA-G gene polymorphisms and the onset of pregnancy-related diseases have been explored extensively. Because contiguous genomic sequencing is difficult, the association between the HLA-G genotype and preeclampsia onset is controversial. In this study, genomic sequences of the HLA-G region (5.2 kb) from 31 pairs of mother–offspring genomic DNA samples (18 pairs from normal pregnancies/births and 13 from preeclampsia births) were obtained by single-molecule real-time sequencing using the PacBio RS II platform. The HLA-G alleles identified in our cohort matched seven known HLA-G alleles, but we also identified two new HLA-G alleles at the fourth-field resolution and compared them with nucleotide sequences from a public database that consisted of coding sequences that cover the 3.1-kb HLA-G gene span. Intriguingly, a potential association between preeclampsia onset and the poly T stretch within the downstream region of the HLA-G*01:01:01:01 allele was found. Our study suggests that long-read sequencing of HLA-G will provide clues for characterizing HLA-G variants that are involved in the pathophysiology of preeclampsia.

Ultra-accurate microbial amplicon sequencing with synthetic long reads

Microbiome ◽

10.1186/s40168-021-01072-3 ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

Benjamin J. Callahan ◽

Dmitry Grinevich ◽

Siddhartha Thakur ◽

Michael A. Balamotis ◽

Tuval Ben Yehezkel

Keyword(s):

Microbial Community ◽

16S rRNA ◽

Amplicon Sequencing ◽

Species Level ◽

Full Length ◽

16S rRNA Genes ◽

rRNA Genes ◽

Strain Identification ◽

Long Reads ◽

Long Read

Background: Out of the many pathogenic bacterial species that are known, only a fraction are readily identifiable directly from a complex microbial community using standard next generation DNA sequencing. Long-read sequencing offers the potential to identify a wider range of species and to differentiate between strains within a species, but attaining sufficient accuracy in complex metagenomes remains a challenge. Methods: Here, we describe and analytically validate LoopSeq, a commercially available synthetic long-read (SLR) sequencing technology that generates highly accurate long reads from standard short reads. Results: LoopSeq reads are sufficiently long and accurate to identify microbial genes and species directly from complex samples. LoopSeq perfectly recovered the full diversity of 16S rRNA genes from known strains in a synthetic microbial community. Full-length LoopSeq reads had a per-base error rate of 0.005%, which exceeds the accuracy reported for other long-read sequencing technologies. 18S-ITS and genomic sequencing of fungal and bacterial isolates confirmed that LoopSeq sequencing maintains that accuracy for reads up to 6 kb in length. LoopSeq full-length 16S rRNA reads could accurately classify organisms down to the species level in rinsate from retail meat samples, and could differentiate strains within species identified by the CDC as potential foodborne pathogens. Conclusions: The order-of-magnitude improvement in length and accuracy over standard Illumina amplicon sequencing achieved with LoopSeq enables accurate species-level and strain identification from complex- to low-biomass microbiome samples. The ability to generate accurate and long microbiome sequencing reads using standard short read sequencers will accelerate the building of quality microbial sequence databases and removes a significant hurdle on the path to precision microbial genomics.
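
To put the reported per-base error rate of 0.005% in perspective, the following minimal sketch estimates how many errors to expect in a single read and how likely a read is to be entirely error-free. It is an illustration only, not part of the LoopSeq analysis: it assumes errors are independent and uniformly distributed along the read, and it takes roughly 1,500 bp as the length of a full-length 16S rRNA gene (an assumption not stated in the abstract). The 6,000 bp case corresponds to the "up to 6 kb" reads mentioned above. The function names are hypothetical.

```python
# Illustrative arithmetic only: assumes independent, uniformly distributed errors.

PER_BASE_ERROR_RATE = 0.005 / 100  # 0.005% per base, as reported in the abstract


def expected_errors(read_length_bp: int, error_rate: float) -> float:
    """Expected number of erroneous bases in a read of the given length."""
    return read_length_bp * error_rate


def prob_error_free(read_length_bp: int, error_rate: float) -> float:
    """Probability that every base in the read is correct."""
    return (1.0 - error_rate) ** read_length_bp


if __name__ == "__main__":
    # 1500 bp: assumed approximate 16S rRNA gene length; 6000 bp: "up to 6 kb" reads.
    for length in (1500, 6000):
        errors = expected_errors(length, PER_BASE_ERROR_RATE)
        perfect = prob_error_free(length, PER_BASE_ERROR_RATE)
        print(f"{length} bp: {errors:.3f} expected errors, "
              f"{perfect:.1%} chance of an error-free read")
```

Under these assumptions, most full-length 16S reads would contain no errors at all, which is what makes strain-level discrimination from single reads plausible.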

FlsnRNA-seq: protoplasting-free full-length single-nucleus RNA profiling in plants

Genome Biology ◽

10.1186/s13059-021-02288-0 ◽

2021 ◽

Vol 22 (1) ◽

Cited By ~ 2

Author(s):

Yanping Long ◽

Zhijian Liu ◽

Jinbu Jia ◽

Weipeng Mo ◽

Liang Fang ◽

...

Keyword(s):

Single Cell ◽

Cell Walls ◽

Large Scale ◽

Full Length ◽

Cell Level ◽

Root Cells ◽

RNA Profiling ◽

Different Types ◽

Long Read ◽

Single Nucleus

The broad application of single-cell RNA profiling in plants has been hindered by the prerequisite of protoplasting that requires digesting the cell walls from different types of plant tissues. Here, we present a protoplasting-free approach, flsnRNA-seq, for large-scale full-length RNA profiling at a single-nucleus level in plants using isolated nuclei. Combined with 10x Genomics and Nanopore long-read sequencing, we validate the robustness of this approach in Arabidopsis root cells and the developing endosperm. Sequencing results demonstrate that it allows for uncovering alternative splicing and polyadenylation-related RNA isoform information at the single-cell level, which facilitates characterizing cell identities.

Biosynthetic potential of uncultured Antarctic soil bacteria revealed through long-read metagenomic sequencing

The ISME Journal ◽

10.1038/s41396-021-01052-3 ◽

2021 ◽

Author(s):

Valentin Waschulin ◽

Chiara Borsetto ◽

Robert James ◽

Kevin K. Newsham ◽

Stefano Donadio ◽

...

Keyword(s):

Genome Mining ◽

Gene Clusters ◽

Biosynthetic Gene Cluster ◽

Full Length ◽

Metagenomic Sequencing ◽

Short Read ◽

Short Read Sequencing ◽

Rich Diversity ◽

Long Read ◽

The Rich

The growing problem of antibiotic resistance has led to the exploration of uncultured bacteria as potential sources of new antimicrobials. PCR amplicon analyses and short-read sequencing studies of samples from different environments have reported evidence of high biosynthetic gene cluster (BGC) diversity in metagenomes, indicating their potential for producing novel and useful compounds. However, recovering full-length BGC sequences from uncultivated bacteria remains a challenge due to the technological restraints of short-read sequencing, thus making assessment of BGC diversity difficult. Here, long-read sequencing and genome mining were used to recover >1400 mostly full-length BGCs that demonstrate the rich diversity of BGCs from uncultivated lineages present in soil from Mars Oasis, Antarctica. A large number of highly divergent BGCs were not only found in the phyla Acidobacteriota, Verrucomicrobiota and Gemmatimonadota but also in the actinobacterial classes Acidimicrobiia and Thermoleophilia and the gammaproteobacterial order UBA7966. The latter furthermore contained a potential novel family of RiPPs. Our findings underline the biosynthetic potential of underexplored phyla as well as unexplored lineages within seemingly well-studied producer phyla. They also showcase long-read metagenomic sequencing as a promising way to access the untapped genetic reservoir of specialised metabolite gene clusters of the uncultured majority of microbes.
