Long-read sequence assembly: a technical evaluation in barley

2021 ◽  
Author(s):  
Martin Mascher ◽  
Thomas Wicker ◽  
Jerry Jenkins ◽  
Christopher Plott ◽  
Thomas Lux ◽  
...  

Abstract Sequence assembly of large and repeat-rich plant genomes has been challenging, requiring substantial computational resources and often several complementary sequence assembly and genome mapping approaches. The recent development of fast and accurate long-read sequencing by circular consensus sequencing (CCS) on the PacBio platform may greatly increase the scope of plant pan-genome projects. Here, we compare current long-read sequencing platforms regarding their ability to rapidly generate contiguous sequence assemblies in pan-genome studies of barley (Hordeum vulgare). Most long-read assemblies are clearly superior to the current barley reference sequence based on short reads. Assemblies derived from accurate long reads excel in most metrics, but the CCS approach was the most cost-effective strategy for assembling tens of barley genomes. A downsampling analysis indicated that 20-fold CCS coverage can yield very good sequence assemblies, while even 5-fold CCS data may capture the complete sequence of most genes. We present an updated reference genome assembly for barley with near-complete representation of the repeat-rich intergenic space. Long-read assembly can underpin the construction of accurate and complete sequences of multiple genomes of a species to build pan-genome infrastructures in Triticeae crops and their wild relatives.

Author(s):  
Eric S Tvedte ◽  
Mark Gasser ◽  
Benjamin C Sparklin ◽  
Jane Michalski ◽  
Carl E Hjelmen ◽  
...  

Abstract The newest generation of DNA sequencing technology is highlighted by the ability to generate sequence reads hundreds of kilobases in length. Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) have pioneered competitive long-read platforms, with more recent work focused on improving sequencing throughput and per-base accuracy. We used whole-genome sequencing data produced by three PacBio protocols (Sequel II CLR, Sequel II HiFi, RS II) and two ONT protocols (Rapid Sequencing and Ligation Sequencing) to compare assemblies of the bacterium Escherichia coli and the fruit fly Drosophila ananassae. In both organisms tested, Sequel II assemblies had the highest consensus accuracy, even after accounting for differences in sequencing throughput. ONT and PacBio CLR had the longest reads sequenced compared to PacBio RS II and HiFi, and genome contiguity was highest when assembling these datasets. ONT Rapid Sequencing libraries had the fewest chimeric reads in addition to superior quantification of E. coli plasmids versus ligation-based libraries. The quality of assemblies can be enhanced by adopting hybrid approaches using Illumina libraries for bacterial genome assembly or polishing eukaryotic genome assemblies, and an ONT-Illumina hybrid approach would be more cost-effective for many users. Genome-wide DNA methylation could be detected using both technologies; however, ONT libraries enabled the identification of a broader range of known E. coli methyltransferase recognition motifs in addition to undocumented D. ananassae motifs. The ideal choice of long-read technology may depend on several factors including the question or hypothesis under examination. No single technology outperformed others in all metrics examined.


Author(s):  
Mitchell J Sullivan ◽  
Nouri L Ben Zakour ◽  
Brian M Forde ◽  
Mitchell Stanton-Cook ◽  
Scott A Beatson

Contiguity is interactive software for the visualization and manipulation of de novo genome assemblies. Contiguity creates and displays information on contig adjacency, which is contextualized by the simultaneous display of a comparison between assembled contigs and a reference sequence. Where scaffolders allow unambiguous connections between contigs to be resolved into a single scaffold, Contiguity allows the user to create all potential scaffolds in ambiguous regions of the genome. This enables the resolution of novel sequence or structural variants from the assembly. In addition, Contiguity provides a sequencing- and assembly-agnostic approach for the creation of contig adjacency graphs. To maximize the number of contig adjacencies determined, Contiguity combines information from read pair mappings, sequence overlap and De Bruijn graph exploration. We demonstrate how highly sensitive graphs can be achieved using this method. Contig adjacency graphs allow the user to visualize potential arrangements of contigs in unresolvable areas of the genome. By combining adjacency information with comparative genomics, Contiguity provides an intuitive approach for exploring and improving sequence assemblies. It is also useful in guiding manual closure of long-read sequence assemblies. Contiguity is an open source application, implemented using Python and the Tkinter GUI package, that can run on any Unix, OS X or Windows operating system. It has been designed and optimized for bacterial assemblies. Contiguity is available at http://mjsull.github.io/Contiguity .


2019 ◽  
Author(s):  
Eleanor Young ◽  
Heba Z. Abid ◽  
Pui-Yan Kwok ◽  
Harold Riethman ◽  
Ming Xiao

Abstract Detailed comprehensive knowledge of the structures of individual long-range telomere-terminal haplotypes is needed to understand their impact on telomere function, and to delineate the population structure and evolution of subtelomere regions. However, the abundance of large evolutionarily recent segmental duplications and high levels of large structural variations have complicated both the mapping and sequence characterization of human subtelomere regions. Here, we use high-throughput optical mapping of large single DNA molecules in nanochannel arrays for 154 human genomes from 26 populations to present a comprehensive look at human subtelomere structure and variation. The results catalog many novel long-range subtelomere haplotypes and determine the frequencies and contexts of specific subtelomeric duplicons on each chromosome arm, helping to clarify the currently ambiguous nature of many specific subtelomere structures as represented in the current reference sequence (HG38). The organization and content of some duplicons in subtelomeres appear to show both chromosome arm-specific and population-specific trends. Based upon these trends we estimate a timeline for the spread of these duplication blocks. Author Summary The ends of human chromosomes have caps called telomeres that are essential. These telomeres are influenced by the portions of DNA next to them, a region known as the subtelomere. We need to better understand the subtelomeric region to understand how it impacts the telomeres. This subtelomeric region is not well described in the current references. This is due to large variations in this region and portions that are repeated many times, which current sequencing technologies struggle to capture. Many of these variations are evolutionarily recent. Here we use 154 different samples from the 26 geographic regions of the world to gain a better understanding of the variation in these regions. We found many new haplotypes and clarified the haplotypes existing in the current reference. We then examined population- and chromosome-specific trends.


2015 ◽  
Author(s):  
Sara Goodwin ◽  
James Gurtowski ◽  
Scott Ethe-Sayers ◽  
Panchajanya Deshpande ◽  
Michael Schatz ◽  
...  

Monitoring the progress of DNA molecules through a membrane pore has been postulated as a method for sequencing DNA for several decades. Recently, a nanopore-based sequencing instrument, the Oxford Nanopore MinION, has become available, which we used for sequencing the S. cerevisiae genome. To make use of these data, we developed a novel open-source hybrid error correction algorithm, Nanocorr (https://github.com/jgurtowski/nanocorr), specifically for Oxford Nanopore reads, as existing packages were incapable of assembling the long read lengths (5-50 kbp) at such a high error rate (between ~5 and 40% error). With this new method we were able to perform a hybrid error correction of the nanopore reads using complementary MiSeq data and produce a de novo assembly that is highly contiguous and accurate: the contig N50 length is more than ten times greater than that of an Illumina-only assembly (678 kbp versus 59.9 kbp), and the assembly has greater than 99.88% consensus identity when compared to the reference. Furthermore, the assembly with the long nanopore reads presents a much more complete representation of the features of the genome and correctly assembles gene cassettes, rRNAs, transposable elements, and other genomic features that were almost entirely absent in the Illumina-only assembly.
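For readers unfamiliar with the contig N50 metric quoted above, the following is a minimal sketch of how it is commonly computed: sort contig lengths from longest to shortest and report the length at which the running total first reaches half of the total assembly size. The function name `n50` and the example lengths are illustrative assumptions; they are not taken from the Nanocorr paper or its code.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig downwards until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical contig lengths (in bp), for illustration only.
example_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
example_n50 = n50(example_contigs)
```

With the hypothetical lengths above, the two longest contigs already cover half of the 32 bp total, so the N50 is 8. A higher N50 for the same genome generally indicates a more contiguous assembly, which is the sense in which the abstract compares the hybrid and Illumina-only assemblies.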


Author(s):  
Cassandria Tay Fernandez ◽  
Jacob Marsh ◽  
Mônica Furaste Danilevicz ◽  
Clémentine Mercé ◽  
David Edwards

Abstract This chapter discusses the application of pangenomics for molecular breeding of wheat. Pangenomes can be used by researchers and breeders alike to develop elite wheat cultivars through the discovery and integration of genetic variations associated with agronomically beneficial traits. By providing a reference that accommodates variation among individuals, variants whose presence and/or absence control abiotic stress resistance and yield can be identified. This tool has only become more informative as more wheat varieties are sequenced, new sequencing approaches such as long-read sequencing and genome mapping are utilized, and tools for pangenomic analysis are developed. With pangenomics, variable genes from wild wheat relatives and related species can be used to optimize wheat molecular breeding and develop improved varieties tailored for the changing global environment.


Agronomy ◽  
2020 ◽  
Vol 10 (9) ◽  
pp. 1294 ◽  
Author(s):  
Rosa Mérida-García ◽  
Sergio Gálvez ◽  
Etienne Paux ◽  
Gabriel Dorado ◽  
Laura Pascual ◽  
...  

The practical use of molecular markers is facilitated by cost-effective detection techniques. In this work, wheat insertion site-based polymorphism (ISBP) markers were set up for genotyping using high-resolution melting analysis (HRM). Polymorphic HRM-ISBP assays were developed for wheat chromosomes 4A and 3B and used for wheat variability assessment. The marker sequences were mapped against the wheat genome reference sequence, targeting interesting genes. Those genes were located within or in proximity to previously described quantitative trait loci (QTL) or meta-quantitative trait loci (MQTL) for drought and heat stress tolerance, and also yield and yield-related traits. Eighteen of the markers used tagged drought-related genes and, interestingly, eight of the genes were differentially expressed under different abiotic stress conditions. These results confirmed HRM as a cost-effective and efficient tool for wheat breeding programs.


BMC Biology ◽  
2020 ◽  
Vol 18 (1) ◽  
Author(s):  
Peter D. Olson ◽  
Alan Tracey ◽  
Andrew Baillie ◽  
Katherine James ◽  
Stephen R. Doyle ◽  
...  

Abstract Background Chromosome-level assemblies are indispensable for accurate gene prediction, synteny assessment, and understanding higher-order genome architecture. Reference and draft genomes of key helminth species have been published, but little is yet known about the biology of their chromosomes. Here, we present the complete genome of the tapeworm Hymenolepis microstoma, providing a reference quality, end-to-end assembly that represents the first fully assembled genome of a spiralian/lophotrochozoan, revealing new insights into chromosome evolution. Results Long-read sequencing and optical mapping data were added to previous short-read data enabling complete re-assembly into six chromosomes, consistent with karyology. Small genome size (169 Mb) and lack of haploid variation (1 SNP/3.2 Mb) contributed to exceptionally high contiguity with only 85 gaps remaining in regions of low complexity sequence. Resolution of repeat regions reveals novel gene expansions, micro-exon genes, and spliced leader trans-splicing, and illuminates the landscape of transposable elements, explaining observed length differences in sister chromatids. Syntenic comparison with other parasitic flatworms shows conserved ancestral linkage groups indicating that the H. microstoma karyotype evolved through fusion events. Strikingly, the assembly reveals that the chromosomes terminate in centromeric arrays, indicating that these motifs play a role not only in segregation, but also in protecting the linear integrity and full lengths of chromosomes. Conclusions Despite strong conservation of canonical telomeres, our results show that they can be substituted by more complex, species-specific sequences, as represented by centromeres. The assembly provides a robust platform for investigations that require complete genome representation.


2020 ◽  
Vol 13 (1) ◽  
Author(s):  
Jenny G. Maloney ◽  
Aleksey Molokin ◽  
Monica Santin

Abstract Background Blastocystis sp. is one of the most common enteric parasites of humans and animals worldwide. It is well recognized that this ubiquitous protist displays a remarkable degree of genetic diversity in the SSU rRNA gene, which is currently the main gene used for defining Blastocystis subtypes. Yet, full-length reference sequences of this gene are available for only 16 subtypes of Blastocystis in part because of the technical difficulties associated with obtaining these sequences from complex samples. Methods We have developed a method using Oxford Nanopore MinION long-read sequencing and universal eukaryotic primers to produce full-length (> 1800 bp) SSU rRNA gene sequences for Blastocystis. Seven Blastocystis specimens representing five subtypes (ST1, ST4, ST10, ST11, and ST14) obtained both from cultures and feces were used for validation. Results We demonstrate that this method can be used to produce highly accurate full-length sequences from both cultured and fecal DNA isolates. Full-length sequences were successfully obtained from all five subtypes including ST11 for which no full-length reference sequence currently exists and for an isolate that contained mixed ST10/ST14. Conclusions The suitability of the use of MinION long-read sequencing technology to successfully generate full-length Blastocystis SSU rRNA gene sequences was demonstrated. The ability to produce full-length SSU rRNA gene sequences is key in understanding the role of genetic diversity in important aspects of Blastocystis biology such as transmission, host specificity, and pathogenicity.


2020 ◽  
Vol 8 (11) ◽  
pp. 1755
Author(s):  
Evert Drijver ◽  
Joep Stohr ◽  
Jaco Verweij ◽  
Carlo Verhulst ◽  
Francisca Velkers ◽  
...  

Distinguishing epidemiologically related and unrelated plasmids is essential to confirm plasmid transmission. We compared IncI1–pST12 plasmids from both human and livestock origin and explored the degree of sequence similarity between plasmids from Enterobacteriaceae with different epidemiological links. Short-read sequence data of Enterobacteriaceae cultured from humans and broilers were screened for the presence of both a blaCMY-2 gene and an IncI1–pST12 replicon. Isolates were long-read sequenced on a MinION sequencer (Oxford Nanopore Technologies). After plasmid reconstruction using hybrid assembly, pairwise single nucleotide polymorphisms (SNPs) were determined. The plasmids were annotated, and a pan-genome was constructed to compare genes variably present between the different plasmids. Nine Escherichia coli sequences of broiler origin, four Escherichia coli sequences, and one Salmonella enterica sequence of human origin were selected for the current analysis. A circular contig with the IncI1–pST12 replicon and blaCMY-2 gene was extracted from the assembly graph of all fourteen isolates. Analysis of the IncI1–pST12 plasmids revealed a low number of SNP differences (range of 0–9 SNPs). The range of SNP differences overlapped in isolates with different epidemiological links. One hundred and twelve of the 113 genes of the pan-genome were present in all plasmid constructs. Next generation sequencing analysis of blaCMY-2-containing IncI1–pST12 plasmids isolated from Enterobacteriaceae with different epidemiological links shows a high degree of sequence similarity in terms of SNP differences and the number of shared genes. Therefore, statements on the horizontal transfer of these plasmids based on genetic identity should be made with caution.
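The abstract does not say which tool was used to determine pairwise SNPs. As a rough illustration of the idea only, the sketch below counts SNP differences between every pair of plasmid sequences, assuming they have already been aligned to equal length; positions where either sequence has a gap or an ambiguous base are skipped. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences carry different A/C/G/T bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        # Ignore gaps and ambiguous bases; count only true base mismatches.
        if a in valid and b in valid and a != b
    )


def pairwise_snps(aligned):
    """Return {(name_1, name_2): SNP count} for every pair of named sequences."""
    return {
        (n1, n2): snp_distance(aligned[n1], aligned[n2])
        for n1, n2 in combinations(sorted(aligned), 2)
    }


# Toy alignment with hypothetical plasmid names, for illustration only.
toy_alignment = {
    "p1": "ACGTACGT",
    "p2": "ACGTACGA",
    "p3": "ACCTAC-T",
}
toy_result = pairwise_snps(toy_alignment)
```

For the toy alignment this gives 1 SNP between p1 and p2, 1 between p1 and p3 (the gapped column is skipped), and 2 between p2 and p3. A table of such pairwise counts is what allows ranges like the 0–9 SNPs reported above to be compared across isolates with different epidemiological links.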


2019 ◽  
Vol 6 (Supplement_2) ◽  
pp. S289-S289
Author(s):  
Shelby Simar ◽  
Blake Hanson ◽  
German Contreras ◽  
Katherine Reyes ◽  
Pranoti V Sahasrabhojane ◽  
...  

Abstract Background Vancomycin-resistant enterococci (VRE) are a major cause of nosocomial bloodstream infections. Enterococci exhibit remarkable genomic plasticity and can recombine through the acquisition of genetic material via mobile genetic elements (MGEs), including resistance genes. The accessory genome plays a major role in the evolution of enterococci within the human host. Thus, dissecting the entire genome (pan-genome) is of paramount importance to characterize the population structure of enterococci causing disease. Methods VENOUS is an ongoing prospective, observational study of adults with enterococcal bacteremia. From September 2016 to March 2018, E. faecalis (Efs) and E. faecium (Efm) were collected in 14 hospitals of a single hospital system and a major cancer center in Houston, TX, and a general hospital in Detroit, MI. Short- and long-read genomic sequencing were performed with Illumina MiSeq and Oxford Nanopore Technologies GridION X5, respectively. A proprietary bioinformatics pipeline was utilized for genome assembly and further analyses. Results 156 Efs and 98 Efm isolates from single patients were analyzed. The average proportion of core genes in each genome was 64.6% (53.0–74.1) and 49.1% (45.2–51.0) for Efs and Efm, respectively. The vanA gene cluster was identified in 5.1% (8/157) of Efs and 57.1% (56/98) of Efm. The plasmid-encoded aac(6′)-Ie-aph(2″)-Ia gene conferring high-level resistance to aminoglycosides was found in 37.6% (59/157) Efs, seven of which also possessed vanA. Long-read sequencing of vanA-harboring plasmids from a subset of VRE revealed that the vanA cluster was carried in plasmids ranging from 31.7 to 132.3 kb. Although the vanA operon was fairly conserved, insertions of MGE were identified in the intergenic regions of vanS/vanH and vanX/vanY. Furthermore, a variety of MGE insertions mediated integration of the vanA operon, including IS1216 and IS256 (figure). 
Conclusion Accessory genes, including AMR genes, comprise a significant proportion of the enterococcal pan-genome, indicating major genetic plasticity within these organisms. Acquired resistance genes seem to have a high degree of recombination and play a substantial role in the expansion of the genomic repertoire in clinical isolates. Disclosures Samuel L. Aitken, PharmD, Melinta Therapeutics: Grant/Research Support, Research Grant; Merck, Sharpe, and Dohme: Advisory Board; Shionogi: Advisory Board.

